What is _GetExceptDLLinfo? - c++

I'm working on a Java project that calls a native Windows executable with a Java Process object. Sometimes I see an exception in the native code and the symbol _GetExceptDLLinfo appears in the native stack trace. Is there some meaningful exception to capture and if so, how do I capture it?

_GetExceptDLLinfo can apparently show up when the debugger fails to find the correct function name for an address; it will likely be followed by a huge offset such as +0xCRAZYBIG. Normally you'd expect to see something like +0000003a, meaning the address is 0x3a (58) bytes into _GetExceptDLLinfo. A very large offset is a strong hint that the symbol is just the nearest name the debugger could resolve, not the function that actually crashed.

Related

LoadLibrary cannot find ntoskrnl

I am writing a small app which calls KeBugCheck and crashes the system but LoadLibrary is unable to find ntoskrnl.exe (I get 126 as return value when calling GetLastError)
Here is my code:
void* fnc;
HMODULE bcLib;
bcLib = LoadLibrary((LPCWSTR)"ntoskrnl.exe");
fnc = (void*) GetProcAddress(bcLib, (LPCSTR)"KeBugCheck");
int(*KeBugCheck)(ULONG);
KeBugCheck = (int(*)(ULONG))fnc;
KeBugCheck(0x000000E2);
Also, in the debug window, I see this error:
First-chance exception at 0x00000000 in app.exe: 0xC0000005:
Access violation executing location 0x00000000.
Any help will be very much appreciated.
KeBugCheck is a kernel function. That means you can't call it from user-mode code, like the application you're trying to write.
There is also no user-mode wrapper provided for this function because user-mode code is not supposed to be able to bring down the entire system.
You will have to write your own kernel-mode driver to do this. To get started, download the Windows Driver Kit (WDK, formerly the DDK). In that case there will be no need for the whole LoadLibrary and GetProcAddress dance, since the function is declared in the public Ntddk.h header and is resolved automatically when you link against Ntoskrnl.lib.
As for the problem you're having here, with LoadLibrary failing with ERROR_MOD_NOT_FOUND (error 126), that is a separate issue. The code you have is wrong, which is quite obvious from the explicit cast to LPCWSTR that you have to perform in order to shut the compiler up.
You're compiling a Unicode application, so the call to LoadLibrary resolves to LoadLibraryW, which accepts a wide (Unicode) string of type LPCWSTR. You're passing it a narrow string literal, which generates a type-mismatch error. Except that you've inserted the cast, which effectively tells the compiler to be quiet because you know better than it does. Except that you don't. Listen to the compiler; it can save you from a lot of bugs.
The fix is simple: remove all the superfluous casts from your code and use a wide string literal instead. (The GetProcAddress function, however, is unique: it always requires a narrow string, regardless of whether or not you're compiling for Unicode.)
HMODULE bcLib = LoadLibrary(L"ntoskrnl.exe");
void* fnc = (void*)GetProcAddress(bcLib, "KeBugCheck");
Of course, once you fix this, you'll want to see the first part of my answer.
Try using the ntdll.dll NtRaiseHardError function. ntdll functions are the closest that you can get in user-mode to kernel-mode functions and NtRaiseHardError eventually calls KeBugCheck in the kernel.
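A rough sketch of that approach (NtRaiseHardError is undocumented, so the prototype, the privilege index and the response option below are assumptions that may vary between Windows versions; if it works, the machine bugchecks immediately):
#include <windows.h>
#include <winternl.h>   // NTSTATUS

// Assumed prototypes for the undocumented ntdll functions.
typedef NTSTATUS (NTAPI *NtRaiseHardError_t)(
    NTSTATUS ErrorStatus,
    ULONG NumberOfParameters,
    ULONG UnicodeStringParameterMask,
    PULONG_PTR Parameters,
    ULONG ValidResponseOptions,   // 6 = OptionShutdownSystem (assumed)
    PULONG Response);
typedef NTSTATUS (NTAPI *RtlAdjustPrivilege_t)(
    ULONG Privilege, BOOLEAN Enable, BOOLEAN CurrentThread, PBOOLEAN OldValue);

int main()
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");  // ntdll is loaded in every process
    if (!ntdll) return 1;

    auto pRtlAdjustPrivilege =
        (RtlAdjustPrivilege_t)GetProcAddress(ntdll, "RtlAdjustPrivilege");
    auto pNtRaiseHardError =
        (NtRaiseHardError_t)GetProcAddress(ntdll, "NtRaiseHardError");
    if (!pRtlAdjustPrivilege || !pNtRaiseHardError) return 1;

    // The hard-error path only bugchecks if the caller holds SeShutdownPrivilege
    // (index 19) and asks for the shutdown response option.
    BOOLEAN oldValue;
    pRtlAdjustPrivilege(19, TRUE, FALSE, &oldValue);

    ULONG response;
    pNtRaiseHardError(0xC0000022 /* any error-severity status */, 0, 0, nullptr,
                      6 /* OptionShutdownSystem, assumed */, &response);
    return 0;
}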

How to debug "This application has requested the Runtime to terminate it in an unusual way." when I can't even step in the code?

I have a C++ program that gives this error as soon as the process starts, apparently before any user code executes. It only happens when inlining is enabled. Even with debug symbols built in, I can't step into the code: as soon as I press F10 in Visual Studio I get the error and the program stops. I checked all the exceptions/checks in "Debug/Exceptions" but still don't get a break.
Normally I would expect something like this to be due to a missing runtime dependency but I'm quite positive that's not the case here (verified with Dependency Walker).
edit: I used Steve Townsend's recommendation of CDB and now I'm able to step through the pre-user-code parts of the program. The final stack trace is:
Child-SP RetAddr Call Site
00000000`0008e308 00000000`7541601a ntdll!ZwTerminateProcess+0xa
00000000`0008e310 00000000`7540cf87 wow64!Wow64EmulateAtlThunk+0x86ba
00000000`0008e340 00000000`7539276d wow64!Wow64SystemServiceEx+0xd7
00000000`0008ec00 00000000`7540d07e wow64cpu!TurboDispatchJumpAddressEnd+0x24
00000000`0008ecc0 00000000`7540c549 wow64!Wow64SystemServiceEx+0x1ce
00000000`0008ed10 00000000`7776ae27 wow64!Wow64LdrpInitialize+0x429
00000000`0008f260 00000000`777672f8 ntdll!LdrGetKnownDllSectionHandle+0x1a7
00000000`0008f760 00000000`77752ace ntdll!RtlInitCodePageTable+0xe8
00000000`0008f7d0 00000000`00000000 ntdll!LdrInitializeThunk+0xe
You could try setting up Process Dumper and configuring it for your EXE to create a dump on any process exit. Then start the process from the command line to rule out any artifacts of the IDE.
This ought to give you a dump for post-mortem debugging, and perhaps a call stack from the exiting thread that could be useful.
It probably has to do with the order in which your globals are initialized. In C++, the initialization order across translation units is unspecified, so if a global's initializer depends on a global in another module having already been initialized, you're in trouble.
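For example, a hypothetical pair of translation units (all names made up) that can crash before main() depending on which file's globals the runtime happens to initialize first:
// logger.cpp
#include <string>
std::string g_prefix = "app: ";            // constructed at some unspecified point before main()

// main.cpp
#include <string>
extern std::string g_prefix;

// If main.cpp's globals are initialized first, g_prefix is still zeroed memory
// when this runs, and copying from it is undefined behaviour - typically a
// crash before any user code appears to execute.
std::string g_banner = g_prefix + "starting";

int main() { return 0; }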
It's possible to put a break point in the CRT initialization code that runs before calling main (or wmain, or WinMain, or whatever you're using). You can step through that code and see what's causing the problem.
Another possible cause is a DllMain function returning an error or throwing an exception during DLL_PROCESS_ATTACH.

JNI C++ Debugging Techniques?

I have a Linux C++ application that creates a JVM and makes JNI calls. I am new to JNI, and so far the only effective way I have found to debug my application during development is trial and error. What are some techniques for debugging the infamous "A fatal error has been detected by the Java Runtime Environment" JVM crashes? How do I know whether the problem is my code or a genuine JVM bug?
In general, the obvious things I know so far are:
In the code, always check jobject, jclass, and jmethodID values returned from JNI calls for NULL before proceeding further.
Call env->ExceptionCheck() where appropriate to ensure there are no pending exceptions.
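For example, a minimal sketch of those two checks (the class, method, and variable names are made up):
#include <jni.h>

void describeWidget(JNIEnv* env, jobject widget)
{
    jclass cls = env->FindClass("com/example/Widget");
    if (cls == NULL) {                 // NULL means the lookup failed...
        env->ExceptionDescribe();      // ...and a ClassNotFoundError is pending
        env->ExceptionClear();
        return;
    }

    jmethodID mid = env->GetMethodID(cls, "getName", "()Ljava/lang/String;");
    if (mid == NULL) {
        env->ExceptionDescribe();
        env->ExceptionClear();
        return;
    }

    jobject name = env->CallObjectMethod(widget, mid);
    if (env->ExceptionCheck()) {       // a NULL result can be legitimate, so ask
        env->ExceptionDescribe();      // the JVM whether an exception is pending
        env->ExceptionClear();
        return;
    }
    // ... use 'name' ...
}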
Currently, I'm stuck on an issue where the stack trace in the error report file is less than helpful:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002b137a99db59, pid=19977, tid=47362673452544
#
# JRE version: 6.0_20-b02
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x40fb59]
... <snip> ...
Stack: [0x00007fff1964f000,0x00007fff1974f000], sp=0x00007fff1974e050, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x40fb59]
V [libjvm.so+0x3ecbe1]
C [libDataFabric.so+0x1bb5b] _Jv_JNIEnv::CallObjectMethod(__jobject*, _jmethodID*, ...)+0xe3
etc. ...
Ok, so I know that it's dying in env->CallObjectMethod(). I checked all the parameters to that call in GDB before it dives into the JVM code, but I don't see any obvious NULL or strange values. And of course, all the JNI types, such as jobject, are unhelpfully opaque, so I can't see whether their pointers point to bogus or real data.
Any tips/suggestions/ideas for this kind of problem?
Ok, so here's how I approached the problem I mentioned above. Somewhat tedious, but, given enough time and effort, it eventually paid off.
1. Don't assume that env->Call<Type>Method(jobj, methodID, ...) is being passed correct values. If this is where it crashes, chances are high that some hard-to-find but fundamental issue is at fault, such as the jmethodID not matching the jobject being passed to CallObjectMethod(...). I wrote a simple helper, std::string getClassInfo(JNIEnv* env, jclass aJavaClass), that gets the jmethodID for toString on a class, calls that method, and returns the result as a std::string (a sketch follows this list). That told me whether an object was what I thought it was.
2. Liberally sprinkle debug output statements between your JNI calls. Outputting class names in particular (for example via the helper above) will help you figure out whether objects are what you think they are.
3. Make sure you're checking for NULL jmethodIDs and calling env->ExceptionCheck() after each Call<Type>Method(...). Checking the return value for NULL after Call<Type>Method(...) won't help, because JNI can't know whether NULL is a legitimate return value.
4. Don't assume that JNI will crash at the first sign of trouble. I was actually passing the wrong object type through several JNI calls before it actually crashed. See #3 to make sure you catch the issue early.
5. Be aware that on Linux the JVM itself uses SEGV signals to indicate that the garbage collector should run. I use "handle SIGSEGV pass noprint nostop" in gdb to let the JVM deal with those.
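The helper from tip 1 might look roughly like this (a sketch only; the original code isn't shown, so the details and fallback strings are assumptions):
#include <jni.h>
#include <string>

std::string getClassInfo(JNIEnv* env, jclass aJavaClass)
{
    // The class of a jclass object is java.lang.Class, which has toString().
    jclass classClass = env->GetObjectClass(aJavaClass);
    jmethodID toString =
        env->GetMethodID(classClass, "toString", "()Ljava/lang/String;");
    if (toString == NULL || env->ExceptionCheck()) {
        env->ExceptionClear();
        return "<no toString>";
    }

    jstring jstr = (jstring)env->CallObjectMethod(aJavaClass, toString);
    if (jstr == NULL || env->ExceptionCheck()) {
        env->ExceptionClear();
        return "<toString failed>";
    }

    const char* utf = env->GetStringUTFChars(jstr, NULL);
    std::string result(utf ? utf : "<null>");
    if (utf) env->ReleaseStringUTFChars(jstr, utf);
    return result;
}
Printing getClassInfo(env, env->GetObjectClass(someObject)) just before a suspicious call quickly shows whether someObject really is the type you expect.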

Tracking down the source code line of a crash from a non-debug built module

I have a Windows crash dump with a call stack showing me the module!functionname+offset of the function that caused the crash. The module is built with gcc, without debug information.
The cause of the crash is an exception from a failed write at a given address, i.e. an access violation (05) where the failed operation was a write (01).
On my development machine I have access to the same module built with debugging information. What I'm looking for is a way to track down the corresponding source code line that caused the crash, this by using the module!functionname+offset information as starting point.
The method name of the top frame in the call stack is a class destructor
The mangled function name is _ZN20ViewErrorDescriptionD0Ev+0x79
Running objdump -d searching for the module!functionname+offset gives:
.... call *%eax
.... mov 0xffffffbc(%ebp), %eax
.... cmpl 0x0, 0x148(%eax)
Trying to find this sequence in the debug build gives no match.
The source code of the destructor only contains two delete pointerX calls.
Using gdb to load the debug build of the module (a shared library) and then calling info line gives me a start and end address; grepping the objdump output for that range shows the corresponding disassembled code, which looks quite similar to the one from the module without debug info, but is still far from identical.
NB: the output from info line says _ZN20ViewErrorDescriptionD2Ev, not _ZN20ViewErrorDescriptionD0Ev as the crash dump says.
Taken from the ABI documentation:
::= D0 # deleting destructor
::= D1 # complete object destructor
::= D2 # base object destructor
Where do I go from here?
Best regards
Kristofer H
Unfortunately, even debug and non-debug builds may have different address layouts. The only way I'm aware of to accomplish something like this is to build with debug symbols and save off a copy of that binary; you can then deploy a stripped version without the debug information.
Your approach of attempting to locate the assembly code seems the most hopeful here, though I would expand on it: look at a much larger chunk of assembly in the crashed file and try to build up more context yourself, rather than having the computer match low-level instructions that might in fact differ slightly.
This works on the assumption that gcc compilation is 100% deterministic; I'm not sure how valid that assumption is. However, assuming you still have exactly the same source code, you could try gcc's -S command-line option and rebuild. This produces a set of .s files, one per source file, containing the assembly code. You can then search through these for the machine code you want to find.
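For example (the file name is only an illustration; match the optimization flags of the original build as closely as possible, since they change the generated code):
g++ -S -O2 -fverbose-asm ViewErrorDescription.cpp -o ViewErrorDescription.s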

How to debug a segmentation fault while the gdb stack trace is full of '??'?

My executable contains a symbol table, but it seems that the stack has been overwritten.
How can I get more information out of that core? For instance, is there a way to inspect the heap and see the object instances populating it, to get some clues? Any idea is appreciated.
I am a C++ programmer for a living and I have encountered this issue more times than I like to admit. Your application is smashing a huge part of the stack. Chances are the function that is corrupting the stack is also crashing on return: the return address has been overwritten, which is why GDB's stack trace is messed up.
This is how I debug this issue:
1) Step through the application until it crashes. (Look for a function that crashes on return.)
2) Once you have identified the function, declare a variable at the VERY FIRST LINE of the function:
int canary = 0;
(The reason it must be the first line is that this value should sit at the very top of the stack frame, so the "canary" gets overwritten before the function's return address.)
3) Put a variable watch on canary, step through the function, and when canary != 0 you have found your buffer overflow! Another possibility is to put a variable breakpoint for when canary != 0 and just run the program normally; this is a little easier, but not all IDEs support variable breakpoints.
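A made-up example of what steps 2 and 3 look like in code (the function, buffer and sizes are illustrative; exact stack layout is compiler-dependent, so the canary is a heuristic rather than a guarantee):
#include <cstring>

void parseRecord(const char* input)    // hypothetical function that crashes on return
{
    int canary = 0;                    // step 2: declared on the very first line
    char buffer[16];

    // If input is longer than 15 characters, this copy runs past 'buffer' and
    // tramples the rest of the frame, with luck hitting 'canary' before the
    // return address, so a watch on 'canary' fires at the guilty line.
    std::strcpy(buffer, input);

    // ... more work ...
    (void)canary;                      // step 3: watch/break when canary != 0
}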
EDIT: I have talked to a senior programmer at my office, and in order to understand the core dump you need to resolve the memory addresses it contains. One way to figure out these addresses is to look at the MAP file for the binary, which is human-readable. Here is an example of generating a MAP file using gcc:
gcc -o foo -Wl,-Map,foo.map foo.c
This is a piece of the puzzle, but it will still be very difficult to obtain the address of the function that is crashing. If you are running this application on a modern platform, then ASLR will probably make the addresses in the core dump useless. Some implementations of ASLR will randomize the function addresses of your binary, which makes the core dump absolutely worthless.
1. You have to use some debugger to detect it; Valgrind is a good choice.
2. While compiling your code, make sure you add the -Wall option; it makes the compiler tell you about mistakes (make sure you don't have any warnings in your code).
ex: gcc -Wall -g -c -o oke.o oke.c
3. Make sure you also have the -g option to produce debugging information. You can print debugging information of your own using some macros; the following are very useful for me (a small sketch follows this answer):
__LINE__ : tells you the line
__FILE__ : tells you the source file
__func__ : tells you the function
Using the debugger alone is not enough, I think; you should get used to making the most of what the compiler can do for you.
Hope this helps.
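For example, a minimal trace macro built from those identifiers might look like this (the macro and function names are made up):
#include <cstdio>

// Prints exactly where execution currently is: file, line and function.
#define TRACE(msg) \
    std::fprintf(stderr, "%s:%d in %s(): %s\n", __FILE__, __LINE__, __func__, (msg))

void loadConfig()
{
    TRACE("entering");
    // ... work that might crash ...
    TRACE("leaving");
}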
TL;DR: extremely large local variable declarations in functions are allocated on the stack, which, on certain platform and compiler combinations, can overrun and corrupt the stack.
Just to add another potential cause to this issue. I was recently debugging a very similar issue. Running gdb with the application and core file would produce results such as:
Core was generated by `myExecutable myArguments'.
Program terminated with signal 6, Aborted.
#0 0x00002b075174ba45 in ?? ()
(gdb)
That was extremely unhelpful and disappointing. After hours of scouring the internet, I found a forum that talked about how the particular compiler we were using (Intel compiler) had a smaller default stack size than other compilers, and that large local variables could overrun and corrupt the stack. Looking at our code, I found the culprit:
void MyClass::MyMethod() {
...
char charBuffer[MAX_BUFFER_SIZE];
...
}
Bingo! I found that MAX_BUFFER_SIZE was set to 10000000, so a 10 MB local variable was being allocated on the stack! After changing the implementation to use a shared_ptr and create the buffer dynamically, the program suddenly started working perfectly.
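The fix might look something like this (a sketch; the surrounding code is assumed, and a std::vector is shown as one common alternative to the shared_ptr mentioned above):
#include <cstddef>
#include <vector>

static const std::size_t MAX_BUFFER_SIZE = 10000000;  // value from the answer above

class MyClass {
public:
    void MyMethod();
};

void MyClass::MyMethod()
{
    // Allocate the ~10 MB buffer on the heap instead of the stack.
    std::vector<char> charBuffer(MAX_BUFFER_SIZE);
    // ... use charBuffer.data() wherever the raw char array was used before ...
}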
Try running with Valgrind memory debugger.
To confirm: was your executable compiled in release mode, i.e. without debug symbols? That could explain the ??. Try recompiling with the -g switch, which includes debugging information and embeds it into the executable. Other than that, I am out of ideas as to why you have '??'.
Not really. Sure you can dig around in memory and look at things. But without a stack trace you don't know how you got to where you are or what the parameter values were.
However, the very fact that your stack is corrupt tells you that you need to look for code that writes into the stack.
Overwriting a stack array. This can happen the obvious way, or by calling a function or system call with bad size arguments or pointers of the wrong type. (A minimal sketch of this case follows the list.)
Using a pointer or reference to a function's local stack variables after that function has returned.
Casting a pointer to a stack value to a pointer of the wrong size and using it.
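A made-up example of the first case, a copy whose size argument doesn't match the destination array:
#include <cstring>

void storeName(const char* name)
{
    char buffer[16];
    // Wrong size: copies 64 bytes into a 16-byte array, writing 48 bytes past
    // the end of 'buffer' into the rest of the stack frame, possibly including
    // the return address.
    std::memcpy(buffer, name, 64);
    // ...
}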
If you have a Unix system, "valgrind" is a good tool for finding some of these problems.
I assume, since you say "My executable contains a symbol table", that you compiled and linked with -g and that your binary wasn't stripped.
You can confirm this with:
strings -a <your_executable> | grep function_name_you_know_should_exist
Also try using pstack on the core and see if it does a better job of picking up the call stack. If it does, it sounds like your gdb is out of date compared to your gcc/g++ version.
It sounds like the glibc version on your machine is not identical to the one the core file was produced with when it crashed on production. Get the files listed by "ldd ./appname", copy them onto your machine, then tell gdb where to look:
set solib-absolute-prefix /path/to/libs
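A rough sequence of the whole procedure (paths are placeholders; the copied libraries need to sit under the prefix with the same directory layout they had on the production machine):
$ gdb ./appname
(gdb) set solib-absolute-prefix /path/to/libs
(gdb) core-file core
(gdb) bt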