Debugging startup issues in mixed code applications - c++

We have an application that is a C/C++/MFC desktop application, with some C++/CLI assemblies that allow us to access some managed code functionality. The app is crashing at startup in release mode only with the message
An unhandled exception of type 'System.TypeInitializationException' occurred in Unknown Module.
Additional information: The type initializer for '' threw an exception.
How can I go about debugging this scenario and what are the issues in mixing managed/unmanaged code? What special steps do I have to take to make them play nicely?

Things to suspect are:
Missing unmanaged DLLs. You can use Depends (from Sysinternals) and start profiling, but I've struggled to get good results in mixed-mode.
Make a native minimal test harness which has the same dependencies, and run that through Depends - you will get definitive information about missing DLLs.
Are you using obfuscation in your release build product?
The obfuscation software we use adds a field to types in evaluation mode. We have fixed offset structs, yet the new field doesn't get an explicit offset. This is an error which would be flagged up at compile time if it was in our own code. Since the obfuscator is dynamically patching the assembly, the CLR has no option but to fail to load the invalid type at runtime.

In my opinion (based on some trouble I have had) Matt Smith's comment is the most useful answer; actually, check 'Thrown' for all types of exception. Quite often the problem is a crash in the constructor of a global object. See also answer 5 at
http://www.codeproject.com/Questions/67419/The-type-initializer-for-threw-an-exception
which says (among other things):
Click Debug--> Exceptions and check ON all the Thrown checkboxes. This
will cause the debugger to stop on all first chance exceptions and
will help you find the error under the Type Initializer error that
you're seeing. If it is related to another assembly, as mine was, you
can use Microsoft's Assembly Binding Log Viewer tool to help determine
the problem.

Related

Interpreting Access Violation Exceptions?

We have a custom application we use built around VB/C++ code.
This code will run for days, weeks, months, without this throwing up an exception error.
I'm trying to learn more about how this error is thrown, and how to interpret (if you can) the error listed when an exception is thrown. I've googled some information and read the Microsoft provided error description, but I'm still stuck with the task of trouble shooting something that occurs once in a blue moon. There is no known set of interactions with the software that causes this and appears to happen randomly.
Is the first exception the root cause? Is it all the way down the stack call? Can anyone provide any insight on how to read these codes so I could interpret where I actually need to look.
Any information or guidance on reading the exception or making any use of it, and then trouble shooting it would be helpful. The test below is copied from windows log when the event was thrown.
Thanks in advance for any help.
Application: Epic.exe Framework Version: v4.0.30319 Description: The process was terminated due to an unhandled exception. Exception Info: System.AccessViolationException [![enter image description here][1]][1]
at MemMap.ComBuf.IsCharAvailable(Int32)
at HMI.frmPmacStat.RefreshTimer_Elapsed(System.Object, System.Timers.ElapsedEventArgs)
at System.Timers.Timer.MyTimerCallback(System.Object)
at System.Threading.TimerQueueTimer.CallCallbackInContext(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext,
System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,
System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.TimerQueueTimer.CallCallback()
at System.Threading.TimerQueueTimer.Fire()
at System.Threading.TimerQueue.FireQueuedTimerCompletion(System.Object)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
There are exceptions that are thrown by the c++ runtime environment, as a result of executing a throw expression, and there are other types of errors caused by the operating system or hardware trapping your instruction. Invalid access to memory is generally not thrown by code in c++, but is a side effect of evaluating an expression trying to access memory at an invalid address, resulting in the OS signaling the process, usually killing it. Because it's outside C++, it tends to be platform specific, but typical errors are:
reading a null pointer
using a pointer to an object that has been deleted
going outside an array's valid range of elements
using invalidated iterators into STL containers
Generally speaking, you can test for null and array bounds at runtime to detect the problem before it happens. Using a dangling pointer is more difficult to track down, because the time between the delete and the mis-use of that pointer can be long, and it can be difficult to find why it happened without a memory debugger, such as valgrind. Using smart pointers instead of raw pointers can help mitigate the problems of mis-managing memory, and can help avoid such errors.
Invalid iterators are subset of the general dangling pointer problem, but are common enough to be worth mentioning as their own category. Understanding your containers and which operations invalidate them is crucial, and some implementations can be compiled in "debug mode" which can help detect use of invalidated iterators.
As others have noted, this type of error is tricky to identify without digging into the code and running tests (automated or manual). The more pieces of the system you can pull out and still reproduce it, the better. Divide and conquer is your friend here.
Beyond that, it all depends how important it is for you to resolve this and how much effort you're willing to put in. There are at least three classes of tools that can help with such intermittent problems:
Application monitors that track potential errors as your application runs. These tend to slow your program significantly (10x or more slowdown). Examples include:
Microsoft's Application Verifier
Open-source and cross-platform Dr. Memory
Google's Crashpad. Unlike the previous two programs, this one requires instrumenting your code. It is also (allegedly -- haven't tried it) easier to use with helpers like Backtrace's commercial integration for analyzing Crashpad output
Google's Sanitizers - free and some are built into gcc and clang. There's also a Windows port of Address Sanitizer, but a cursory look suggests it may be a little bit of a second-class citizen.
If you can run and repro it also run it on Linux, you could use valgrind; rr (see this CppCast ep) which is a free extension for gdb that records and replays your program so you can record a debug session that crashed and then step through it to see what went wrong; and/or UndoDB and friends from Undo software, which is a more sophisticated, commercial product like rr.
Static analysis of the code. This is a set of tools that looks for common errors in your code. It generally has a low signal-to-noise ratio, so there are a lot of minor things to dig through if your run it on a large, existing project (best to start a project using these things from the beginning if possible). That said, many of the warnings are invaluable. Examples:
Most compilers have a subset of this functionality built in. If you're using Visual Studio, add /W4 /WX to your compilation flags for the C++ code to force maximum warnings, then fix all the warnings. For gcc and clang, add '-Wall -Wpedantic -Werror` to enforce no warnings.
PVS-Studio (commercial)
PC-Lint (commercial)
If you can instrument the code to write log messages, something like Debugview++ may be of assistance.
Things get harder if you have multithreading going on, which it looks like you do, because the non-determinism gets harder to track, there are new classes of possible errors that are introduced, and some of the above tools won't work well (e.g., I think rr is single-threaded only). Beyond a full IDE like Visual Studio, you'd need to go with something like Intel's Inspector (formerly Intel Thread Checker), or on Linux, Valgrind's Helgrind and DRD and ThreadSanitizer (in the sanitizers above, but also Linux only AFAIK). But hopefully this list gives you a place to start.

C++ Builder XE3 run & crash BEFORE the main()

I have some troubles with Embarcadero C++ Builder XE3. When I run my program, I have an access violation BEFORE the first instruction in the main...So I can't debug, it's very weird.
I used to have this problem a couple of weeks ago : I was forced to full rebuild the entire projet (even if only a comma was missing...) and the violation didn't occurs anymore. I solved it by ckecking the option "Disable incremental link".
I was very happy, but today, the problem is back, and whatever I do, my application crash before enterring in the main ...
Does anyone have an idea ? It's a big project, so I can't really post an exemple because I don't really know what to show...
Thanks a lot
Probably you have a bug in a constructor of a static global object. These constructors are all executed before getting into main(), so this can happen without being a runtime environment or a compiler bug.
As you told, debugging these is difficult as you probably don't know which class is failing, and probably you don't have exception info also.
As you say it's a large project, perhaps you have to resign to use large project toolkits/methodologies to deal with these problems, like unit testing and lean methodologies (like scrum or the like).
With the information you post I think this is the most can be said.

C++ simple crash logging

I'm writing a plugin (basically a dll) for a 3D application and occasionally there are crashes. Sometimes these are very difficult to find and I wanted to invest some time into making (or integrating an existing) crash logger that will
Give me a stack trace.
Give me a list of local variables.
Dump these items to a file, or upload it to a given URL.
So far I've looked at Google breakpad, but have no idea how to integrate it and the documentation seems poor at best. I've tried to use it and managed to get as far as building it on windows, but some unit tests fail and there is no help about what to do at that point. Also, it may be a bit excessive for my needs.
I've found the following site which details how to get a stack trace GENERATE STACK TRACES ON CRASH PORTABLY IN C++. But I'm not sure if this will work on a remote system. I'm guessing this will need to be the debug version and be supplied the pdb file for this to work?
As for getting local variables, I've not managed to find anything yet.
Does anyone know of some resources to help?
this article, though written in 2002, is still relevant to post-mortem debugging. It shows you all the reasons and steps and design required to get it working.
Nowadays, its a little easier (though I liked Windbg!) you get your app to call SetUnhandledExceptionFilter and write the .dmp file, then just double-click it to load it into Visual Studio. You will need good symbols (.pdf files) on the debugging system to make sense of the dump, but create your own symbol server (instructions in the article, its dead easy) and it should be able to figure out which symbols are needed for any app. You have to be disciplined about saving the symbols though - wrong symbols are worse than useless.
This question is rather old, but regarding exceptions I want to note that you can get nice backtraces in a cross-platform way, using only standard C++11 and without the need for a debugger or cumbersome logging:
Use std::nested_exception and std::throw_with_nested
It is described on StackOverflow here and here, how you can get a backtrace on your exceptions inside your code by simply writing a proper exception handler which will rethrow nested exceptions.
It will, however, require that you insert try/catch statements at the functions you wish to trace.
Since you can do this with any derived exception class, you can add a lot of information to such a backtrace!
You may also take a look at my MWE on GitHub or my "trace" library, where a backtrace would look something like this:
Library API: Exception caught in function 'api_function'
Backtrace:
~/Git/mwe-cpp-exception/src/detail/Library.cpp:17 : library_function failed
~/Git/mwe-cpp-exception/src/detail/Library.cpp:13 : could not open file "nonexistent.txt"
Regarding local variables, in this approach you would have to manually place those you wish to print into your exception or exception message in some form, which would be quite cumbersome...
Writing whatever you logged to a file and uploading it somewhere is a different problem for which I don't have any specific suggestions.

Edit and Continue on GDB

I know that E&C is a controversial subject and some say that it encourages a wrong approach to debugging, but still - I think we can agree that there are numerous cases when it is clearly useful - experimenting with different values of some constants, redesigning GUI parameters on-the-fly to find a good look... You name it.
My question is: Are we ever going to have E&C on GDB? I understand that it is a platform-specific feature and needs some serious cooperation with the compiler, the debugger and the OS (MSVC has this one easy as the compiler and debugger always come in one package), but... It still should be doable. I've even heard something about Apple having it implemented in their version of GCC [citation needed]. And I'd say it is indeed feasible.
Knowing all the hype about MSVC's E&C (my experience says it's the first thing MSVC users mention when asked "why not switch to Eclipse and gcc/gdb"), I'm seriously surprised that after quite some years GCC/GDB still doesn't have such feature. Are there any good reasons for that? Is someone working on it as we speak?
It is a surprisingly non-trivial amount of work, encompassing many design decisions and feature tradeoffs. Consider: you are debugging. The debugee is suspended. Its image in memory contains the object code of the source, and the binary layout of objects, the heap, the stacks. The debugger is inspecting its memory image. It has loaded debug information about the symbols, types, address mappings, pc (ip) to source correspondences. It displays the call stack, data values.
Now you want to allow a particular set of possible edits to the code and/or data, without stopping the debuggee and restarting. The simplest might be to change one line of code to another. Perhaps you recompile that file or just that function or just that line. Now you have to patch the debuggee image to execute that new line of code the next time you step over it or otherwise run through it. How does that work under the hood? What happens if the code is larger than the line of code it replaced? How does it interact with compiler optimizations? Perhaps you can only do this on a specially compiled for EnC debugging target. Perhaps you will constrain possible sites it is legal to EnC. Consider: what happens if you edit a line of code in a function suspended down in the call stack. When the code returns there does it run the original version of the function or the version with your line changed? If the original version, where does that source come from?
Can you add or remove locals? What does that do to the call stack of suspended frames? Of the current function?
Can you change function signatures? Add fields to / remove fields from objects? What about existing instances? What about pending destructors or finalizers? Etc.
There are many, many functionality details to attend to to make any kind of usuable EnC work. Then there are many cross-tools integration issues necessary to provide the infrastructure to power EnC. In particular, it helps to have some kind of repository of debug information that can make available the before- and after-edit debug information and object code to the debugger. For C++, the incrementally updatable debug information in PDBs helps. Incremental linking may help too.
Looking from the MS ecosystem over into the GCC ecosystem, it is easy to imagine the complexity and integration issues across GDB/GCC/binutils, the myriad of targets, some needed EnC specific target abstractions, and the "nice to have but inessential" nature of EnC, are why it has not appeared yet in GDB/GCC.
Happy hacking!
(p.s. It is instructive and inspiring to look at what the Smalltalk-80 interactive programming environment could do. In St80 there was no concept of "restart" -- the image and its object memory were always live, if you edited any aspect of a class you still had to keep running. In such environments object versioning was not a hypothetical.)
I'm not familiar with MSVC's E&C, but GDB has some of the things you've mentioned:
http://sourceware.org/gdb/current/onlinedocs/gdb/Altering.html#Altering
17. Altering Execution
Once you think you have found an error in your program, you might want to find out for certain whether correcting the apparent error would lead to correct results in the rest of the run. You can find the answer by experiment, using the gdb features for altering execution of the program.
For example, you can store new values into variables or memory locations, give your program a signal, restart it at a different address, or even return prematurely from a function.
Assignment: Assignment to variables
Jumping: Continuing at a different address
Signaling: Giving your program a signal
Returning: Returning from a function
Calling: Calling your program's functions
Patching: Patching your program
Compiling and Injecting Code: Compiling and injecting code in GDB
This is a pretty good reference to the old Apple implementation of "fix and continue". It also references other working implementations.
http://sources.redhat.com/ml/gdb/2003-06/msg00500.html
Here is a snippet:
Fix and continue is a feature implemented by many other debuggers,
which we added to our gdb for this release. Sun Workshop, SGI ProDev
WorkShop, Microsoft's Visual Studio, HP's wdb, and Sun's Hotspot Java
VM all provide this feature in one way or another. I based our
implementation on the HP wdb Fix and Continue feature, which they
added a few years back. Although my final implementation follows the
general outlines of the approach they took, there is almost no shared
code between them. Some of this is because of the architectual
differences (both the processor and the ABI), but even more of it is
due to implementation design differences.
Note that this capability may have been removed in a later version of their toolchain.
UPDATE: Dec-21-2012
There is a GDB Roadmap PDF presentation that includes a slide describing "Fix and Continue" among other bullet points. The presentation is dated July-9-2012 so maybe there is hope to have this added at some point. The presentation was part of the GNU Tools Cauldron 2012.
Also, I get it that adding E&C to GDB or anywhere in Linux land is a tough chore with all the different components.
But I don't see E&C as controversial. I remember using it in VB5 and VB6 and it was probably there before that. Also it's been in Office VBA since way back. And it's been in Visual Studio since VS2005. VS2003 was the only one that didn't have it and I remember devs howling about it. They intended to add it back anyway and they did with VS2005 and it's been there since. It works with C#, VB, and also C and C++. It's been in MS core tools for 20+ years, almost continuous (counting VB when it was standalone), and subtracting VS2003. But you could still say they had it in Office VBA during the VS2003 period ;)
And Jetbrains recently added it too their C# tool Rider. They bragged about it (rightly so imo) in their Rider blog.

Finding division by zero in a big project

Recently, our big project began crashing on unhandled division by zero. No recent code seems to contain any likely elements so it may be new data sets affecting old code. The problem is the code base is pretty big, and running on an embedded device with no comfortable debug access (debug is done by a lot of printf()s over serial console, there is no gdb for the device and even if there was, the binary compiled with debug symbols wouldn't fit).
The most viable way would likely be to find all the division operations (they are relatively infrequent), and analyze code surrounding each of them to see if any of the divisor variables was left unguarded.
The question is then either how to find all division operations in a big (~200 files, some big) C++ project, or, if you have a better idea how to locate the error, please give them.
extra info: project runs on embedded ARM9, a small custom Linux distro, crosscompiled with Cygwin/Windows crosstools, IDE is Eclipse but there's also Cygwin with all the respective goodies. Thing is the project is very hardware-specific, and the crashes occur only when running at full capacity, all the essential interconnected modules active. Restricted "fault mode" where only bare bones are active doesn't create them.
I think the most direct step, would be to try to catch the unhandled exception and generate a dump or printf stack information or similar.
Take a look at this question or just search in google for info relating to exception catching in your particular environment.
By the way, I think that the division could happen as a result of a call to an external library, so it's not 100% sure that you'll find the culprit just by greping your code.
If I remember right, the ARM9 doesn't have hardware divide so it's going to be implemented in a function call the compiler makes whenever it has to perform a division.
See if your toolset implements the divide by zero handling in the same way as ARM's toolset does (it's likely that it does something at least similar). If so, you can install a handler that gets called when the problem occurs and you can printf() registers and stack so that you can determine where the problem is occurring. A possible similar alternative is that your small Linux distro is throwing a signal you can catch.
I'm not sure how you're getting your information that a divide by zero is occurring, but if it's because the runtime is spitting out a message to that effect, you always have the option of finding out where that is handled in the runtime, and replacing it with your own more informative message. However, I'd guess that there's a more 'architected' way to get your code to run (a signal handler or ARM's technique).
Finding all of the divisions shouldn't be hard with a custom grep search. You can easily distinguish that usage from other usages of the / and % character in C++.
Also, if you know what you are dividing, you could globally overload the / and % operator to have a __FILE__ and __LINE__ informing assertion. If using a makefile, it shouldn't be hard to include the custom operator code in all the linked files without touching the code.
You should use this as an excuse to invest in improving the debug-ability of your device - for both this problem and future issues. Even if you can't get live debugging, you should be able to find a way to generate and save off core dumps for post-mortem debugging (pinpointing the source or any unhandled exception immediately).
PC-Lint might help, it's like Findbugs for C++. It is a commercial product but there is a 30 money back guarantee.
Handle the exception.
Usually the exception will be handed a structure that contains the address that caused the exception and other information. You will probably have to become familiar with the microcontroller's datasheet or RTOS manual.
Use the -save-temps for gcc and find the relevant assembly for division in the generated .s file. If you're lucky it will be something fairly distinctive, possibly even a function call. If it's a function call you can use weak linking to override it with your own checked version. Otherwise locating the divisions in the assembly should give you a very good idea where they are in the C/C++ code and you can instrument them directly.
usually you could modify/override the divide-by-zero exception handler if you have access to the exception handler routines.
in case of ARM, the division is done by a library routine. and there are mechanisms to inform the user-code, when a divide by zero occurs.
see http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4061.html
i would suggest to provide a __rt_raise() as said in the page above.
__rt_raise(2,2) will get called when the divide routine detects a divide by zero.
so you can print the LR register.
and then use addr2line to crossref it against the source line
The only way to find those conditions is the usual:
try to reproduce the problem without looking at the source (as the bug already happened you should have info on the part of the program that is affected)
if found, check the source for this point and fix it, otherwise:
2.1. grep for each / not followed by a * or / (grep "/[^/*]" i think)
2.2. find the conditions for which the code is executed and reproduce it
The exception already has the address location of the offending divide by zero code. The CPU saves register contents when a exception occurs including the PC(program counter). Your OS should pass this information along (I assumes that is how you know it is divide by zero). Print the address and go look in your code. If you can print a stack trace it would be even easier to solve.
Another option would be to check the differences in your version control software between the last know working version and the first non working version. This should give you a limmited change set within which to search for the problem.