Unhandled Exception in Rad Studio Debugger Thread - c++

I have a large application that recently started exhibiting rather strange behavior when running in a debugger. First, the basics:
OS: Windows 7 64-bit.
Application: Multithreaded VCL app with many dlls, bpls, and other components.
Compiler/IDE: Embarcadero RAD Studio 2010.
The observed symptom is this: While the debugger is attached to my application, certain tasks cause the application to crash. The particulars are furthermore perplexing: My application stops with a Windows message saying, "YourApplication has stopped working." And it helpfully offers to send a minidump to Microsoft.
It should be noted: the application doesn't crash when the debugger is not attached. Also, the debugger doesn't indicate any exceptions or other issues while the application is running.
Setting and stepping through breakpoints seems to affect the point at which the application crashes, but I suspect that is a symptom of debugging a thread other than the problematic one.
These crashes also occur on the computers on my colleagues, with the same behavior I observe. This leads me to not suspect a failed installation of something on my computer particularly. My colleagues experiencing the issue are also running Windows 7 64-bit. I have no colleagues not experiencing the issue.
I've collected an analyzed a number of full dumps from the crashes. I discovered that the failure was actually happening in the same place each time. Here is the exception data from the dumps (it is always the same, except of course the ThreadId):
Exception Information
ThreadId: 0x000014C0
Code: 0x4000001F Unknown (4000001F)
Address: 0x773F2507
Flags: 0x00000000
NumberParameters: 0x00000001
0x00000000
Google reveals that Code 0x4000001F is actually STATUS_WX86_BREAKPOINT. Microsoft unhelpfully describes it as "An exception status code that is used by the Win32 x86 emulation subsystem."
Here are the stack details (which don't seem to vary):
0x773F2507: ntdll.dll+0x000A2507: RtlQueryCriticalSectionOwner + 0x000000E8
0x773F3DAB: ntdll.dll+0x000A3DAB: RtlQueryProcessLockInformation + 0x0000020D
0x773D2ED9: ntdll.dll+0x00082ED9: RtlUlonglongByteSwap + 0x00005C69
0x773F3553: ntdll.dll+0x000A3553: RtlpQueryProcessDebugInformationRemote + 0x00000044
0x74F73677: kernel32.dll+0x00013677: BaseThreadInitThunk + 0x00000012
0x77389F02: ntdll.dll+0x00039F02: RtlInitializeExceptionChain + 0x00000063
0x77389ED5: ntdll.dll+0x00039ED5: RtlInitializeExceptionChain + 0x00000036
It is worth noting that there appears to be a function epilog at 0x773F24ED, which rather suggests that the RtlQueryCriticalSectionOwner is a red herring. Likewise, a function epilog casts doubt on RtlQueryProcessLockInformation. The 0x5C69 offset casts doubt on the RtlUlonglongByteSwap. The other symbols look legit, though.
Specifically, RtlpQueryProcessDebugInformationRemote looks legitimate. Some people on the internet (http://www.cygwin.com/ml/cygwin-talk/2006-q2/msg00050.html) seem to think that it is created by the debugger to collect debug information. That theory seems sound to me, since it only seems to appear when the debugger is attached.
As always, when something breaks, something changed that broke it. In this case, that something is dynamically loading a new dll. I can cause the crash to stop happening by not dynamically loading the particular dll. I'm not convinced that the dll loading is related, but here are the details, just in case:
The dll source is C. Here are the compile options that are not set to the default:
Language Compliance: ANSI
Merge duplicate strings: True
Read-only strings: True
PCH usage: Do not use
Dynamic RTL: False
(The Project Options say False is default for Dynamic RTL, though it was set to True when I created the dll project.)
The dll is loaded with LoadLibrary and freed with FreeLibrary. All seems to be fine with the loading and unloading of the module. However, shortly after the library is unloaded (with FreeLibrary), the aforementioned thread crashes the program. For debugging, I removed all actual calls to the library (including, for more testing, DllMain). No combination of calls or not calls, DllMain or no DllMain, or anything else seemed to alter the behavior of the crash in any way. Simply loading and unloading the dll invokes the crash later on.
Furthermore, changing the dll to use the Dynamic RTL also causes the debugger thread crash to cease. This is undesirable because the compiled dll really should be usable without CodeGear Runtime available. Also, dll size is important. The C code contained in the dll does not make use of any libraries. (It includes no headers, even standard library headers. No malloc/free, no printf, no nothin'. It contains only functions that depend exclusively on their inputs and do not require dynamic allocation.) It is also undesirable because "fixing" a bug by changing stuff until it works without understanding why it works is really never a good plan. (It tends to lead to bug recurrences and strange coding practices. But really, at this point, if I can't find anything else, I may admit defeat on this count.)
And finally, my issue may be related to one of these issues:
System error after debugging multithreaded applications
Program and debugger quit without indication of problem
Any ideas or suggestions would be appreciated.

I solved the above mentioned problem by using a modified version of the PatchINT3 workaround, which was published in 2007 for BDS 2006:
procedure PatchINT3;
const
INT3: Byte = $CC;
NOP: Byte = $90;
var
NTDLL: THandle;
BytesWritten: DWORD;
Address: PByte;
begin
if Win32Platform <> VER_PLATFORM_WIN32_NT then
Exit;
NTDLL := GetModuleHandle('NTDLL.DLL');
if NTDLL = 0 then
Exit;
Address := GetProcAddress(NTDLL, 'RtlQueryCriticalSectionOwner');
if Address = nil then
Exit;
Inc(Address, $E8);
try
if Address^ <> INT3 then
Exit;
if WriteProcessMemory(GetCurrentProcess, Address, #NOP, 1, BytesWritten)
and (BytesWritten = 1) then
FlushInstructionCache(GetCurrentProcess, Address, 1);
except
//Do not panic if you see an EAccessViolation here, it is perfectly harmless!
on EAccessViolation do
;
else
raise;
end;
end;
Call this routine once after you have loaded the DLL in your thread. The patch fixes a user breakpoint in ntdll.dll version 6.1.7601.17725 and changes it to a NOP.
If there is no user breakpoint (INT3 (=$CC) opcode) at the expected address, the patch routine does nothing and exits.
Hope that helps,
Andreas
Footnote
The original source of PatchINT3 can be found here:
http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.non-technical/2007-01/msg04431.html
Footnote2
The same function in C++:
void PatchINT3()
{
unsigned char INT3 = 0xCC;
unsigned char NOP = 0x90;
if (Win32Platform != VER_PLATFORM_WIN32_NT)
{
return;
}
HMODULE ntdll = GetModuleHandle(L"NTDLL.DLL");
if (ntdll == NULL)
{
return;
}
unsigned char *address = (unsigned char*)GetProcAddress(ntdll,
"RtlQueryCriticalSectionOwner");
if (address == NULL)
{
return;
}
address += 0xE8;
try
{
if (*address != INT3)
{
return;
}
unsigned long bytes_written = 0;
if (WriteProcessMemory(GetCurrentProcess(), address, &NOP, 1,
&bytes_written) && (bytes_written == 1))
{
FlushInstructionCache(GetCurrentProcess, address, 1);
}
}
catch (EAccessViolation &e)
{
//Do not panic if you see an EAccessViolation
//here, it is perfectly harmless!
}
catch(...)
{
throw;
}
}

Just an idea...
Perhaps you need to close in on the crashing thread. The state that you are observing seems to be a bit after the actual error.
First, your stack traces seems incomplete to me. What is the base root of the stack of that thread? What was the origin of that thread?
And, in the VS debugger there is the possibility to break on exceptions, (Debug->Exceptions...->[Add]). Then all threads will freeze at the moment the exception occurs. I dunno about RAD, but the trick to do it programatically seems to be WaitForDebugEvent().
I could be wrong but I think there is a fair chance that the bug is in the debugger not your code. In that case an uggly workaround is IMHO fully forgivable. Good luck!

I cannot answer this because I cannot see the code...
But...
1) In Borland C++ at least with C++ from BDS there can be a provable problem with the realloc function in multithreaded library. Is your C++ code using realloc?
2) The stack you are showing is more than likely being called as a result of your code actually hitting a "CALL BAD_ADRESS" and that can happen as a result of a bug in your own code. In other words, in the DLL you load, there is likely a function that is doing something that is overwriting the executable code in your program with junk, and then when the now junk section runs, it crashes.
Another way is if something in the C++ dll is modifying the stack below where it runs, and then your code is hitting that later.
3) Check out the CPU flags settings for your DLL. Borland libraries sometimes use conflicting CPU flags upon entry and you might need to save and restore before calling into the DLL. For example if you call a VST plugin made with C++ from Delphi and do not set the flags properly, you can get subsequent divide by zero errors from the VST plugin which was compiled with that exception turned off.

We had the same issue today. In our case the crash happens if having a breakpoint after call to TOpenDialog->Execute() (which is using dialog from shell32.dll I think) (Windows 7 x64, C++ Builder XE2)
After uninstalling iCloud (v2.1.0.39), the issue was resoved.
Unfortunately we are still looking into similar issue our customers are having some times with our release product under Windows Vista. After selecting the file using TOpenDialog, the application crashes in gdiplus.dll with Access violation, removing the iCloud seems to also resolve the issue.

Related

InitializeCriticalSection works in one project, but fails in another

Using Visual Studio 2019 Professional on Windows 10 x64. I have several C++ DLL projects, some of which are multi-threaded. I'm using CRITICAL_SECTION objects for thread safety.
In DLL1:
CRITICAL_SECTION critDLL1;
InitializeCriticalSection(&critDLL1);
In DLL2:
CRITICAL_SECTION critDLL2;
InitializeCriticalSection(&critDLL2);
When I use critDLL1 with EnterCriticalSection or LeaveCriticalSection everything is fine in both _DEBUG or NDEBUG mode. But when I use critDLL2, I get an access violation in 'ntdll.dll' in NDEBUG (though not in _DEBUG).
After popping up message boxes in NDEBUG mode, I was eventually able to track the problem down to the first use of EnterCriticalSection.
What might be causing the CRITICAL_SECTION to fail in one project but work in others? The MSDN page was not helpful.
UPDATE 1
After comparing project settings of DLL1 (working) and DLL2 (not working), I've accidentally got DLL2 working. I've confirmed this by reverting to an earlier version (which crashes) and then making the project changes (no crash!).
This is the setting:
Project Properties > C/C++ > Optimization > Whole Program Optimization
Set this to Yes (/GL) and my program crashes. Change that to No and it works fine. What does the /GL switch do and why might it cause this crash?
UPDATE 2
The excellent answer from #Acorn and comment from #RaymondChen, provided the clues to track down and then resolve the issue. There were two problems (both programmer errors).
PROBLEM 1
The assumption of Whole Program Optimzation (wPO) is the MSVC compiler is compiling "the whole program". This is an incorrect assumption for my DLL project which internally consumes a 3rd party library and is in turn consumed by an external application written in Delphi. This setting is set to Yes (/GL) by default but should be No. This feels like a bug in Visual Studio, but in any case, the programmer needs to be aware of this. I don't know all the details of what WPO is meant to do, but at least for DLLs meant to be consumed by other applications, the default should be changed.
PROBLEM 2
Serious programmer error. It was a call into a 3rd party library, which returned a 128-byte ASCII code which was the error:
// Before
// m_config::acSerial defined as "char acSerial[21]"
(void) m_pLib->GetPara(XPARA_PRODUCT_INFO, &m_config.acSerial[0]);
EnterCriticalSection(&crit); // Crash!
// After
#define SERIAL_LEN 20
// m_config::acSerial defined as "char acSerial[SERIAL_LEN+1]"
//...
char acSerial[128];
(void) m_pLib->GetPara(XPARA_PRODUCT_INFO, &acSerial[0]);
strncpy(m_config.acSerial, acSerial, max(SERIAL_LEN, strlen(acSerial)));
EnterCriticalSection(&crit); // Works!
The error, now obvious, is that the 3rd party library did not copy the serial number of the device into the char* I provided...it copied 128 bytes into my char* stomping over everything contiguous in memory after acSerial. This wasn't noticed before because m_pLib->GetPara(XPARA_PRODUCT_INFO, ...) was one of the first calls into the 3rd party library and the rest of the contiguous data was mostly NULL at that point.
The problem was never to do with the CRITICAL_SECTION. My thanks for Acorn and RaymondChen ... sanity has been restored to this corner of the universe.
If your program crashes under WPO (an optimization that assumes that whatever you are compiling is the entire program), it means that either the assumption is incorrect or that the optimizer ends up exploiting some undefined behavior that previously didn't (without the optimization applied), even if the assumption is correct.
In general, avoid enabling optimizations unless you are really sure you know you meet their requirements.
For further analysis, please provide a MRE.

Exception Handling in /clr MFC Application (compiled with /EHa)

We have a large MFC application that is in the process of being updated. It has been modified to add in some .NET components (some of the DLLs now have managed & native classes), the idea is that the old code will be phased out eventually.
The app has many DLLs with native C++ classes exported, and used by the application.
On trying to test the app we now find ANY exception seems to cause the app to crash, even though these exceptions are in theory being caught in the same function.
example:
CString AddressClass::GetPostalAddress()
{
CString address;
try {
address = (LPCSTR)(_bstr_t)m_pCommand->GetParameters()->Item["PostalAddress"]->Value;
}
catch ( _com_error& )//exception occurs if postal address is NULL
{
address = _T("");
}
return address;
}
The _com_error is not being caught when compiling with /clr and /EHa (Vs2015 update 3). Without /clr it works fine and has been operational for years. From the documentation I have read my understanding is this should work, but clearly I am mistaken.
The code responsible for generating the error is in comutil.h:
inline void CheckError(HRESULT hr)
{
if (FAILED(hr)) {
_com_issue_error(hr);
}
}
The information from the debugger is:
An exception of type 'System.Runtime.InteropServices.SEHException' occurred in XXX.dll and wasn't handled before a managed/native boundary
Additional information: External component has thrown an exception.
Is there anyway to get this working without rewriting huge amounts of code?
Thanks.
Answer from short comment, since it seems to have helped:
Generally speaking: You should not compile these with /clr. Separate your old code vs your new CLR-using code, just compile new code with /clr, compile old code/files directly native. You can still link everything together in a clr-enabled DLL.
What we do is to compile all the native stuff in separate static LIB projects, and then link these into /clr enabled projects - having separate compiler switches for individual source files inside one project is always a tad confusing. But then, it may still be the way to go for you. Depends on the mixture and calling patterns, I guess.
I'll add that I view /clr mostly as a rather powerful glue system, than something you should target for the full app stack.

AccessViolationException reading memory allocated in C++ application from C++/CLI DLL

I have a C++ client to a C++/CLI DLL, which initializes a series of C# dlls.
This used to work. The code that is failing has not changed. The code that has changed is not called before the exception is thrown. My compile environment has changed, but recompiling on a machine with an environment similar to my old one still failed. (EDIT: as we see in the answer this is not entirely true, I was only recompiling the library in the old environment, not the library and client together. The client projects had been upgraded and couldn't easily go back.)
Someone besides me recompiled the library, and we started getting memory management issues. The pointer passed in as a String must not be in the bottom 64K of the process's address space. I recompiled it, and all worked well with no code changes. (Alarm #1) Recently it was recompiled, and memory management issues with strings re-appeared, and this time they're not going away. The new error is Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
I'm pretty sure the problem is not located where I see the exception, the code didn't change between the successful and failing builds, but we should review that to be complete. Ignore the names of things, I don't have much control over the design of what it's doing with these strings. And sorry for the confusion, but note that _bridge and bridge are different things. Lots of lines of code missing because this question is already too long.
Defined in library:
struct Config
{
std::string aye;
std::string bee;
std::string sea;
};
extern "C" __declspec(dllexport) BridgeBase_I* __stdcall Bridge_GetConfiguredDefaultsImplementationPointer(
const std::vector<Config> & newConfigs, /**< new configurations to apply **/
std::string configFolderPath, /**< folder to write config files in **/
std::string defaultConfigFolderPath, /**< folder to find default config files in **/
std::string & status /**< output status of config parse **/
);
In client function:
GatewayWrapper::Config bridge;
std::string configPath("./config");
std::string defaultPath("./config/default");
GatewayWrapper::Config gwtransport;
bridge.aye = "bridged.dll";
bridge.bee = "1.0";
bridge.sea = "";
configs.push_back(bridge);
_bridge = GatewayWrapper::Bridge_GetConfiguredDefaultsImplementationPointer(configs, configPath, defaultPath, status);
Note that call to library that is crashing is in the same scope as the vector declaration, struct declaration, string assignment and vector push-back
There are no threading calls in this section of code, but there are other threads running doing other things. There is no pointer math here, there are no heap allocations in the area except perhaps inside the standard library.
I can run the code up to the Bridge_GetConfiguredDefaultsImplementationPointer call in the debugger, and the contents of the configs vector look correct in the debugger.
Back in the library, in the first sub function, where the debugger don't shine, I've broken down the failing statement into several console prints.
System::String^ temp
List<CConfig^>^ configs = gcnew List<CConfig ^>((INT32)newConfigs.size());
for( int i = 0; i< newConfigs.size(); i++)
{
std::cout << newConfigs[i].aye<< std::flush; // prints
std::cout << newConfigs[i].aye.c_str() << std::flush; // prints
temp = gcnew System::String(newConfigs[i].aye.c_str());
System::Console::WriteLine(temp); // prints
std::cout << "Testing string creation" << std::endl; // prints
std::cout << newConfigs[i].bee << std::flush; // crashes here
}
I get the same exception on access of bee if I move the newConfigs[i].bee out above the assignment of temp or comment out the list declaration/assignment.
Just for reference that the std::string in a struct in a vector should have arrived at it's destination ok
Is std::vector copying the objects with a push_back?
std::string in struct - Copy/assignment issues?
http://www.cplusplus.com/reference/vector/vector/operator=/
Assign one struct to another in C
Why this exception is not caught by my try/catch
https://stackoverflow.com/a/918891/2091951
Generic AccessViolationException related questions
How to handle AccessViolationException
Programs randomly getting System.AccessViolationException
https://connect.microsoft.com/VisualStudio/feedback/details/819552/visual-studio-debugger-throws-accessviolationexception
finding the cause of System.AccessViolationException
https://msdn.microsoft.com/en-us/library/ms164911.aspx
Catching access violation exceptions?
AccessViolationException when using C++ DLL from C#
Suggestions in above questions
Change to .net 3.5, change target platform - these solutions could have serious issues with a large mult-project solution.
HandleProcessCorruptedStateExceptions - does not work in C++, this decoration is for C#, catching this error could be a very bad idea anyway
Change legacyCorruptedStateExceptionsPolicy - this is about catching the error, not preventing it
Install .NET 4.5.2 - can't, already have 4.6.1. Installing 4.6.2 did not help. Recompiling on a different machine that didn't have 4.5 or 4.6 installed did not help. (Despite this used to compile and run on my machine before installing Visual Studio 2013, which strongly suggests the .NET library is an issue?)
VSDebug_DisableManagedReturnValue - I only see this mentioned in relation to a specific crash in the debugger, and the help from Microsoft says that other AccessViolationException issues are likely unrelated. (http://connect.microsoft.com/VisualStudio/feedbackdetail/view/819552/visual-studio-debugger-throws-accessviolationexception)
Change Comodo Firewall settings - I don't use this software
Change all the code to managed memory - Not an option. The overall design of calling C# from C++ through C++/CLI is resistant to change. I was specifically asked to design it this way to leverage existing C# code from existing C++ code.
Make sure memory is allocated - memory should be allocated on the stack in the C++ client. I've attempted to make the vector be not a reference parameter, to force a vector copy into explicitly library controlled memory space, did not help.
"Access violations in unmanaged code that bubble up to managed code are always wrapped in an AccessViolationException." - Fact, not a solution.
but it was the mismatch, not the specific version that was the problem
Yes, that's black letter law in VS. You unfortunately just missed the counter-measures that were built into VS2012 to turn this mistake into a diagnosable linker error. Previously (and in VS2010), the CRT would allocate its own heap with HeapAlloc(). Now (in VS2013), it uses the default process heap, the one returned by the GetProcessHeap().
Which is in itself enough to trigger an AVE when you run your app on Vista or higher, allocating memory from one heap and releasing it from another triggers an AVE at runtime, a debugger break when you debug with the Debug Heap enabled.
This is not where it ends, another significant issue is that the std::string object layout is not the same between the versions. Something you can discover with a little test program:
#include <string>
#include <iostream>
int main()
{
std::cout << sizeof(std::string) << std::endl;
return 0;
}
VS2010 Debug : 32
VS2010 Release : 28
VS2013 Debug : 28
VS2013 Release : 24
I have a vague memory of Stephen Lavavej mentioning the std::string object size reduction, very much presented as a feature, but I can't find it back. The extra 4 bytes in the Debug build is caused by the iterator debugging feature, it can be disabled with _HAS_ITERATOR_DEBUGGING=0 in the Preprocessor Definitions. Not a feature you'd quickly want to throw away but it makes mixing Debug and Release builds of the EXE and its DLLs quite lethal.
Needless to say, the different object sizes seriously bytes when the Config object is created in a DLL built with one version of the standard C++ library and used in another. Many mishaps, the most basic one is that the code will simply read the Config::bee member from the wrong offset. An AVE is (almost) guaranteed. Lots more misery when code allocates the small flavor of the Config object but writes the large flavor of std::string, that randomly corrupts the heap or the stack frame.
Don't mix.
I believe 2013 introduced a lot of changes in the internal data formats of STL containers, as part of a push to reduce memory usage and improve perf. I know vector became smaller, and string is basically a glorified vector<char>.
Microsoft acknowledges the incompatibility:
"To enable new optimizations and debugging checks, the Visual Studio
implementation of the C++ Standard Library intentionally breaks binary
compatibility from one version to the next. Therefore, when the C++
Standard Library is used, object files and static libraries that are
compiled by using different versions can't be mixed in one binary (EXE
or DLL), and C++ Standard Library objects can't be passed between
binaries that are compiled by using different versions."
If you're going to pass std::* objects between executables and/or DLLs, you absolutely must ensure that they are using the same version of the compiler. It would be well-advised to have your client and its DLLs negotiate in some way at startup, comparing any available versions (e.g. compiler version + flags, boost version, directx version, etc.) so that you catch errors like this quickly. Think of it as a cross-module assert.
If you want to confirm that this is the issue, you could pick a few of the data structures you're passing back and forth and check their sizes in the client vs. the DLLs. I suspect your Config class above would register differently in one of the fail cases.
I'd also like to mention that it is probably a bad idea in the first place to use smart containers in DLL calls. Unless you can guarantee that the app and DLL won't try to free or reallocate the internal buffers of the other's containers, you could easily run into heap corruption issues, since the app and DLL each have their own internal C++ heap. I think that behavior is considered undefined at best. Even passing const& arguments could still result in reallocation in rare cases, since const doesn't prevent a compiler from diddling with mutable internals.
You seem to have memory corruption. Microsoft Application Verifier is invaluable in finding corruption. Use it to find your bug:
Install it to your dev machine.
Add your exe to it.
Only select Basics\Heaps.
Press Save. It doesn't matter if you keep application verifier open.
Run your program a few times.
If it crashes, debug it and this time, the crash will point to your problem, not just some random location in your program.
PS: It's a great idea to have Application Verifier enabled at all times for your development project.

Why is passing a char* to this method failing?

I have a C++ method such as:
bool MyClass::Foo(char* charPointer)
{
return CallExternalAPIFunction(charPointer);
}
Now I have some static method somewhere else such as:
bool MyOtherClass::DoFoo(char* charPointer)
{
return _myClassObject.Foo(charPointer);
}
My issue is that my code breaks at that point. It doesn't exit the application or anything, it just never returns any value. To try and pinpoint the issue, I stepped through the code using the Visual Studio 2010 debugger and noticed something weird.
When I step into the DoFoo function and hover over charPointer, I actually see the value it was called with (an IP address string in this case). However, when I step into Foo and hover over charPointer, nothing shows up and the external API function call never returns (it's like it's just stepped over) and my program resumes it's execution after the call to DoFoo.
I also tried using the Exception... feature of the VS debugger (to pick up first chance exceptions) but it never picked up anything.
Has this ever happened to anyone? Am I doing something wrong?
Thank you.
You need to build the project with Debug settings. Release settings mean that optimizations are enabled and optimizations make debugging a beating.
Without optimizations, there is a very close correspondence between statements in your C++ code and blocks of machine code in the program. The program is slower (often far slower) but it's easier to debug because you can observe what each statement does.
The optimizer reorders your code, eliminates variables, inlines functions, unrolls loops, and does all sorts of other things to make the program fast. The program is faster (often much faster) but it's far more difficult to debug because the correspondence between the statements in your C++ code and the instructions in the machine code is no longer there.

Crashing C++ application including ada dll does not generate core dump

How do I get a C++ application including a loaded ada shared library to generate a core dump when crashing?
I have a C++ application which loads a ada shared library, inside the ada code I get a stack overflow error which causes program termination along with the console output:
raised STORAGE ERROR
No core dump file is generated even thou I have issued a "ulimit -c unlimited" before starting the application.
Same thing happens if I send a kill SIGSEGV to the application.
Sending kill SIGSEGV to another application that does not use the ada dll generates a core dump file just the way I want it to.
Found some information here: http://objectmix.com/ada/301203-gnat-fstack-check-does-work.html
UPDATED!
As mentioned by Adrien, there is no contradiction, -s sets the stack limit while -c sets the core file limit.
Still the problem remains. I checked the flags when building the ada library and the fstack-check flag was not set, so it should generate a core dump.
Althou I haven't tried it yet, it seems somewhat strange.
It mentions the -fstack-check compiler option + setting the GNAT_STACK_LIMIT variable but at the same time refers to the ulimit command which seems like a contradiction, setting "ulimit -c " is the only way I know of getting a core dump to be generated at the time of crash, if this infers with the fstack-check option then we have a catch 22.
Now, almost 2 years later (still working at the same company as Kristofer did when he asked the question), was the question raised again - and finally I think that I understands why no core-dump is generated!!
The problem is caused by the Ada run-time, which by default implements a signal handler for some POSIX-signals (for Linux: SIGABRT, SIGFPE, SIGILL, SIGSEGV and SIGBUS). For GNAT/linux the signal handler is called __gnat_error_handler in a-init.c, which looks something like this:
static void
__gnat_error_handler (int sig)
{
struct Exception_Data *exception;
char *msg;
static int recurse = 0;
...
switch (sig)
{
case SIGSEGV:
if (recurse)
{
exception = &constraint_error;
msg = "SIGSEGV";
}
else
{
...
msg = "stack overflow (or erroneous memory access)";
exception = &storage_error;
}
break;
}
recurse = 0;
Raise_From_Signal_Handler (exception, msg);
}
This handler is "process wide", and will be called by any trigged signal, no matter from which part of the process it originates from (no matter if coded in Ada/C/C++...).
When called, the handler rises an Ada-exception and leaves it to the Ada runtime to find an appropriate exception handler - if no such handler is found (eg. when an SIGSEGV is generated by any part of the C++-code), the Ada-runtime falls back to just terminate the process and just leave a simple printout from __gnat_error_handler (eg. "stack overflow (or erroneous memory access)").
http://www2.adacore.com/gap-static/GNAT_Book/html/node25.htm
To prevent Ada-runtime from handling a POSIX-signal, and convert it to an Ada-exception, it is possible to disable the default beahviour by using
pragma Interrupt_State (Name => value, State => SYSTEM | RUNTIME | USER);,
eg. to disable handling of SIGSEGV, define
Pragma Interrupt_State(SIGSEGV, SYSTEM);
in your Ada-code - now the system's default behaviour will be trigged when a SIGSEGV is raised, and a core-dump will be generated that allows you to trace down the origin of the problem!
I think this is a quite important issue to be aware of when mixing Ada and C/C++ on *NIX-platforms, since it may mislead you to think that problems origins from the Ada-code(since the printout indicates an exception generated from Ada) when the real source of the problem lays in the C/C++-code...
Although it is probably safe to disable the Ada-runtime default handling of SIGSEGV (I guess no sane programmer using this in any "expected" error handling... Well, maybe used in aviation software or similar, when some sort of "last resort" functionallity needs to be maintained to avoid something really bad from happening..) i guess a bit caution should be taken then "overriding" the Ada-runtime handling for signals.
One issue may be the SIGFPE-signal, which also raises an Ada Constraint_Error-exception by default. This type of exception may be used by the Ada-code as an "excpected behaviour".
Disabling SIGFPE by Pragma Interrupt_State may seriously affect the execution of the Ada-code, and crash your application during "normal circumstances" - on the other hand will any division by zero in the C/C++-code trig the Ada-exception handling mechanism, and leave you without any real trace of the origin of the problem...
This looks to me like a really good use for your AdaCore support. You aren't liable to find a whole lot of folk outside that company who are that familiar with the implications of the interactions between Gnu Ada's runtime and C++'s.
I would suggest for debugging the Ada code that you try putting in a last-ditch exception handler around everything, which in turn dumps the exception stack. Most vendors have some way of doing that, usually based off of Ada.Exceptions.Exception_Information and Ada.Exceptions.Exception_Message.
I found a discussion from a security perspective (finding malware). Basically there are 10 signals that you can try, SIGSEGV is only one of them.
It seems you can simply call sigaction(SIGSEGV, 0, SIG_DFL); to restore the default signal behavior.