C++ exceptions and the .eh_frame ELF section - c++

Is it that the absence or damage of the .eh_frame ELF section is the cause of exceptions in my C++ code stopped working? Any exception that previously was caught successfully is now calling std::terminate().
My situation:
My zzz.so shared library has try-catch blocks:
try {
throw Exc();
} catch (const Exc &e) {
LOG("ok " << e.what());
} catch (...) {
LOG("all");
}
An executable which loads the zzz.so (using ldopen). It call a function in the zzz.so
All the exceptions thrown in the zzz.so are successfully caught inside zzz.so and dumped into my log file
There is another aaa.so that is loaded into another binary. That another aaa.so is loading my zzz.so.
All the same exceptions thrown in the zzz.so lead to call std::terminate().
How is that possible?
update
I don't know HOW is that possible still, but Clang 3.3 (FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610) solved the problem.

How is that possible?
When an exception is thrown, control passes to __cxa_throw routine (usually in libstdc++.so), which is then responsible for finding the catch clause and calling destructors along the way, or calling std::terminate if no catch is found.
The answer then is most likely that the first executable (the one where exceptions work) uses libstdc++.so that is capable of decoding .eh_frame in your library, while the second application (the one where exceptions do not work), either uses an older (incompatible) version of libstdc++.so, or links against libstdc++.a, or something along these lines.
Note: the actual work of raising the exception is done by _Unwind_RaiseException in libgcc_s.so.1, so even if both applications use the same libstdc++.so, they may still use different libgcc.
Update:
Will I benefit from static linking libstdc++ and libgcc into my .so library?
Maybe. TL;DR: it's complicated.
There are a few things to consider:
On any platform other than i386, you would have to build your own copy of libstdc++.a and libgcc.a with -fPIC before you can link them into your zzz.so. Normally these libraries are built without -fPIC, and can't be statically linked into any .so.
Static linking of libstdc++.a into your zzz.so may make it a derived work, and subject to GPL (consult your lawyer).
Even when there is a _Unwind_RaiseException exported from zzz.so, normally there will already be another instance of _Unwind_RaiseException defined in (loaded earlier) libgcc_s.so, and that earlier instance is the one that will be called, rendering your workaround ineffective. To make sure that your copy of _Unwind_RaiseException is called, you would need to link zzz.so with -Bsymbolic, or with a special linker script to make all calls to _Unwind_RaiseException (and everything else from libgcc.a) internal.
Your workaround may fix the problem for zzz.so, but may cause a problem for unrelated yyy.so that is loaded even later, and that wants the system-provided _Unwind_RaiseException, not the one from zzz.so. This is another argument for hiding all libgcc.a symbols and making them internal to zzz.so.
So the short answer is: such workaround is somewhat likely to cause you a lot of pain.

Related

"undefined reference to __dso_handle" while linking static library with -nostdlib [duplicate]

I have an unresolved symbol error when trying to compile my program which complains that it cannot find __dso_handle. Which library is this function usually defined in?
Does the following result from nm on libstdc++.so.6 mean it contains that?
I tried to link against it but the error still occurs.
nm libstdc++.so.6 | grep dso
00000000002fc480 d __dso_handle
__dso_handle is a "guard" that is used to identify dynamic shared objects during global destruction.
Realistically, you should stop reading here. If you're trying to defeat object identification by messing with __dso_handle, something is likely very wrong.
However, since you asked where it is defined: the answer is complex. To surface the location of its definition (for GCC), use iostream in a C++ file, and, after that, do extern int __dso_handle;. That should surface the location of the declaration due to a type conflict (see this forum thread for a source).
Sometimes, it is defined manually.
Sometimes, it is defined/supplied by the "runtime" installed by the compiler (in practice, the CRT is usually just a bunch of binary header/entry-point-management code, and some exit guards/handlers). In GCC (not sure if other compilers support this; if so, it'll be in their sources):
Main definition
Testing __dso_handle replacement/tracker example 1
Testing __dso_handle replacement/tracker example 2
Often, it is defined in the stdlib:
Android
BSD
Further reading:
Subtle bugs caused by __dso_handle being unreachable in some compilers
I ran into this problem. Here are the conditions which seem to reliably generate the trouble:
g++ linking without the C/C++ standard library: -nostdlib (typical small embedded scenario).
Defining a statically allocated standard library object; specific to my case is std::vector. Previously this was std::array statically allocated without any problems. Apparently not all std:: statically allocated objects will cause the problem.
Note that I am not using a shared library of any type.
GCC/ARM cross compiler is in use.
If this is your use case then merely add the command line option to your compile/link command line: -fno-use-cxa-atexit
Here is a very good link to the __dso_handle usage as 'handle to dynamic shared object'.
There appears to be a typo in the page, but I have no idea who to contact to confirm:
After you have called the objects' constructor destructors GCC automatically calls the function ...
I think this should read "Once all destructors have been called GCC calls the function" ...
One way to confirm this would be to implement the __cxa_atexit function as mentioned and then single step the program and see where it gets called. I'll try that one of these days, but not right now.
Adding to #natersoz's answer-
For me, using -Wabi-tag -D_GLIBCXX_USE_CXX11_ABI=0 alongside -fno-use-cxa-atexit helped compile an old lib. A telltale is if the C++ functions in the error message have std::__cxx11 in them, due to an ABI change.

std::thread weak when using -static-libstdc++, thus causing crash at runtime

I need to build a portable shared object, which is a plugin for another software on Linux. I did some amount of reading on the subject, came down to the conclusion, that I should build a sysrooted gcc (gcc 5.4.0 if it matters) with a decently old glibc (to provide compatibility with older systems), link with -static-libstdc++ and -static-libgcc thus arriving to a point where I have something that only depends on the hosts glibc and some other minor stuff which will always be present.
Now, I did all that and now I am experiencing a weird crash - segmentation fault happens in a place where the code calls std::thread, and gdb actually shows that the stack frame is inside libstdc++.so.6 (where is shouldn't be, ldd of my shared object also does not list libstdc++.so). The top of the stack at the crash is:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff79075e3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 # THIS SHOULD NOT BE HERE RIGHT?
#2 0x00007ffff5a25a5c in std::thread::thread<void (ReferenceAnalytics::*)(std::timed_mutex&), ReferenceAnalytics*&, std::reference_wrapper<std::timed_mutex> >
(this=0x7fffffffcf40, __f=
#0x7fffffffcf60: (void (ReferenceAnalytics::*)(ReferenceAnalytics * const, std::timed_mutex &)) 0x7ffff5a1750c <ReferenceAnalytics::WorkerThreadMethod(std::timed_mutex&)>)
at /home/developer/Toolchains/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/include/c++/5.4.0/thread:137 # Looks like my toolchain
So, I did some reading, and then using nm discovered that my shared object has all the std::thread stuff like ctor, dtor, swap, .... defined as weak symbols (which I assume causes a collision if the host that loads the plugin uses dynamic libstdc++ and then my calls are routed there and all hell breaks loose, is this right?).
My further attempts of googling and reading did not give me an answer to how can I control this as in force the std::thread stuff to be resolved to the static libstdc++ in my sysrooted gcc?
More over, I made a small executable that just does dlopen on my shared object and then calls a method which internally constructs the thread - if the executable is also built with -static-libstdc++ all is well, if not, the crash happens. So I assume my theory about the weak symbol for std::thread being resolved to the hosts libstdc++ is correct, but how to solve this?
If you statically link a DSO against libstdc++ without hiding the libstdc++ symbols, and the main program is linked against libstdc++ as well, then the symbol definitions in the main program will interpose/preempt the definitions in the DSO when it is opened with dlopen.
However, because the main program is not linked against libpthread, the the system libstdc++ DSO in the process image saw that the libpthread symbols were unavailable (null), and thus disabled thread support. However, your DSO needs thread support, but can't get it from the system libstdc++.
As an immediate workaround, you can hide all the statically linked libstdc++ symbols in the DSO. Then no interposition will take place, and your DSO will actually use the libstdc++ copy in the DSO itself, which has already established that there should not be any thread support in the process.
But this will likely not solve all of your problems because late loading of libpthread via dlopen has its problems. We fixed one bug here:
Segfault after a binary without pthread dlopen()s a library linked with pthread
But your distribution may not have that fix, and I expect there will be other issues, one of them being: The second, statically linked copy of libstdc++ is actually needed here because the system libstdc++ has been loaded without thread support (because libpthread was not loaded when its symbols were bound, causing the crash you observed), so you cannot use it for creating threads. It also has activated optimizations which make the library not thread safe (avoid atomic instructions and things like that).

Barebones C++ without standard library?

Compilers such as GCC and Clang allow to compile C++ programs without the C++ standard library, e.g. using the -nostdlib command line flag. It seems that such often fail to link thou, for example:
void f() noexcept { throw 42; }
int main() { f(); }
Usually fails to link due to undefined symbols like __cxa_allocate_exception, typeinfo for int, __cxa_throw, __gxx_personality_v0, __clang_call_terminate, __cxa_begin_catch, std::terminate() etc.
Even a simple
int main() {}
Fails to link with
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
and is killed by the OS upon execution. Using -c the compiler still runs the linker which blatantly fails with:
ld: error in mytest(.eh_frame); no .eh_frame_hdr table will be created.
Is it a realistic goal to program and compile C++ applications or libraries without using and linking to the standard library? How can I compile my code using GCC or Clang on Linux? What core language features would one be unable to use without the standard library?
You will basically find all of your questions answered at osdev.org, but I'll give a brief summary anyway.
When you give GCC -nostdlib, you are saying "no startup or library files". This includes:
crti.o, crtbegin.o, crtend.o and crtn.o. Generally kernel developers only care about implementing crti.o and crtend.o and let GCC supply crtbegin.o and crtend.o by passing -print-file-name= to the linker. Generally these are just stubs that consist of .init and .fini respectively, leaving room for GCC to shove the contents of crtbegin.o and crtend.o respectively. These files are necessary for calling global constructors/destructors.
You can't avoid linking libgcc (the "low-level runtime library" (-lgcc) because even if you pass -nostdlib GCC will emit calls to its functions whenever you use it, leading to inexplicable linking errors for seemingly no reason. This is the case even when you're implementing/porting a C library.
You don't "need" libstdc++ no, but typically kernel developers want it. Porting a C library then implementing the C++ standard library from scratch is an extremely difficult task.
Since you only want to get rid of the "standard library", but keeping libc (on a Linux system) you're essentially programming C++ with just a C library. Of course, there's nothing wrong with this and you do you, but ultimately I don't see the point unless you plan on developing a kernel.
Required reading:
OSDev's C++ page - If you really care about RTTI/exception support, it's more annoying to implement than it sounds. Typically people just pass -fno-rtti or -fno-exceptions and then worry about it down the line or not at all.
"Standard" is a misnomer. In this context it doesn't mean "the library (set of functions, classes etc) as defined by the C++ standard" but "the usual set of libraries and objects (compiled files in a certain format) gcc links with by default". Some of those are necessary for most or even all programs to function.
If you use this flag, it's your responsibility to provide any missing functionality. There are several ways to do so:
Cherry-pick libraries and objects that your program really needs out of the default set. (Makes little sense as the result will most probably be exactly the same as with the default link flags).
Provide your own implementation of missing functionality.
Explicitly disable, through compiler flags, language features your program isn't using. I know of two such features: exceptions and RTTI. This is needed because the compiler needs to generate exceptions-related code and RTTI info even if these features are not explicitly used in this module.

catch (...) across shared library boundary on gcc mingw

I am a Windows developer with some knowledge of C++ EH and SEH implementation in VC++ but new to MinGW. I have built an open source application on MinGW where a dll throws a C++ exception and a client .exe catches it with two catch clauses. One catch clause is "catch (std::exception &e)" and the subsequent one is "catch(...)".
My application terminates with a call to abort despite the latter catch handler.
From searching the web, I understand that gcc exception handling uses pointer comparison to determine whether that type of a thrown exception matches a catch clause. Which can cause problems if the RTTI type descriptor instance used in the throwing and the catching modules differ. In VC++ a catch handler with ellipsis type ("catch (...)") does not do any type matching. However, for my mingw application the ellipsis catch is not determined as handling the thrown exception. I tried to set a breakpoint on ___gxx_personality_v0 to debug this but gdb says that symbol is undefined even though "nm executable.exe" built with debugging shows the symbol as defined.
Any ideas about how "catch (...)" is handled by the personality routine ? Any pointers on how to debug and/or fix this would be appreciated. I really would like the app to terminate cleanly without a popup saying "The application has requested to be terminated in an unusual way".
Thanks for any help,
--Patrick
Update: I found the solution in another post here about the same issue: Exceptions are not caught in GCC program . It turns out that both the dll and the executable were built with "-static-libgcc". Changing that to "-shared-libgcc" fixed the issue.
The quote from the gcc manual from the above post:
... if a library or main executable is supposed to throw or catch exceptions, you must link it using the G++ or GCJ driver, as appropriate for the languages used in the program, or using the option -shared-libgcc, such that it is linked with the shared libgcc.

Why is this exception not being caught across DLLs?

I have a DLL which throws an exception like so:
throw POMException(err, drvErr, errmsg);
The calling code is in a separate program, and has a try, catch block like so:
try
{
// function in separate DLL
}
catch (TXNPDO_Exception& e)
{
SR_PERFLOG_MSG(SR_PERFMASK_SELECT, "ERROR selectInStages");
TXNDBO_THROW(e);
}
Where TXNPDO_Exception is defined in an included file:
#define TXNPDO_Exception POMException
When running this in a debugger, it states that the POMException was unhandled. I even added a catch(...) clause and it still isn't handled.
I suspect that this has something to do with the Visual C++ compilation parameters, since the DLL library in question is a legacy library that is compiled separate to the program calling it. I am using Visual Studio 2003.
The DLL cpp files are compiled with the following (relevant) flags: /X /GR /Ob1 /Zi /GX /Od /MDd /LD. Other exceptions within the calling program are handled correctly.
Can anyone provide reasons why this exception isn't being propagated to the calling program?
Edit:
The DLL library was previously compiled with possible build environment and code changes that are not available to me. The previously compiled library propagates exceptions correctly.
I am compiling the client program using the same compiler, using mostly the same switches: -Od -W3 -Z7 -MDd -GX -GR -Zm800 (no /X or /Ob1 and /Z7 instead of /Zi).
I would assume that throwing exceptions across .dll boundaries could only be possible, when the various .dll and program executable are compiled against the same C++ runtime, thus sharing the same heap. I could be wrong, but this is my best guess.
Edit:
I guess I wasn't wrong.
I have finally figured out what the problem is, and in this particular case, it is nothing to do with throwing exceptions between DLLs.
The problem is occurring due to an exception handler hook being installed further up the call stack. I diagnosed this by adding try, catch(...) blocks to every level in the library until I found the point at which the exception wasn't propagated. When I commented out the exception handler hook registration code, the exception was successfully propagated.
I now have to figure out the workings of exception handler hooks, which is outside the scope of this question. Thanks to everyone who shared their answers.