I'm running some multi-threaded code that compiles with no errors or warnings, but I get this error when I execute it:
relocation error: /lib/x86_64-linux-gnu/libgcc_s.so.1:
1thread_mutex_locXãƨ+�����Ȩ+ ������ƨ+�&쏭Ũ�Ȩ+e
What is a relocation error?
Relocation is the process of adapting some offsets in the code to the actual memory layout.
Relocations (the places that the relocation process will edit, plus a description of each one) are generated by the compiler, e.g. for TLS variables, for dynamic library calls, and for PIC/PIE code. The relocation descriptions are stored in the binary file (e.g. in ELF format on Linux).
Some relocations are resolved at the linking step, by the ld linker on Linux (other OSes have their own linkers).
But some relocations can't be done offline (before the program starts). Such relocations are needed for ASLR (address space layout randomization) and for loading dynamic libraries. These are done just before the program starts, by the program interpreter (ld.so on Linux), which is also called the runtime linker. It loads your program and its dynamic libraries into memory and performs the relocations.
The third place where relocations are done is a call to dlopen() (in libdl.so on Unix). This library loads dynamic libraries at run time, and because dynamic libraries have relocations, it has to perform them too.
The error message comes from one of these linkers, and if you see it after starting a program, it is the second (ld.so) or third (libdl) case.
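To make the idea of "adapting offsets to the actual memory layout" concrete, here is a toy model in C++. It is only a sketch: the ToyRelocation struct and apply_relocations function are made-up names, and real ELF relocation records (and what ld.so does with them) are considerably more involved. It mimics the simplest kind of relocation, where a slot must end up holding the load address plus a link-time offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a "relative" relocation: the slot at `slot_index` must end up
// holding (load_base + addend). Real ELF relocation records carry more fields.
struct ToyRelocation {
    std::size_t slot_index;  // which word of the image to patch
    std::uint64_t addend;    // link-time offset of the target inside the image
};

// Patch every recorded slot once the actual load address is known.
// Records that point outside the image are skipped.
// Returns the number of slots that were patched.
std::size_t apply_relocations(std::vector<std::uint64_t>& image,
                              const std::vector<ToyRelocation>& relocations,
                              std::uint64_t load_base) {
    std::size_t patched = 0;
    for (const ToyRelocation& r : relocations) {
        if (r.slot_index >= image.size()) {
            continue;  // a corrupted record; a real linker would report an error
        }
        image[r.slot_index] = load_base + r.addend;
        ++patched;
    }
    return patched;
}
```

The point of the model is that the relocation records are plain data read at load time. If that data (or the symbol names it refers to) is corrupted in memory or on disk, the runtime linker ends up reporting nonsense like the garbled symbol name in the question.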
I can't find the exact place where this message is generated, but possible causes are:
Memory or on-disk data corruption (non-ECC memory or another hardware bug) that made some data wrong. Reboot; run filesystem and md5sum checks; reinstall the packages involved (glibc, libgcc); recompile your application; reseat your memory modules and lower the memory frequency.
Some undefined symbol was used. Try setting the environment variable LD_BIND_NOW (if you are on glibc or a derivative) to a non-empty value.
The program corrupted its own memory, e.g. through a stack overflow, a stray pointer write, or something similar. Try running it under valgrind (if your platform is supported, e.g. x86/x86-64).
A synchronization error that lets your program corrupt its own memory. Use valgrind --tool=helgrind (if your platform is supported and you have a lot of time to wait).
Related
I have an unresolved symbol error when trying to compile my program which complains that it cannot find __dso_handle. Which library is this function usually defined in?
Does the following result from nm on libstdc++.so.6 mean it contains that?
I tried to link against it but the error still occurs.
nm libstdc++.so.6 | grep dso
00000000002fc480 d __dso_handle
__dso_handle is a "guard" that is used to identify dynamic shared objects during global destruction.
Realistically, you should stop reading here. If you're trying to defeat object identification by messing with __dso_handle, something is likely very wrong.
However, since you asked where it is defined: the answer is complex. To surface the location of its definition (for GCC), use iostream in a C++ file, and, after that, do extern int __dso_handle;. That should surface the location of the declaration due to a type conflict (see this forum thread for a source).
Sometimes, it is defined manually.
Sometimes, it is defined/supplied by the "runtime" installed by the compiler (in practice, the CRT is usually just a bunch of binary header/entry-point-management code, and some exit guards/handlers). In GCC (not sure if other compilers support this; if so, it'll be in their sources):
Main definition
Testing __dso_handle replacement/tracker example 1
Testing __dso_handle replacement/tracker example 2
Often, it is defined in the stdlib:
Android
BSD
Further reading:
Subtle bugs caused by __dso_handle being unreachable in some compilers
I ran into this problem. Here are the conditions which seem to reliably generate the trouble:
g++ linking without the C/C++ standard library: -nostdlib (typical small embedded scenario).
Defining a statically allocated standard library object; specific to my case is std::vector. Previously this was std::array statically allocated without any problems. Apparently not all std:: statically allocated objects will cause the problem.
Note that I am not using a shared library of any type.
GCC/ARM cross compiler is in use.
If this is your use case then merely add the command line option to your compile/link command line: -fno-use-cxa-atexit
Here is a very good link to the __dso_handle usage as 'handle to dynamic shared object'.
There appears to be a typo in the page, but I have no idea who to contact to confirm:
After you have called the objects' constructor destructors GCC automatically calls the function ...
I think this should read "Once all destructors have been called GCC calls the function" ...
One way to confirm this would be to implement the __cxa_atexit function as mentioned and then single step the program and see where it gets called. I'll try that one of these days, but not right now.
Adding to @natersoz's answer:
For me, using -Wabi-tag -D_GLIBCXX_USE_CXX11_ABI=0 alongside -fno-use-cxa-atexit helped compile an old lib. A telltale is if the C++ functions in the error message have std::__cxx11 in them, due to an ABI change.
I need to build a portable shared object, which is a plugin for another piece of software on Linux. I did a fair amount of reading on the subject and came to the conclusion that I should build a sysrooted gcc (gcc 5.4.0, if it matters) with a decently old glibc (to provide compatibility with older systems) and link with -static-libstdc++ and -static-libgcc, thus arriving at something that only depends on the host's glibc and some other minor stuff which will always be present.
I did all that, and now I am experiencing a weird crash: a segmentation fault happens where the code calls std::thread, and gdb shows that the stack frame is inside libstdc++.so.6 (where it shouldn't be; ldd of my shared object also does not list libstdc++.so). The top of the stack at the crash is:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff79075e3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 # THIS SHOULD NOT BE HERE RIGHT?
#2 0x00007ffff5a25a5c in std::thread::thread<void (ReferenceAnalytics::*)(std::timed_mutex&), ReferenceAnalytics*&, std::reference_wrapper<std::timed_mutex> >
(this=0x7fffffffcf40, __f=
@0x7fffffffcf60: (void (ReferenceAnalytics::*)(ReferenceAnalytics * const, std::timed_mutex &)) 0x7ffff5a1750c <ReferenceAnalytics::WorkerThreadMethod(std::timed_mutex&)>)
at /home/developer/Toolchains/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/include/c++/5.4.0/thread:137 # Looks like my toolchain
So, I did some reading, and then using nm discovered that my shared object has all the std::thread stuff like ctor, dtor, swap, .... defined as weak symbols (which I assume causes a collision if the host that loads the plugin uses dynamic libstdc++ and then my calls are routed there and all hell breaks loose, is this right?).
Further googling and reading did not tell me how to control this, i.e. how to force the std::thread stuff to be resolved to the static libstdc++ in my sysrooted gcc.
Moreover, I made a small executable that just does dlopen on my shared object and then calls a method which internally constructs the thread. If the executable is also built with -static-libstdc++ all is well; if not, the crash happens. So I assume my theory about the weak symbol for std::thread being resolved to the host's libstdc++ is correct, but how do I solve this?
If you statically link a DSO against libstdc++ without hiding the libstdc++ symbols, and the main program is linked against libstdc++ as well, then the symbol definitions in the main program will interpose/preempt the definitions in the DSO when it is opened with dlopen.
However, because the main program is not linked against libpthread, the system libstdc++ DSO in the process image saw that the libpthread symbols were unavailable (null) and thus disabled thread support. Your DSO needs thread support, but can't get it from the system libstdc++.
As an immediate workaround, you can hide all the statically linked libstdc++ symbols in the DSO. Then no interposition will take place, and your DSO will actually use the libstdc++ copy in the DSO itself, which has already established that there should not be any thread support in the process.
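A minimal sketch of what "hiding the symbols" can look like on the source side, assuming a GNU toolchain. The names PLUGIN_EXPORT and plugin_entry are hypothetical, and the build line in the comment is one commonly used combination, not something taken from the question. Only the marked entry point is exported; everything else stays out of the dynamic symbol table.

```cpp
// Possible build line (GNU toolchain assumed, file names are placeholders):
//   g++ -shared -fPIC -fvisibility=hidden plugin.cpp -o plugin.so
//       -static-libstdc++ -static-libgcc -Wl,--exclude-libs,ALL
// -fvisibility=hidden hides the plugin's own symbols by default, and
// --exclude-libs,ALL treats symbols pulled in from static archives
// (such as libstdc++.a) as hidden.

#if defined(__GNUC__)
#define PLUGIN_EXPORT __attribute__((visibility("default")))
#else
#define PLUGIN_EXPORT
#endif

namespace {
// Internal helper: internal linkage, so it is never exported from the DSO.
int compute_answer() { return 42; }
}  // namespace

// Hypothetical entry point that the host application looks up with dlsym().
extern "C" PLUGIN_EXPORT int plugin_entry() { return compute_answer(); }
```

Template code from the libstdc++ headers that gets instantiated in your own object files may still be exported as weak symbols with this setup, so it is worth checking the result with nm -D --defined-only on the finished shared object. If std:: symbols still show up there, a linker version script that lists only the entry points is the usual next step.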
But this will likely not solve all of your problems because late loading of libpthread via dlopen has its problems. We fixed one bug here:
Segfault after a binary without pthread dlopen()s a library linked with pthread
But your distribution may not have that fix, and I expect there will be other issues, one of them being: The second, statically linked copy of libstdc++ is actually needed here because the system libstdc++ has been loaded without thread support (because libpthread was not loaded when its symbols were bound, causing the crash you observed), so you cannot use it for creating threads. It also has activated optimizations which make the library not thread safe (avoid atomic instructions and things like that).
I have an application which uses a class Foo from an .so shared library. I've come across a problem where at runtime it prints
<appname>: symbol lookup error: <appname>: undefined symbol: <mangled_Foo_symbol_name>
Now, it turned out that the unmangled symbol was for the constructor of the class Foo, and the problem was simply that an old version of the library was loaded, which didn't contain Foo yet.
My question isn't about resolving the error (that's obviously to use the correct library), but why it appears at runtime instead of at time of load / startup.
The line of code causing the error just instantiates an object of class Foo, so I'm not using anything like dlopen here, at least not explicitly / to my knowledge.
In contrast, if I remove the whole library from the load search path, I get this error at startup:
<appname>: error while loading shared libraries: libname.so.2: cannot open shared object file: No such file or directory
When the wrong version of gcc / libstdc++ is on the load path, an error also appears at startup:
<appname>: /path/to/gcc-4.8.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by <appname>)
This "fail fast" behavior is much more desirable. I don't want to run my application for quite a while before I finally realize it's using the wrong library.
What causes the load error to appear at runtime and how can I make it appear immediately?
From the man page of ld.so:
ENVIRONMENT
LD_BIND_NOW (libc5; glibc since 2.1.1) If set to a nonempty string, causes the dynamic linker to resolve all symbols at program startup instead of deferring function call resolution to the point when they are first referenced. This is useful when using a debugger.
LD_WARN (ELF only)(glibc since 2.1.3) If set to a nonempty string, warn about unresolved symbols.
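If setting LD_BIND_NOW in the environment is not practical, the application can also check at startup that a symbol it depends on is actually present. The following is a rough sketch under a few assumptions: symbol_is_resolvable is a made-up helper name, the program is dynamically linked, and you know the mangled name of the symbol you care about (for a default constructor Foo::Foo() in the global namespace that would be _ZN3FooC1Ev under the Itanium C++ ABI). On older glibc versions you need to link with -ldl.

```cpp
#include <dlfcn.h>

// Returns true if `symbol` can be found in the running program or in any of
// the shared libraries that were loaded along with it.
bool symbol_is_resolvable(const char* symbol) {
    // A null file name yields a handle for the main program; lookups through
    // it also search the libraries loaded at startup.
    void* self = dlopen(nullptr, RTLD_NOW);
    if (self == nullptr) {
        return false;
    }
    dlerror();  // clear any stale error state before the lookup
    void* address = dlsym(self, symbol);
    const bool found = (address != nullptr) && (dlerror() == nullptr);
    dlclose(self);
    return found;
}
```

Calling symbol_is_resolvable("_ZN3FooC1Ev") at the top of main and exiting with a clear message when it returns false gives the same fail-fast behavior for that one symbol. To get eager binding for every function without touching the environment, linking the application with -Wl,-z,now is another option with the GNU linker.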
I think you cannot statically link a .so library. If you want to avoid load-time/run-time errors, you have to use static libraries (.a) throughout. If you have neither a static version of the library nor its source, try to find a statifier. After some googling I found a few statifiers, but I don't know how they work, so I'll leave that part up to you.
Trying to track down a segfault somewhere in MPI, I got this error:
./mpitest: symbol lookup error: /usr/lib64/libstdc++.so.6: bàþ;# BC_
-------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 8729 on ...
First, I'm used to getting lookup errors when the process is loaded if the library path is wrong. But those all happen before the process starts executing. This happened in the middle of the output from the test. Shouldn't all symbols be resolved by the runtime loader before the process starts?
Second, that symbol looks like garbage. It's certainly not a normal mangled C++ symbol.
Is it possible for memory corruptions (since I am tracking a segfault, it's likely there's something like that going on) to corrupt symbols like this?
This was compiled with icpc 12.0.3 20110309 on a Linux 2.6.18-194.32.1.el5 x86_64 machine.
OpenMPI loads plugins as dynamic shared objects at runtime when MPI_INIT is called. See this FAQ. Therefore symbol lookup happens at that time. So it looks to me as if your OpenMPI's libmpi_cxx.so was built against a different libstdc++ than the one that is found at runtime on the system.
You can either rebuild OpenMPI or, if the correct libstdc++ is somewhere else on your system (not /usr/lib64/libstdc++.so.6), adjust your LD_LIBRARY_PATH. Also, try setting LD_DEBUG=files to see whether you are in fact loading two different libstdc++'s.
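In the same spirit as LD_DEBUG=files, a process can also list the shared objects mapped into it from the inside. This is only a sketch: loaded_objects_matching is a hypothetical helper, and dl_iterate_phdr is specific to glibc and compatible libcs. Called after MPI_INIT (or after any dlopen), it would show whether more than one path containing "libstdc++" is loaded.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (glibc and compatible libcs)

#include <cstddef>
#include <string>
#include <vector>

namespace {
// State shared with the callback: what to look for and what was found.
struct Query {
    std::string needle;
    std::vector<std::string> matches;
};

// Invoked once per loaded object; records path names containing the needle.
int collect_matching(dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* query = static_cast<Query*>(data);
    const std::string name = info->dlpi_name ? info->dlpi_name : "";
    if (name.find(query->needle) != std::string::npos) {
        query->matches.push_back(name);
    }
    return 0;  // a non-zero return value would stop the iteration early
}
}  // namespace

// Returns the path names of all currently loaded objects whose name contains
// `needle`. The main program is typically reported with an empty name.
std::vector<std::string> loaded_objects_matching(const std::string& needle) {
    Query query{needle, {}};
    dl_iterate_phdr(&collect_matching, &query);
    return query.matches;
}
```

If loaded_objects_matching("libstdc++") returns two different paths, the process really has two copies of the library loaded. A copy that was linked statically into another shared object will not appear here, since it is not a separate loaded object.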
I am currently writing a small library, and I want to check it for leaks (among other things); however, for some reason, gdb is not loading the library symbols. I have read many other posts on here (and various other places on the internet) about this, however, I cannot seem to find a solution. Here is what is going on:
I am compiling the shared library with the following flags (these are included in both the final shared library as well as all object files):
CFLAGS=-Wall -O0 -g -fPIC
Likewise, I am compiling the binary memtest (the client application for the library) to check for memory leaks and such with these flags
CFLAGS=-Wall -O0 -g
Now, I inserted a NULL pointer into the library to test if I could trace through it and "debug" the pointer (i.e. it's making it crash). So I try to run it through gdb, but it's a no go. The output of info sharedlibrary is the same for both the executable and the core:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
... Some libraries I am not worried about debugging...
0x00d37340 0x00d423a4 Yes (*) /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0 <--- My lib
.... and some more....
(*): Shared library is missing debugging information.
As you can see, it's not loading the debug information. I am uncertain as to why this is. I have built and linked everything with the -g flag, and I even tried -ggdb and -g3, but nothing seems to work properly. When I load a core dump, this is what I see:
...some libs...
Reading symbols from /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0...done.
Loaded symbols for /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
...some more libs...
Notice how my library does not give a (no debugging symbols found) error - anyone have any ideas why? As I said before, I am unable to debug this through running the program gdb ./memtest or through debugging the core file.
Thanks for your help.
EDIT It may also be important to note, that (if you didn't realize by path) this library is a local shared library (i.e. I'm using -Wl,-rpath to link/load it)
EDIT2 It seems my version of GDB was out-of-date. Now, I have updated to the latest version from the CVS server (I have also tried latest release version 7.2) and it can "load" symbols. My info sharedlibrary now reads this:
0x00e418b0 0x00e4be74 Yes /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
However, I am still unable to step through any functions (in the shared library) - anyone have any ideas?
EDIT3 I have also tried to step through linking against a static library (libMyLIB.a) but it still isn't working. My OS is CentOS 5.6; does anyone know of any issues with this system? Also, just another confirmation that my symbols are being loaded (it just can't step through any shared lib function for some reason)
(gdb) sharedlibrary MyLIB
Symbols already loaded for /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
I found the reason this wasn't working: my test executable was calling an outdated function to initialize a pointer. Since the object was never being created, I could never step into the library. Once I updated the function call, all worked well.
That said, if anyone experiences similar issues while all symbols appear to be loaded, be sure to check that all pointers are initialized properly even if they have the correct type.