I have a shared library file which I build using a Makefile. I ran into an issue where, after building the library, I'd get the dreaded GLIBCXX_ not found linker error.
This case is particularly strange. When I compile with the -g3 flag, I don't get the error. If I compile with -O2, I get the error.
So, when I compile with -O2, and run ldd against the compiled .so file, I get:
$ ldd MYLIB.so.1
./MYLIB.so.1: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./MYLIB.so.1)
linux-vdso.so.1 => (0x00007fff21e8d000)
libz.so.1 => /lib64/libz.so.1 (0x00002b2cd4c40000)
libpng12.so.0 => /usr/lib64/libpng12.so.0 (0x00002b2cd4e54000)
libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00002b2cd5079000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b2cd529b000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b2cd54b7000)
libm.so.6 => /lib64/libm.so.6 (0x00002b2cd57b8000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b2cd5a3b000)
libc.so.6 => /lib64/libc.so.6 (0x00002b2cd5c49000)
/lib64/ld-linux-x86-64.so.2 (0x0000003891400000)
So here, for some reason, MYLIB.so.1 requires version GLIBCXX_3.4.9 from /usr/lib64/libstdc++.so.6, which that library does not provide, as we can see using the strings utility:
$ strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_FORCE_NEW
So, to further investigate this, I run nm against the compiled .so file, and try to find what symbols are looking for GLIBCXX_3.4.9
$ nm --demangle MYLIB.so.1 | grep GLIBCXX_3.4.9
U std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long)@@GLIBCXX_3.4.9
U std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@@GLIBCXX_3.4.9
Okay, so it looks like some standard C++ ostream code requires GLIBCXX_3.4.9. Okay... but it's only these two symbols that seem to require GLIBCXX_3.4.9. Everything else correctly links with GLIBCXX_3.4:
$ nm --demangle MYLIB.so.1 | grep GLIBCXX
U std::string::find(char const*, unsigned long, unsigned long) const@@GLIBCXX_3.4
U std::string::compare(char const*) const@@GLIBCXX_3.4
U std::string::compare(std::string const&) const@@GLIBCXX_3.4
U std::logic_error::what() const@@GLIBCXX_3.4
U std::runtime_error::what() const@@GLIBCXX_3.4
U std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str() const@@GLIBCXX_3.4
U std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()@@GLIBCXX_3.4
... etc ...
So what could be the cause of this? Why would one particular symbol link GLIBCXX_3.4.9, but the rest don't? Even more strange - this only happens when I compile with -O2.
I'm pretty baffled by this. So, what are some likely reasons why this might occur? How does the linker/compiler chain determine which GLIBCXX version a particular symbol is found in?
This simply means you're compiling with a newer GCC than the one the /usr/lib64/libstdc++.so library belongs to, and you are not telling the dynamic linker how to find the right libstdc++.so.
The reference to GLIBCXX_3.4.9 means you are compiling with at least GCC 4.2.0 but the system libstdc++.so is from an older version, 4.1.1 or 4.1.2.
The fact that it's only a problem at -O2 is not really relevant: if you compile with a newer GCC, you need to use its libstdc++.so, period. It seems that unless you compile with -O2 you don't actually get a hard dependency on the newer libstdc++.so, but that's just chance; it could change if you make any small change to your code that causes it to be optimised slightly differently.
You need to read https://gcc.gnu.org/onlinedocs/libstdc++/faq.html#faq.how_to_set_paths and https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dynamic_or_shared.html#manual.intro.using.linkage.dynamic
Why would one particular symbol link GLIBCXX_3.4.9, but the rest don't?
Because the other symbols are unchanged between GCC 3.4 and 4.2 and the version of the symbol in your old libstdc++.so is the same as the one you linked against when building your executable.
The symbol that is not being found is something that was new in GCC 4.2, so it gets given a newer symbol version, and it can't be found in older libraries.
I use the jsoncpp lib in my linux cli tool.
The CMakeLists.txt contains
find_library(LIB_JSON jsoncpp)
target_link_libraries(${PROJECT_NAME} ${LIB_JSON})
The result is
/usr/bin/c++ -rdynamic CMakeFiles/cktwagent.dir/agent_main.cpp.o -o cktwagent -ljsoncpp
When I check the binary, I find:
$> ldd cktwagent
linux-vdso.so.1 (0x00007ffe4cfd1000)
libjsoncpp.so.24 => /usr/lib/libjsoncpp.so.24 (0x00007f87505bd000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f87503e0000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f875029a000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f8750280000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f87500b7000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f87506ce000)
Why does ld use /usr/lib/libjsoncpp.so.24 and not the symbolic link /usr/lib/libjsoncpp.so?
Why does ld sometimes resolve the library link to the real library file?
$> ls -l /usr/lib/libjsoncpp.so
lrwxrwxrwx 1 root root 16 26. Sep 17:02 /usr/lib/libjsoncpp.so -> libjsoncpp.so.24
In the case of /usr/lib/libstdc++.so.6, ld uses the symbolic link. When I check the path from the ldd output, libstdc++.so.6 points to a symbolic link.
$> ls -l /usr/lib/libstdc++.so.6
lrwxrwxrwx 1 root root 19 9. Nov 12:43 /usr/lib/libstdc++.so.6 -> libstdc++.so.6.0.28
I would like to understand this behavior, because when I copy the binary to a different system, the libjsoncpp.so link is available there, but it points to a different version.
Many thanks
Thomas
That's a version compatibility feature. By convention, API compatibility is determined by the major version number. So 6.0.28 is compatible with 6.1.1, but not with 5.0.1 or 7.0.1.
To allow for compatible upgrades, libstdc++.so.6.0.28 is symlinked as libstdc++.so.6. Then it can be binary-upgraded to e.g. libstdc++.so.6.0.29 without touching applications that rely on API version 6 of libstdc++.
In addition, it allows libraries with different major version numbers to coexist on the same system. Installing libstdc++.so.7.0.0 will not break apps that link against libstdc++.so.6.
Why does ld use /usr/lib/libjsoncpp.so.24 and not the symbolic link /usr/lib/libjsoncpp.so?
That's because libjsoncpp has a soname of libjsoncpp.so.24. Apparently it is following the same convention of having the major version number determine compatibility.
You can achieve the same behavior with your shared lib libfoo if you link it like this:
gcc -shared -Wl,-soname,libfoo.so.<major> -o libfoo.so.<major>.<minor> <object files>
For more details refer to GCC Howto - Version numbering, sonames and symlinks.
I'm trying to use the 3rd-party lib, called DocToText, with gcc 4.4.7.
I compiled the program with:
g++ -I./doctotext/ -L./doctotext/ -Wl,-rpath=./doctotext -ldoctotext -o example test_doctotext.cpp
In the beginning, it returned libstdc++.so.6: version GLIBCXX_3.4.15 not found
I manually downloaded the newer version, and re-linked, here is the result
[root@mail]~xian# find / -name "libstdc++.so.6"
/lib64/libstdc++.so.6
/usr/lib64/libstdc++.so.6
[root@mail]~xian# strings /lib64/libstdc++.so.6 | grep GLIBCXX_3.4.15
GLIBCXX_3.4.15
[root@mail]~xian# strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX_3.4.15
GLIBCXX_3.4.15
But when I compiled again, it returned:
[root@mail]~xian# g++ -I./doctotext/ -L./doctotext/ -Wl,-rpath=./doctotext -ldoctotext -o example test_doctotext.cpp
./doctotext//libdoctotext.so: undefined reference to `std::__detail::_List_node_base::swap(std::__detail::_List_node_base&, std::__detail::_List_node_base&)@GLIBCXX_3.4.15'
./doctotext//libdoctotext.so: undefined reference to `std::__detail::_List_node_base::_M_transfer(std::__detail::_List_node_base*, std::__detail::_List_node_base*)@GLIBCXX_3.4.15'
./doctotext//libdoctotext.so: undefined reference to `std::__detail::_List_node_base::_M_unhook()@GLIBCXX_3.4.15'
./doctotext//libdoctotext.so: undefined reference to `std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@GLIBCXX_3.4.15'
collect2: ld returned 1 exit status
I also tried
g++ -I./doctotext/ -L./doctotext/ -L/lib64/ -Wl,-rpath=./doctotext,-rpath=/lib64 -ldoctotext -lstdc++ -o example test_doctotext.cpp
, and I got the same errors (undefined reference).
libdoctotext.so does indeed link to /lib64/libstdc++.so.6:
[root@mail]~xian# ldd doctotext/libdoctotext.so | grep libstdc++.so.6
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f96c7ab4000)
Fortunately, I found two methods to solve this problem:
1. Use a newer gcc:
I used gcc 9.1.1 (with scl), and then g++ -I./doctotext/ -L./doctotext/ -Wl,-rpath=./doctotext -ldoctotext -o example test_doctotext.cpp works directly.
2. Specify the path to libstdc++.so.6 explicitly with gcc 4.4.7:
g++ -I./doctotext/ -L./doctotext/ -Wl,-rpath=./doctotext -ldoctotext /lib64/libstdc++.so.6 -o example test_doctotext.cpp
But I'm really curious why my gcc 4.4.7 cannot link against the libstdc++ in the system default path.
Is the version of libstdc++ tightly coupled to the version of gcc in some way?
===============================================================================
Eventually, I found which libstdc++.so gcc uses at link time:
[root@mail]/usr/lib64# find / -name "libstdc++.so"
/opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/32/libstdc++.so
/opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/libstdc++.so
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/32/libstdc++.so
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/libstdc++.so
[root@mail]/usr/lib64# ll /usr/lib/gcc/x86_64-redhat-linux/4.4.4/libstdc++.so
lrwxrwxrwx. 1 root root 37 Aug 21 15:22 /usr/lib/gcc/x86_64-redhat-linux/4.4.4/libstdc++.so -> ../../../../lib64/libstdc++.so.6.0.13
libstdc++.so.6.0.13 doesn't have GLIBCXX_3.4.15.
I re-linked it to libstdc++.so.6.0.17, and the problem was solved.
Every GCC release is accompanied by its very own libstdc++ release.
The C++ standard library (and support libraries like libsupc++) often rely on specific implementation details in the compiler, including bugs, and specific changes in behaviour due to defect reports etc.
Sometimes even a new GCC release also needs a matching binutils (linker) release, as the way the code is generated changed to use a specific feature only available in the newer linker.
You can explicitly link it to the system libstdc++ by passing that path to the compiler/linker, but I don't recommend it, as the ABI may have changed in an incompatible way.
Code
Here is the program that gives the segfault.
#include <iostream>
#include <vector>
#include <memory>
int main()
{
    std::cout << "Hello World" << std::endl;
    std::vector<std::shared_ptr<int>> y {};
    std::cout << "Hello World" << std::endl;
}
Of course, there is absolutely nothing wrong in the program itself. The root cause of the segfault depends on the environment in which it is built and run.
Background
We, at Amazon, use a build system which builds and deploys the binaries (lib and bin) in an almost machine-independent way. In our case, that basically means it deploys the executable (built from the above program) into $project_dir/build/bin/ and almost all its dependencies (i.e. the shared libraries) into $project_dir/build/lib/. I say "almost" because for shared libraries such as libc.so, libm.so, ld-linux-x86-64.so.2 and possibly a few others, the executable picks them up from the system (i.e. from /lib64). Note that it is supposed to pick libstdc++ from $project_dir/build/lib though.
Now I run it as follows:
$ LD_LIBRARY_PATH=$project_dir/build/lib ./build/bin/run
segmentation fault
However, if I run it without setting LD_LIBRARY_PATH, it runs fine.
Diagnostics
1. ldd
Here is the ldd output for both cases (please note that I've edited the output to show the full version of the libraries wherever there is a difference)
$ LD_LIBRARY_PATH=$project_dir/build/lib ldd ./build/bin/run
linux-vdso.so.1 => (0x00007ffce19ca000)
libstdc++.so.6 => $project_dir/build/lib/libstdc++.so.6.0.20
libgcc_s.so.1 => $project_dir/build/lib/libgcc_s.so.1
libc.so.6 => /lib64/libc.so.6
libm.so.6 => /lib64/libm.so.6
/lib64/ld-linux-x86-64.so.2 (0x0000562ec51bc000)
and without LD_LIBRARY_PATH:
$ ldd ./build/bin/run
linux-vdso.so.1 => (0x00007fffcedde000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6.0.16
libgcc_s.so.1 => /lib64/libgcc_s-4.4.6-20110824.so.1
libc.so.6 => /lib64/libc.so.6
libm.so.6 => /lib64/libm.so.6
/lib64/ld-linux-x86-64.so.2 (0x0000560caff38000)
2. gdb when it segfaults
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.62.al12.x86_64
(gdb) bt
#0 0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1 0x00007ffff7df0c55 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7b1dc41 in std::locale::_S_initialize() () from $project_dir/build/lib/libstdc++.so.6
#3 0x00007ffff7b1dc85 in std::locale::locale() () from $project_dir/build/lib/libstdc++.so.6
#4 0x00007ffff7b1a574 in std::ios_base::Init::Init() () from $project_dir/build/lib/libstdc++.so.6
#5 0x0000000000400fde in _GLOBAL__sub_I_main () at $project_dir/build/gcc-4.9.4/include/c++/4.9.4/iostream:74
#6 0x00000000004012ed in __libc_csu_init ()
#7 0x00007ffff7518cb0 in __libc_start_main () from /lib64/libc.so.6
#8 0x0000000000401021 in _start ()
(gdb)
3. LD_DEBUG=all
I also tried to see the linker information by enabling LD_DEBUG=all for the segfault case. I found something suspicious: it searches for the pthread_once symbol, and when it is unable to find it, it segfaults (that is my interpretation of the following output snippet, BTW):
initialize program: $project_dir/build/bin/run
symbol=_ZNSt8ios_base4InitC1Ev; lookup in file=$project_dir/build/bin/run [0]
symbol=_ZNSt8ios_base4InitC1Ev; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
binding file $project_dir/build/bin/run [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base4InitC1Ev' [GLIBCXX_3.4]
symbol=_ZNSt6localeC1Ev; lookup in file=$project_dir/build/bin/run [0]
symbol=_ZNSt6localeC1Ev; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
binding file $project_dir/build/lib/libstdc++.so.6 [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt6localeC1Ev' [GLIBCXX_3.4]
symbol=pthread_once; lookup in file=$project_dir/build/bin/run [0]
symbol=pthread_once; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
symbol=pthread_once; lookup in file=$project_dir/build/lib/libgcc_s.so.1 [0]
symbol=pthread_once; lookup in file=/lib64/libc.so.6 [0]
symbol=pthread_once; lookup in file=/lib64/libm.so.6 [0]
symbol=pthread_once; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
But I don't see any pthread_once lookup in the case where it runs successfully!
Questions
I know that it's very difficult to debug like this, and I've probably not given a lot of information about the environment. But still, my question is: what could be the root cause of this segfault? How can I debug further and find it? Once I find the issue, the fix should be easy.
Compiler and Platform
I'm using GCC 4.9 on RHEL5.
Experiments
E#1
If I comment out the following line:
std::vector<std::shared_ptr<int>> y {};
It compiles and runs fine!
E#2
I just included the following header to my program:
#include <boost/filesystem.hpp>
and linked accordingly. Now it works without any segfault. So it seems that by having a dependency on libboost_system.so.1.53.0, some requirements are met, or the problem is circumvented!
E#3
Since it worked when I linked the executable against libboost_system.so.1.53.0, I did the following things step by step.
Instead of using #include <boost/filesystem.hpp> in the code itself, I used the original code and ran it with libboost_system.so preloaded via LD_PRELOAD, as follows:
$ LD_PRELOAD=$project_dir/build/lib/libboost_system.so $project_dir/build/bin/run
and it ran successfully!
Next I did ldd on the libboost_system.so which gave a list of libs, two of which were:
/lib64/librt.so.1
/lib64/libpthread.so.0
So instead of preloading libboost_system, I preload librt and libpthread separately:
$ LD_PRELOAD=/lib64/librt.so.1 $project_dir/build/bin/run
$ LD_PRELOAD=/lib64/libpthread.so.0 $project_dir/build/bin/run
In both cases, it ran successfully.
Now my conclusion is that by loading either librt or libpthread (or both), some requirements are met or the problem is circumvented! I still don't know the root cause of the issue, though.
Compilation and Linking Options
The build system is complex and passes lots of options by default. I tried explicitly adding -lpthread using CMake's set command, and then it worked, as expected given that preloading libpthread also works.
In order to see the build difference between these two cases (when-it-works and when-it-gives-segfault), I built it in verbose mode by passing -v to GCC, to see the compilation stages and the options it actually passes to cc1plus (compiler) and collect2 (linker).
(Note that paths have been edited for brevity, using dollar signs and dummy paths.)
$/gcc-4.9.4/cc1plus -quiet -v -I /a/include -I /b/include -iprefix
$/gcc-4.9.4/ -MMD main.cpp.d -MF main.cpp.o.d -MT main.cpp.o
-D_GNU_SOURCE -D_REENTRANT -D __USE_XOPEN2K8 -D _LARGEFILE_SOURCE -D _FILE_OFFSET_BITS=64 -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -D NDEBUG $/lab/main.cpp -quiet -dumpbase main.cpp -msse -mfpmath=sse -march=core2 -auxbase-strip main.cpp.o -g -O3 -Wall -Wextra -std=gnu++1y -version -fdiagnostics-color=auto -ftemplate-depth=128 -fno-operator-names -o /tmp/ccxfkRyd.s
Irrespective of whether it works or not, the command-line arguments to cc1plus are exactly the same. No difference at all. That does not seem to be very helpful.
The difference, however, is at the linking time. Here is what I see, for the case when it works:
$/gcc-4.9.4/collect2 -plugin $/gcc-4.9.4/liblto_plugin.so
-plugin-opt=$/gcc-4.9.4/lto-wrapper -plugin-opt=-fresolution=/tmp/cchl8RtI.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -export-dynamic -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o run /usr/lib/../lib64/crt1.o
/usr/lib/../lib64/crti.o $/gcc-4.9.4/crtbegin.o -L/a/lib -L/b/lib
-L/c/lib
-lpthread --as-needed main.cpp.o -lboost_timer -lboost_wave -lboost_chrono -lboost_filesystem -lboost_graph -lboost_locale -lboost_thread -lboost_wserialization -lboost_atomic -lboost_context -lboost_date_time -lboost_iostreams -lboost_math_c99 -lboost_math_c99f -lboost_math_c99l -lboost_math_tr1 -lboost_math_tr1f -lboost_math_tr1l -lboost_mpi -lboost_prg_exec_monitor -lboost_program_options -lboost_random -lboost_regex -lboost_serialization -lboost_signals -lboost_system -lboost_unit_test_framework -lboost_exception -lboost_test_exec_monitor -lbz2 -licui18n -licuuc -licudata -lz -rpath /a/lib:/b/lib:/c/lib: -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc $/gcc-4.9.4/crtend.o /usr/lib/../lib64/crtn.o
As you can see, -lpthread is mentioned twice! The first -lpthread (which is followed by --as-needed) is missing for the case when it gives segfault. That is the only difference between these two cases.
Output of nm -C in both cases
Interestingly, the output of nm -C in both cases is identical (if you ignore the integer values in the first columns).
0000000000402580 d _DYNAMIC
0000000000402798 d _GLOBAL_OFFSET_TABLE_
0000000000401000 t _GLOBAL__sub_I_main
0000000000401358 R _IO_stdin_used
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
U _Unwind_Resume
0000000000401150 W std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()
0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector()
0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector()
0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector()
0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector()
U std::ios_base::Init::Init()
U std::ios_base::Init::~Init()
0000000000402880 B std::cout
U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
0000000000402841 b std::__ioinit
U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
U operator delete(void*)
U operator new(unsigned long)
0000000000401510 r __FRAME_END__
0000000000402818 d __JCR_END__
0000000000402818 d __JCR_LIST__
0000000000402820 d __TMC_END__
0000000000402820 d __TMC_LIST__
0000000000402838 A __bss_start
U __cxa_atexit
0000000000402808 D __data_start
0000000000401100 t __do_global_dtors_aux
0000000000402820 t __do_global_dtors_aux_fini_array_entry
0000000000402810 d __dso_handle
0000000000402828 t __frame_dummy_init_array_entry
w __gmon_start__
U __gxx_personality_v0
0000000000402838 t __init_array_end
0000000000402828 t __init_array_start
00000000004012b0 T __libc_csu_fini
00000000004012c0 T __libc_csu_init
U __libc_start_main
w __pthread_key_create
0000000000402838 A _edata
0000000000402990 A _end
000000000040134c T _fini
0000000000400e68 T _init
0000000000401028 T _start
0000000000401054 t call_gmon_start
0000000000402840 b completed.6661
0000000000402808 W data_start
0000000000401080 t deregister_tm_clones
0000000000401120 t frame_dummy
0000000000400f40 T main
00000000004010c0 t register_tm_clones
Given the point of crash, and the fact that preloading libpthread seems to fix it, I believe that the execution of the two cases diverges at locale_init.cc:315. Here is an extract of the code:
void
locale::_S_initialize()
{
#ifdef __GTHREADS
    if (__gthread_active_p())
        __gthread_once(&_S_once, _S_initialize_once);
#endif
    if (!_S_classic)
        _S_initialize_once();
}
__gthread_active_p() returns true if your program is linked against pthread, specifically it checks if pthread_key_create is available. On my system, this symbol is defined in "/usr/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h" as static inline, hence it is a potential source of ODR violation.
Notice that LD_PRELOAD=libpthread.so will always cause __gthread_active_p() to return true.
__gthread_once is another inlined symbol that should always forward to pthread_once.
It's hard to guess what's going on without debugging, but I suspect that you are hitting the true branch of __gthread_active_p() even when it shouldn't, and the program then crashes because there is no pthread_once to call.
EDIT:
So I did some experiments, the only way I see to get a crash in std::locale::_S_initialize is if __gthread_active_p returns true, but pthread_once is not linked in.
libstdc++ does not link directly against pthread, but it imports half of pthread_xx as weak objects, which means they can be undefined and not cause a linker error.
Obviously linking pthread will make the crash disappear, but if I am right, the main issue is that your libstdc++ thinks that it is inside a multi-threaded executable even if we did not link pthread in.
Now, __gthread_active_p uses __pthread_key_create to decide whether we have threads or not. This is defined in your executable as a weak object (it can be nullptr and still be fine). I am 99% sure that the symbol is there because of shared_ptr (remove it and check nm again to be sure).
So, somehow __pthread_key_create gets bound to a valid address, maybe because of that last -lpthread in your linker flags.
You can verify this theory by putting a breakpoint at locale_init.cc:315 and checking which branch you take.
EDIT2:
Summary of the comments, the issue is only reproducible if we have all of the following:
Use ld.gold instead of ld.bfd
Use --as-needed
Forcing a weak definition of __pthread_key_create, in this case via instantiation of std::shared_ptr.
Not linking to pthread, or linking pthread after --as-needed.
To answer the questions in the comments:
Why does it use gold by default?
By default it uses /usr/bin/ld, which on most distros is a symlink to either /usr/bin/ld.bfd or /usr/bin/ld.gold. This default can be changed using update-alternatives. I am not sure why in your case it is ld.gold; as far as I understand, RHEL5 ships with ld.bfd as the default.
And why does gold not add pthread.so dependency to the binary if it is needed?
Because the definition of what is needed is somewhat murky. man ld says (emphasis mine):
--as-needed
--no-as-needed
This option affects ELF DT_NEEDED tags for dynamic libraries mentioned on the command line after the --as-needed option. Normally the linker will add a DT_NEEDED tag for each dynamic library mentioned on the command line, regardless of whether the library is actually needed or not. --as-needed causes a DT_NEEDED tag to only be emitted for a library that at that point in the link satisfies a non-weak undefined symbol reference from a regular object file or, if the library is not found in the DT_NEEDED lists of other needed libraries, a non-weak undefined symbol reference from another needed dynamic library. Object files or libraries appearing on the command line after the library in question do not affect whether the library is seen as needed. This is similar to the rules for extraction of object files from archives. --no-as-needed restores the default behaviour.
Now, according to this bug report, gold is honoring the "non weak undefined symbol" part, while ld.bfd sees weak symbols as needed. TBH I do not have a full understanding on this, and there is some discussion on that link as to whether this is to be considered a ld.gold bug, or a libstdc++ bug.
Why do I need to mention both -pthread and -lpthread? (-pthread is passed by default by our build system, and I've passed -lpthread to make it work when gold is used.)
-pthread and -lpthread do different things (see pthread vs lpthread). It is my understanding that the former should imply the latter.
Regardless, you can probably pass -lpthread only once, but you need to do it before --as-needed, or use --no-as-needed after the last library and before -lpthread.
It is also worth mentioning that I was not able to reproduce this issue on my system (GCC 7.2), even using the gold linker.
So I suspect that it has been fixed in a more recent version of libstdc++, which might also explain why it does not segfault if you use the system standard library.
This is likely a problem caused by subtle mismatches between libstdc++ ABIs. GCC 4.9 is not the system compiler on Red Hat Enterprise Linux 5, so it's not quite clear what you are using there (DTS 3?).
The locale implementation is known to be quite sensitive to ABI mismatches. See this thread on the gcc-help list:
Binary compatibility between an old static libstdc++ and a new dynamic one
plus follow-ups in the next month
Your best bet is to figure out which bits of libstdc++ were linked where, and somehow achieve consistency (either by hiding symbols, or recompiling things so that they are compatible).
It may also be useful to investigate the hybrid linkage model used for libstdc++ in Red Hat's Developer Toolset (where newer bits are linked statically, but the bulk of the C++ standard library uses the existing system DSO), but the system libstdc++ in Red Hat Enterprise Linux 5 might be too old for that if you need support for current language features.
I have some code using the SystemC library which compiles fine when I'm physically at the machine, but throws undefined references when I'm ssh'ing in.
g++ -Wno-deprecated -O0 -g3 -I/path/to/include socex2.cpp -L/path/to/lib -lsystemc
/tmp/ccCNdiMA.o: In function `sc_dt::sc_uint_base::print(std::ostream&) const':
/path/to/include/sysc/datatypes/int/sc_uint_base.h:844: undefined reference to `sc_dt::sc_uint_base::to_string[abi:cxx11](sc_dt::sc_numrep, bool) const'
collect2: error: ld returned 1 exit status
At first I thought it was a problem with LD_LIBRARY_PATH, set in ~/.bashrc to /path/to/lib. I source ~/.bashrc in ~/.bash_profile for non-interactive sessions such as ssh.
To verify, here's the relevant bits of /usr/bin/env:
TERM=xterm
SHELL=/bin/bash
SSH_CLIENT=xx.xx.xx.xx 56176 22
LD_LIBRARY_PATH=/path/to/lib
SSH_CONNECTION=xx.xx.xx.xx 56176 yy.yy.yy.yy 22
_=/usr/bin/env
Why won't my program link? The headers and libraries I'm using are exactly the same and in the exact same places.
P.S.
I don't have admin access on these machines
gcc is 5.4.0
OS is Ubuntu 16.04
Dependent libraries:
$ ldd /path/to/lib/libsystemc.so
linux-vdso.so.1 => (0x00007ffe29d36000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9b85f5000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb9b8273000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb9b7f69000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9b7ba0000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb9b798a000)
/lib64/ld-linux-x86-64.so.2 (0x000056093a23e000)
...to_string[abi:cxx11] ...
One of two things...
First, GCC and Clang are being mixed and matched. If you were compiling with Clang, this would be a likely suspect because of GCC5 and the C++11 ABI and LLVM Issue 23529: Add support for gcc's attribute abi_tag (needed for compatibility with gcc 5's libstdc++).
Second, to_string is C++11, so you need either -std=c++11 or -std=gnu++11. C++11 is the likely candidate if all other things are equal. It also gets you the new ABI unless you -D_GLIBCXX_USE_CXX11_ABI=0.
You could still have problems with dependent library configurations, and they could be surfacing in your question.
For a few days we have been dealing with a very strange problem.
I can't understand how it even happens: when a third-party program (MATLAB) uses our shared library, it somehow overrides some of our symbols (boost, to be precise) with its own. Those symbols are statically linked and (!!) local.
Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex).
So, here is the magic:
We have no library dependencies, ldd:
linux-vdso.so.1 => (0x00007fff4abff000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1a3fd65000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1a3fa51000)
libm.so.6 => /lib/libm.so.6 (0x00007f1a3f7cd000)
libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1a3f5bf000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1a3f3a8000)
libc.so.6 => /lib/libc.so.6 (0x00007f1a3f024000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1a414f9000)
librt.so.1 => /lib/librt.so.1 (0x00007f1a3ee1c000)
No C++ symbols (our public symbols are plain old C for binary compatibility) are exported from our library, nm:
nm -g --defined-only libmysharedlib.so
addr1 T OurCSymbol1
addr2 T OurCSymbol2
addr3 T OurCSymbol3
...
Still, it uses their boost. HOW? Stacktrace (paths cut):
[ 0] 0x00007f21fddbb0a9 bin/libmwfl.so+00454825 fl::sysdep::linux::unwind_stack(void const**, unsigned long, unsigned long, fl::diag::thread_context const&)+000009
[ 1] 0x00007f21fdd74111 bin/glnxa64/libmwfl.so+00164113 fl::diag::stacktrace_base::capture(fl::diag::thread_context const&, unsigned long)+000161
[ 2] 0x00007f21fdd7d42d bin/glnxa64/libmwfl.so+00201773
[ 3] 0x00007f21fdd7d6b4 bin/glnxa64/libmwfl.so+00202420 fl::diag::terminate_log(char const*, fl::diag::thread_context const&, bool)+000100
[ 4] 0x00007f21fce525a7 bin/glnxa64/libmwmcr.so+00365991
[ 5] 0x00007f21fb9eb8f0 lib/libpthread.so.0+00063728
[ 6] 0x00007f21f3e939a9 libboost_regex.so.1.40.0+00342441 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_all_states()+000073
[ 7] 0x00007f21f3eb6546 bin/glnxa64/libboost_regex.so.1.40.0+00484678 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_imp()+000758
[ 8] 0x00007f21c04ad595 lib/libmysharedlib.so+04855189 bool boost::regex_match, std::allocator > >, char, boost::regex_traits > >(__gnu_cxx::__normal_iterator, __gnu_cxx::__normal_iterator, boost::match_results, std::allocator > > >&, boost::basic_regex > > const&, boost::regex_constants::_match_flags)+000245
[ 9] 0x00007f21c04a71c7 lib/libmysharedlib.so+04829639 myfunc2()+000183
[ 10] 0x00007f21c01b41e3 lib/libmysharedlib.so+01737187 myfunc1()+000307
It's known that MATLAB does dlopen with the RTLD_NOW flag only.
People, think with me please.
At this point I'm desperate not even to fix this, but simply to understand the ld and ELF behavior.
edit:
Small additional question: as I understand it, without special linker options, symbols in Linux .so libraries are never bound by address? So even statically linked local symbols are resolved at runtime?
Check out the -Bsymbolic option for ld.
If -Bsymbolic is specified, then at the time of creating a shared
object ld will attempt to bind references to global symbols to definitions
within the shared library. The default is to defer binding to runtime.
This may be clearer with an example.
Say example.o contains a reference to a global function defined in
global.o,
$ nm example.o | grep ' U'
U _GLOBAL_OFFSET_TABLE_
U globalfn
$ nm global.o | grep ' T'
00000000 T globalfn
and two shared objects, normal.so and symbolic.so, are built as
follows:
$ cc -fPIC -c example.c
$ cc -c global.c
$ rm -f archive.a; ar cr archive.a global.o
$ ld -shared -o normal.so example.o archive.a
$ ld -Bsymbolic -shared -o symbolic.so example.o archive.a
Disassembling the code for normal.so shows that the call to
globalfn is actually going through the procedure linkage table, and
thus the final destination of the call is determined at runtime.
$ objdump --disassemble normal.so
...snip...
00000194 <example>:
...snip...
1a6: e8 d9 ff ff ff call 184 <globalfn@plt>
...snip...
$ readelf -r normal.so
Relocation section '.rel.plt' at offset 0x16c contains 1 entries:
Offset Info Type Sym.Value Sym. Name
00001244 00000207 R_386_JUMP_SLOT 000001b8 globalfn
Whereas in symbolic.so, the call always invokes the definition of
globalfn within the shared object.
$ objdump --disassemble symbolic.so
...snip...
0000016c <shared>:
...snip...
17e: e8 0d 00 00 00 call 190 <globalfn>
...snip...
$ readelf -r symbolic.so
There are no relocations in this file.
Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex).
You are invoking undefined behavior, which is a "Doctor, it hurts when I do this" kind of situation. The Matlab executable already contains external functions for class boost::re_detail::perl_matcher< elided >. When Matlab loads your shared library the dynamic linker sees that your shared library defines those exact same symbols in a way that conflicts with the existing definitions. Undefined behavior.
The solution is to build a version of your library for use with Matlab that uses the same version of Boost as does Matlab.