Why do this linker warning and segmentation fault happen? - c++

I recently upgraded an external library, librdkafka, from 1.3.0 to 1.6.1.
I built the library and linked it as a shared object.
The following warning then appeared when my program was linked:
/opt/rh/devtoolset-7/root/usr/libexec/gcc/x86_64-redhat-linux/7/ld:
Warning: type of symbol `mtx_lock' changed from 2 to 1
in ../externals/synapfilter/lib/libsnf.a(memoryUtil.cpp.o)
A segmentation fault also occurred during program execution.
The gdb output is as follows:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000f27a80 in mtx_lock ()
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.5-7.el6_0.x86_64 cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 libgcc-4.4.7-17.el6.x86_64 libicu-4.2.1-14.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 libstdc++-4.4.7-17.el6.x86_64 libzstd-1.4.5-3.el6.x86_64 lz4-r131-1.el6.x86_64 nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000f27a80 in mtx_lock ()
#1 0x00007f59479a38cc in rd_kafka_global_cnt_incr () at rdkafka.c:182
#2 rd_kafka_new (type=type@entry=RD_KAFKA_PRODUCER, app_conf=app_conf@entry=0x2531870, errstr=errstr@entry=0x7ffd71c7c7d0 <incomplete sequence \350>,
errstr_size=errstr_size@entry=512) at rdkafka.c:2092
I found that the name mtx_lock is defined in both of the external libraries I use.
It is a global variable in one object file of libsnf.a:
$ objdump -t memoryUtil.cpp.o | grep mtx_lock
0000000000000000 g O .bss 0000000000000028 mtx_lock
The same name is a function in one object file of librdkafka.a:
$ objdump -t tinycthread.o | grep mtx_lock
0000000000000090 g F .text 0000000000000016 mtx_lock
I wonder why this is happening and how to fix it.
In my makefile, I link libsnf.a as a static library and librdkafka.so as a shared library.

I wonder why this is happening
You have two separate object files: memoryUtil.cpp.o and tinycthread.o, defining the same symbol: mtx_lock. One of them defines it as a function, the other as a variable.
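Normally this would result in a "multiply defined" symbol error at link time, but you get a warning instead. I am not sure why; perhaps one of these symbol definitions is weak. Another possibility, since librdkafka is linked as a shared object: a definition in a shared library does not count as a duplicate of one in the executable, so the linker only warns about the type mismatch.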
(In general, you should never use objdump to look at ELF symbols -- use readelf -Ws instead.)
Your program proceeds to call mtx_lock(), but gets a data variable instead, and crashes.
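The numbers in the warning are, as far as I can tell, ELF symbol type codes, so "changed from 2 to 1" reads as "changed from function to data object". A minimal sketch that decodes them; the constant values come from the ELF specification, while the helper name symbol_type_name is made up for illustration:

```cpp
#include <string>

// ELF symbol type codes as defined by the ELF specification
// (the same values <elf.h> exposes as STT_NOTYPE, STT_OBJECT, STT_FUNC).
constexpr unsigned kSttNoType = 0;  // type not specified
constexpr unsigned kSttObject = 1;  // data object (variable, array, ...)
constexpr unsigned kSttFunc   = 2;  // function or other executable code

// Hypothetical helper: map a type code to the name readelf uses for
// these three types; every other code is lumped together as "OTHER".
std::string symbol_type_name(unsigned type) {
    switch (type) {
        case kSttNoType: return "NOTYPE";
        case kSttObject: return "OBJECT";
        case kSttFunc:   return "FUNC";
        default:         return "OTHER";
    }
}
```

Here symbol_type_name(2) corresponds to the function in tinycthread.o and symbol_type_name(1) to the variable in memoryUtil.cpp.o, which matches the F and O flags in the objdump output from the question.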
and how to fix it.
Since these libraries are open source, the easiest fix is to rename one (or both) of the symbols, and rebuild.
If you don't want to rebuild, you could use objcopy --redefine-sym ... to achieve the same result.
Update:
The mtx_lock() function is part of the C11 standard, which makes its use as a variable in libsnf highly problematic.
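If libsnf is rebuilt from source, one way to do the rename is to move the lock out of the global namespace. A minimal sketch, assuming the variable is a mutex of some kind (its 0x28-byte size would fit a pthread mutex on x86-64, but that is a guess); the namespace snf_detail and the helper locked_increment are hypothetical names:

```cpp
#include <mutex>

namespace snf_detail {  // hypothetical namespace name

// At global scope a C++ variable keeps its plain name ("mtx_lock") in the
// symbol table. Inside a namespace the linker-level name is mangled (with
// GCC, something like _ZN10snf_detail8mtx_lockE), so it can no longer be
// confused with the C function mtx_lock.
std::mutex mtx_lock;

}  // namespace snf_detail

// Hypothetical helper showing the lock in use.
int locked_increment(int& counter) {
    std::lock_guard<std::mutex> guard(snf_detail::mtx_lock);
    return ++counter;
}
```

Giving the variable a different name altogether works just as well; the point is only that the plain symbol mtx_lock is left to the C library function.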

Related

"undefined reference to __dso_handle" while linking static library with -nostdlib [duplicate]

I have an unresolved symbol error when trying to compile my program which complains that it cannot find __dso_handle. Which library is this function usually defined in?
Does the following result from nm on libstdc++.so.6 mean it contains that?
I tried to link against it but the error still occurs.
nm libstdc++.so.6 | grep dso
00000000002fc480 d __dso_handle
__dso_handle is a "guard" that is used to identify dynamic shared objects during global destruction.
Realistically, you should stop reading here. If you're trying to defeat object identification by messing with __dso_handle, something is likely very wrong.
However, since you asked where it is defined: the answer is complex. To surface the location of its definition (for GCC), use iostream in a C++ file, and, after that, do extern int __dso_handle;. That should surface the location of the declaration due to a type conflict (see this forum thread for a source).
Sometimes, it is defined manually.
Sometimes, it is defined/supplied by the "runtime" installed by the compiler (in practice, the CRT is usually just a bunch of binary header/entry-point-management code, and some exit guards/handlers). In GCC (not sure if other compilers support this; if so, it'll be in their sources):
Main definition
Testing __dso_handle replacement/tracker example 1
Testing __dso_handle replacement/tracker example 2
Often, it is defined in the stdlib:
Android
BSD
Further reading:
Subtle bugs caused by __dso_handle being unreachable in some compilers
I ran into this problem. Here are the conditions which seem to reliably generate the trouble:
g++ linking without the C/C++ standard library: -nostdlib (typical small embedded scenario).
Defining a statically allocated standard library object; specific to my case is std::vector. Previously this was std::array statically allocated without any problems. Apparently not all std:: statically allocated objects will cause the problem.
Note that I am not using a shared library of any type.
GCC/ARM cross compiler is in use.
If this is your use case, simply add this option to your compile/link command line: -fno-use-cxa-atexit
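A likely explanation for why std::vector triggers the error while std::array did not: a static object with a non-trivial destructor has to be registered for destruction at exit, and with GCC's default -fuse-cxa-atexit that registration goes through __cxa_atexit and refers to __dso_handle. An object with a trivial destructor needs no registration. A small sketch that checks this property; needs_exit_registration is a made-up helper name:

```cpp
#include <array>
#include <type_traits>
#include <vector>

// Hypothetical helper: true if a static object of type T has a destructor
// that must actually run at exit (and therefore must be registered).
template <typename T>
constexpr bool needs_exit_registration() {
    return !std::is_trivially_destructible<T>::value;
}

// std::array of a trivial element type has nothing to destroy.
static_assert(!needs_exit_registration<std::array<int, 8>>(),
              "std::array<int, N> is trivially destructible");

// std::vector owns heap memory, so its destructor is non-trivial.
static_assert(needs_exit_registration<std::vector<int>>(),
              "std::vector has a non-trivial destructor");
```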
Here is a very good link to the __dso_handle usage as 'handle to dynamic shared object'.
There appears to be a typo in the page, but I have no idea who to contact to confirm:
After you have called the objects' constructor destructors GCC automatically calls the function ...
I think this should read "Once all destructors have been called GCC calls the function" ...
One way to confirm this would be to implement the __cxa_atexit function as mentioned and then single step the program and see where it gets called. I'll try that one of these days, but not right now.
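To make the role of the handle more concrete, here is a toy model of the registration scheme: destructors are recorded together with the handle of the module that owns them, and finalizing a module runs only its own entries, newest first. This is a sketch of the idea only, not the real ABI functions; ExitRegistry, register_exit, finalize, Tracked and destroy_tracked are all invented names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One recorded destructor call, tagged with the owning module's handle.
struct ExitEntry {
    void (*destructor)(void*);
    void* object;
    const void* dso;
};

class ExitRegistry {
public:
    // Plays the role of __cxa_atexit(destructor, object, dso_handle).
    int register_exit(void (*destructor)(void*), void* object, const void* dso) {
        entries_.push_back({destructor, object, dso});
        return 0;
    }

    // Plays the role of __cxa_finalize(dso): run matching entries in
    // reverse registration order; a null handle means "run everything".
    void finalize(const void* dso) {
        for (std::size_t i = entries_.size(); i-- > 0;) {
            if (dso == nullptr || entries_[i].dso == dso) {
                ExitEntry entry = entries_[i];
                entries_.erase(entries_.begin() + static_cast<std::ptrdiff_t>(i));
                entry.destructor(entry.object);
            }
        }
    }

    std::size_t pending() const { return entries_.size(); }

private:
    std::vector<ExitEntry> entries_;
};

// Demo object: its "destructor" appends a tag to a log string.
struct Tracked {
    std::string* log;
    char tag;
};

void destroy_tracked(void* p) {
    Tracked* t = static_cast<Tracked*>(p);
    t->log->push_back(t->tag);
}
```

Registering objects under two different handles and then finalizing one of them leaves the other module's entries pending, which is the kind of per-module bookkeeping the handle exists for.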
Adding to @natersoz's answer:
For me, using -Wabi-tag -D_GLIBCXX_USE_CXX11_ABI=0 alongside -fno-use-cxa-atexit helped compile an old lib. A telltale is if the C++ functions in the error message have std::__cxx11 in them, due to an ABI change.

Call ambiguous function in gdb with stripped symbols

I'm trying to call a function with
(gdb) call fun()
in a 3rd party library libFoo which I get in compiled form with stripped symbols.
(gdb) info function ^fun$
Non-debugging symbols:
0x00007ffff6d7e3b0 fun
Problem is, there is an unrelated system library libBar loaded which also has fun in it, a variable this time, and gdb prefers that symbol instead of the desired one. I suspect that this is because this hit is a non-stripped debugging symbol.
(gdb) info var ^fun$
File ../bar/baz.c:
256: static const int fun[18];
(gdb) info symbol fun
fun in section .rodata of libBar.so
It's a bit crazy that it tries to call a variable as a function, but that's what it tries to do.
The question is, how do I disambiguate the symbol and instruct gdb to use the one from libFoo?
So far, the only way I found requires a manual step (info function ^fun$ above) and then call (void)0x00007ffff6d7e3b0(). This isn't too good because it doesn't allow me to script the call across different program runs.

MFXInit() in libmfx.a segfaults when called from shared object

(While Intel's forum is a more natural place to ask this question I'm posting it here hoping for more activity than Intel's total lack thereof -- so far)
I'm unable to create a dynamic link library that uses Intel Media SDK (linux server) to manipulate h264 video and noticed a problem in the design of the MFX library. The way I understand it, programs are supposed to link to the static library, like:
$ g++ .... -L/opt/intel/mediasdk/lib/lin_x64 -lmfx
However, this libmfx.a library appears to delegate all calls to a dlopened dynamic library /opt/intel/mediasdk/lib64/libmfxhw64.so. It is worth noting that function names (and signatures) exposed by static and dynamic libraries are identical, which is kind of confusing and dangerous.
While I don't understand the rationale behind this design, it should not be a problem by itself, were it not that some static/global initialization within the library apparently wreaks havoc when the (static) libmfx.a is included in a shared object, i.e.:
+------+     +-----------+
| main | <-- | mylib.so  |
+------+     |           |            +---------------+
             | libmfx.a  |  (dlopen)  | libmfxhw64.so |
             |           | <--------- |               |
             |+---------+|            |+-------------+|
             ||MFXInit()||            || MFXInit()   ||
             ||...      ||            || ...         ||
             ||         ||            ||             ||
             +===========+            +===============+
The above library could be assembled like this:
$ g++ -shared -o mylib.so my1.o my2.o -lmfx
And then (dynamically) linked to main.o like so:
$ g++ -o main main.o mylib.so -ldl
(Note that the additional libdl is necessary to allow libmfx.a to dlopen() libmfxhw64.so.)
Unfortunately, upon the first MFXInit() call, the program causes a segmentation fault (accessing address 0x0000400). GDB backtrace:
#0 0x0000000000000400 in ?? ()
#1 0x00007ffff61fb4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x00007ffff7bd3a1f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) () from ./lib-a.so
#3 0x00007ffff7bd12b1 in MFXInit () from ./lib-a.so
#4 0x00007ffff7bd09c8 in test_mfx () at lib.c:12
#5 0x0000000000400744 in main (argc=1, argv=0x7fffffffe0d8) at main.c:8
(Observe that MFXInit() at stackframe #3 is the one in libmfx.a whereas the one at #1 is in libmfxhw64.so.)
Note that there is no crash when mylib is created as a static library. Using breakpoints and the disassembler, I managed to capture the following backtrace, where in both cases #1 is at MFXInit+424, but they appear to hit different versions of MFXQueryVersion (absolute addresses are meaningless due to relocation):
#0 0x00007ffff6411980 in MFXQueryVersion () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#1 0x00007ffff640c4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x000000000040484f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) ()
#3 0x00000000004020e1 in MFXInit ()
#4 0x0000000000401800 in test_mfx () at lib.c:12
#5 0x0000000000401794 in main (argc=1, argv=0x7fffffffe0e8) at main.c:8
Because both static and shared Intel libs expose the same API functions, I can link straight into libmfxhw64.so guts directly, but I suppose that bypassing the static "dispatcher" is without warranty(?)
Could someone explain Intel's idea behind this design? Specifically, why provide a static library that only delegates to an .so that has an identical interface?
Also, it appears that the SEGV is caused by static/global data in either libmfx.a or libmfxhw64.so. Is there a way to force a specific execution order on dynamically loaded static/global sections? What is the best approach to debug these kinds of problems?
Tested with Intel Media SDK R2 (Ubuntu 12) and Intel Media SDK 2015R3-R5 (CentOS 7, 1.13/1.15) on Intel Haswell i7-4790 @ 3.6GHz
If you have a working Intel MSDK setup, please compile my example code to confirm the issue.
At the very end of the file "readme-dispatcher-linux.pdf" in recent releases of the dispatcher source code, there is this:
There is slight difference between using Dispatcher library from
executable module or from shared object. To mitigate symbol conflict
between itself and SDK shared object on Linux*, application should:
1) link against libdispatch_shared.a instead of libmfx.a
2) define MFX_DISPATCHER_EXPOSED_PREFIX before any SDK includes
I have used this, and it works to address the symbol conflict issue you describe.
You can find this file if you install "Intel Media Server Studio Professional 2016" (there is a free community edition). The source files and the PDF are in /opt/intel/mediasdk/opensource/
(OK, since no one seems eager, I'll do the inelegant thing and post an answer to my own question).
After considerable research trying to break the unintentional circular linking, I discovered that the ld option --exclude-libs provides solace. Essentially, I was looking for a way to force removal of any libmfx.a symbols after using them to resolve dependencies in lib.o while creating the DLL. This could be accomplished by creating the so like this:
g++ -shared -o lib-a.so lib.o -L/opt/intel/mediasdk/lib/lin_x64 -lmfx -Wl,--exclude-libs=libmfx
Once the library is created like this, Bob's your uncle:
g++ -o main-so-a main.o lib-a.so -ldl
(Note that libdl is still needed because Intel's MFX (now inside lib-a.so) still uses dlopen to discover libmfxhw64.so)
From the ld man page:
--exclude-libs lib,lib,...
Specifies a list of archive libraries from which symbols should not be
automatically exported. The library names may be delimited by commas or
colons. Specifying "--exclude-libs ALL" excludes symbols in all archive
libraries from automatic export. This option is available only for the
i386 PE targeted port of the linker and for ELF targeted ports. For i386
PE, symbols explicitly listed in a .def file are still exported,
regardless of this option. For ELF targeted ports, symbols affected
by this option will be treated as hidden.
So, essentially the trick is to make sure that the relevant ELF symbols are marked hidden. Normally this would be handled through #pragmas by the library developers (i.e. Intel), but due to their negligence this needs to be retrofitted in this case.
I suppose the same could have been accomplished with a --version-script map file, but that might have turned out to be more fragile since we want to fully encapsulate libmfx.a anyway.
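Marking symbols hidden at the source level, which is what the library developers would normally do, can look roughly like this with GCC or Clang. All names below are invented for illustration and are not Intel's actual dispatcher code:

```cpp
// Everything between push(hidden) and pop stays internal to the shared
// object it is linked into: other modules cannot bind to these symbols,
// so they cannot clash with same-named symbols in another library.
#pragma GCC visibility push(hidden)

int dispatcher_load_count = 0;  // hypothetical internal state

// Hypothetical internal helper standing in for the dispatcher's loader.
int load_selected_library(int impl) {
    ++dispatcher_load_count;
    return impl * 2;
}

#pragma GCC visibility pop

// The one entry point that is deliberately exported from the library.
extern "C" __attribute__((visibility("default")))
int mylib_init(int impl) {
    return load_selected_library(impl);
}
```

The --exclude-libs option gets a comparable result after the fact, for an archive whose sources were not written this way.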

How to Determine Which Shared Library a Function Belongs to in gdb?

When I get the callstack from gdb, I only get function names and source file information.
(gdb) f
#0 main (argc=1, argv=0xbffff1d4) at main.c:5
I don't get which Shared Library or Application the function belongs to.
On Windows, Windbg or Visual Studio will show callstacks with "myDll!myFunc" format, which shows you which module the function belongs to.
Currently in gdb I'm using "info address [function]" to get the address of the function symbol, and then use "info share" to manually find the range in which the function lies in memory to determine which library it is in.
Is there any way to see the library directly, without this manual process?
You can use info symbol. It prints a library name for a function.
Like this:
(gdb) info symbol f
f(double) in section .text of libmylib_gcc.so
(gdb) info symbol printf
printf in section .text of /lib64/libc.so.6

Yet another linking issue with unresolved symbols

I am trying to build my program in OpenCL for ARM GPU - Mali.
I have a library libMali.so, which contains necessary symbols:
arm-v7a15v4r3-linux-gnueabi-nm *root_to_lib*/libMali.so
returns lines such as
002525b4 t clCreateKernel
and many others with all the expected OpenCL symbols.
However, compiling with
arm-v7a15v4r3-linux-gnueabi-g++ -c -Wall mandelbrot.cpp -o mandelbrot.o
arm-v7a15v4r3-linux-gnueabi-g++ mandelbrot.o -o mandelbrot -L*root_to_lib* -lMali
gives me errors like
mandelbrot.cpp:(.text+0x2e4): undefined reference to `clCreateKernel'
and others with all the symbols, which are actually present in libMali.so!
So, I kept the correct order of libraries in the linking command, the library is on the specified path (it is indeed), and it has the symbols.
Mangling is not the issue in this case either: extern "C" specifiers were used and you can see that the raw entries of both the lib and the object file are not mangled.
Trying to accomplish the same thing using the arm-v7a15v4r3-linux-gnueabi-gcc didn't bring any change apart from necessity to link more c++ libs by hand (with -L*path* -llib).
libMali.so was built with arm-v7a15v4r3-linux-gnueabi-g++/gcc/ld, so this is not the matter of toolchain version.
I've run out of ideas. Maybe someone here knows the trickier parts of the linking process?
EDIT:
In fact, mandelbrot.cpp is a sample code from Mali-SDK. I'm just showing my linker problem on this example, since there obviously are no problems in the code. You can see the code here:
http://malideveloper.arm.com/downloads/deved/tutorial/SDK/opencl/mandelbrot_8cpp_source.html
http://malideveloper.arm.com/downloads/deved/tutorial/SDK/opencl/mandelbrot_8cl_source.html
If you look closely at the nm output:
002525b4 t clCreateKernel
you'll notice that the symbol is marked with a lowercase 't', which indicates that the symbol has local binding (for example a static function), so it's not considered for binding to an undefined symbol in another object file. You can find an explanation of most of the cryptic "symbol type" letters used by nm here: https://sourceware.org/binutils/docs/binutils/nm.html
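As a rough aid for reading nm output, the case rule can be written down as a small function. This is a simplified sketch based on the nm documentation, which says lowercase letters are usually local, except for a few letters (u, v, w) that are used for special global symbols; the function name nm_letter_is_local is made up:

```cpp
// Simplified reading of nm's symbol-type letter: returns true when the
// letter indicates a local symbol, i.e. one the linker will not use to
// resolve an undefined reference from another object file.
bool nm_letter_is_local(char letter) {
    // 'u', 'v' and 'w' are lowercase but denote special global symbols.
    if (letter == 'u' || letter == 'v' || letter == 'w') {
        return false;
    }
    return letter >= 'a' && letter <= 'z';
}
```

For the line above, nm_letter_is_local('t') is true, whereas an exported function would show up with 'T'.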
The readelf utility's output is clearer about symbol types.
Maybe the library was built incorrectly?