MFXInit() in libmfx.a segfaults when called from shared object - c++

(While Intel's forum is a more natural place to ask this question I'm posting it here hoping for more activity than Intel's total lack thereof -- so far)
I'm unable to create a dynamic link library that uses Intel Media SDK (linux server) to manipulate h264 video and noticed a problem in the design of the MFX library. The way I understand it, programs are supposed to link to static library, like:
$ g++ .... -L/opt/intel/mediasdk/lib/lin_x64 -lmfx
However, this libmfx.a library appears to delegate all calls to a dlopened dynamic library /opt/intel/mediasdk/lib64/libmfxhw64.so. It is worth noting that function names (and signatures) exposed by static and dynamic libraries are identical, which is kind of confusing and dangerous.
While I don't understand the rationale behind this design, it should not be a problem by itself were it not that apparently some static/global initialization from within the library causes havoc when the (static) libmfx.a is included in a shared object. Ie.:
+------+ +-----------+
| main | <-- | mylib.so |
+------+ | | +---------------+
| libmfx.a | (dlopen) | libmfxhw64.so |
| <------------- |
|+---------+| |+-------------+|
||MFXInit()|| || MFXInit() ||
||... || || ... ||
|| || || ||
+===========+ +===============+
The above library could be assembled like this:
$ g++ -shared -o mylib.so my1.o my2.o -lmfx
And then (dynamically) linked to main.o like so:
$ g++ -o main main.o mylib.so -ldl
(Note that the additional libdl is necessary to allow libmfx.a to dlopen() libmfxhw64.so.)
Unfortunately, upon the first MFXInit() call, the program causes a segmentation fault (accessing address 0x0000400). GDB backtrace:
#0 0x0000000000000400 in ?? ()
#1 0x00007ffff61fb4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x00007ffff7bd3a1f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) () from ./lib-a.so
#3 0x00007ffff7bd12b1 in MFXInit () from ./lib-a.so
#4 0x00007ffff7bd09c8 in test_mfx () at lib.c:12
#5 0x0000000000400744 in main (argc=1, argv=0x7fffffffe0d8) at main.c:8
(Observe that MFXInit() at stackframe #3 is the one in libmfx.a whereas the one at #1 is in libmfxhw64.so.)
Note that there is no crash when mylib is created as a static library. Using breakpoints and disassembler, I managed to make following backtrace snapshot where in both cases #1 is at MFXInit+424, but they appear to hit different versions of MFXQueryVersion (absolute addresses are meaningless due to relocation):
#0 0x00007ffff6411980 in MFXQueryVersion () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#1 0x00007ffff640c4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x000000000040484f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) ()
#3 0x00000000004020e1 in MFXInit ()
#4 0x0000000000401800 in test_mfx () at lib.c:12
#5 0x0000000000401794 in main (argc=1, argv=0x7fffffffe0e8) at main.c:8
Because both static and shared Intel libs expose the same API functions, I can link straight into libmfxhw64.so guts directly, but I suppose that bypassing the static "dispatcher" is without warranty(?)
Could someone explain Intel's idea behind said design? Spec., why provide a static library that only delegates to an .so that has identical interface?
Also, it appears that the SEGV is caused by static/global data in either libmfx.a or libmfxhw64.so. Is there a way to force a specific execution order on dynamically loaded static/global sections? What is the best approach to debug these kinds of problems?
Tested with Intel Media SDK R2 (ubuntu 12) and Intel Media SDK 2015R3-R5 (Centos 7, 1.13/1.15) on Intel Haswell i7-4790 #3.6Ghz
If you have a working Intel MSDK setup, please compile my example code to confirm the issue.

At the very end of the file "readme-dispatcher-linux.pdf" in recent releases of the dispatcher source code, there is this:
There is slight difference between using Dispatcher library from
executable module or from shared object. To mitigate symbol conflict
between itself and SDK shared object on Linux*, application should:
1) link against libdispatch_shared.a instead of libmfx.a
2) define MFX_DISPATCHER_EXPOSED_PREFIX before any SDK includes
I have used this, and it works to address the symbol conflict issue you describe.
You can find this file, if you install "Intel Media Server Studio Professional 2016". There is a free community edition. The source files and the PDF will be found at /opt/intel/mediasdk/opensource/

(OK, since no one seems eager, I'll do the inelegant thing and post an answer to my own question).
After considerable research trying to break the unintentional circular linking, I discovered that the ld option --exclude-libs provides solace. Essentially, I was looking for a way to force removal of any libmfx.a symbols after using them to resolve dependencies in lib.o while creating the DLL. This could be accomplished by creating the so like this:
g++ -shared -o lib-a.so lib.o -L/opt/intel/mediasdk/lib/lin_x64 -lmfx -Wl,--exclude-libs=libmfx
Once the library is created like this, Bob's you uncle:
g++ -o main-so-a main.o lib-a.so -ldl
(Note that libdl is still needed because Intel's MFX (now inside lib-a.so) still uses dlopen to discover libmfxhw64.so)
From the ld man page:
--exclude-libs lib,lib,...
Specifies a list of archive libraries from which symbols should not be
automatically exported. The library names may be delimited by commas or
colons. Specifying "--exclude-libs ALL" excludes symbols in all archive
libraries from automatic export. This option is available only for the
i386 PE targeted port of the linker and for ELF targeted ports. For i386
PE, symbols explicitly listed in a .def file are still exported,
regardless of this option. For ELF targeted ports, symbols affected
by this option will be treated as hidden.
So, essentially the trick is no make sure that the relevant ELF symbols are marked hidden. Normally this would be handled through #pragmas by the library developers (ie. Intel), but due to their negligence this needs to be retrofitted in this case.
I suppose the same could have been accomplished with a --version-script map file, but that might have turned out to be more fragile since we want to fully encapsulate libmfx.a anyway.

Related

"undefined reference to __dso_handle" while linking static library with -nostdlib [duplicate]

I have an unresolved symbol error when trying to compile my program which complains that it cannot find __dso_handle. Which library is this function usually defined in?
Does the following result from nm on libstdc++.so.6 mean it contains that?
I tried to link against it but the error still occurs.
nm libstdc++.so.6 | grep dso
00000000002fc480 d __dso_handle
__dso_handle is a "guard" that is used to identify dynamic shared objects during global destruction.
Realistically, you should stop reading here. If you're trying to defeat object identification by messing with __dso_handle, something is likely very wrong.
However, since you asked where it is defined: the answer is complex. To surface the location of its definition (for GCC), use iostream in a C++ file, and, after that, do extern int __dso_handle;. That should surface the location of the declaration due to a type conflict (see this forum thread for a source).
Sometimes, it is defined manually.
Sometimes, it is defined/supplied by the "runtime" installed by the compiler (in practice, the CRT is usually just a bunch of binary header/entry-point-management code, and some exit guards/handlers). In GCC (not sure if other compilers support this; if so, it'll be in their sources):
Main definition
Testing __dso_handle replacement/tracker example 1
Testing __dso_handle replacement/tracker example 2
Often, it is defined in the stdlib:
Android
BSD
Further reading:
Subtle bugs caused by __dso_handle being unreachable in some compilers
I ran into this problem. Here are the conditions which seem to reliably generate the trouble:
g++ linking without the C/C++ standard library: -nostdlib (typical small embedded scenario).
Defining a statically allocated standard library object; specific to my case is std::vector. Previously this was std::array statically allocated without any problems. Apparently not all std:: statically allocated objects will cause the problem.
Note that I am not using a shared library of any type.
GCC/ARM cross compiler is in use.
If this is your use case then merely add the command line option to your compile/link command line: -fno-use-cxa-atexit
Here is a very good link to the __dso_handle usage as 'handle to dynamic shared object'.
There appears to be a typo in the page, but I have no idea who to contact to confirm:
After you have called the objects' constructor destructors GCC automatically calls the function ...
I think this should read "Once all destructors have been called GCC calls the function" ...
One way to confirm this would be to implement the __cxa_atexit function as mentioned and then single step the program and see where it gets called. I'll try that one of these days, but not right now.
Adding to #natersoz's answer-
For me, using -Wabi-tag -D_GLIBCXX_USE_CXX11_ABI=0 alongside -fno-use-cxa-atexit helped compile an old lib. A telltale is if the C++ functions in the error message have std::__cxx11 in them, due to an ABI change.

std::thread weak when using -static-libstdc++, thus causing crash at runtime

I need to build a portable shared object, which is a plugin for another software on Linux. I did some amount of reading on the subject, came down to the conclusion, that I should build a sysrooted gcc (gcc 5.4.0 if it matters) with a decently old glibc (to provide compatibility with older systems), link with -static-libstdc++ and -static-libgcc thus arriving to a point where I have something that only depends on the hosts glibc and some other minor stuff which will always be present.
Now, I did all that and now I am experiencing a weird crash - segmentation fault happens in a place where the code calls std::thread, and gdb actually shows that the stack frame is inside libstdc++.so.6 (where is shouldn't be, ldd of my shared object also does not list libstdc++.so). The top of the stack at the crash is:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff79075e3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 # THIS SHOULD NOT BE HERE RIGHT?
#2 0x00007ffff5a25a5c in std::thread::thread<void (ReferenceAnalytics::*)(std::timed_mutex&), ReferenceAnalytics*&, std::reference_wrapper<std::timed_mutex> >
(this=0x7fffffffcf40, __f=
#0x7fffffffcf60: (void (ReferenceAnalytics::*)(ReferenceAnalytics * const, std::timed_mutex &)) 0x7ffff5a1750c <ReferenceAnalytics::WorkerThreadMethod(std::timed_mutex&)>)
at /home/developer/Toolchains/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/include/c++/5.4.0/thread:137 # Looks like my toolchain
So, I did some reading, and then using nm discovered that my shared object has all the std::thread stuff like ctor, dtor, swap, .... defined as weak symbols (which I assume causes a collision if the host that loads the plugin uses dynamic libstdc++ and then my calls are routed there and all hell breaks loose, is this right?).
My further attempts of googling and reading did not give me an answer to how can I control this as in force the std::thread stuff to be resolved to the static libstdc++ in my sysrooted gcc?
More over, I made a small executable that just does dlopen on my shared object and then calls a method which internally constructs the thread - if the executable is also built with -static-libstdc++ all is well, if not, the crash happens. So I assume my theory about the weak symbol for std::thread being resolved to the hosts libstdc++ is correct, but how to solve this?
If you statically link a DSO against libstdc++ without hiding the libstdc++ symbols, and the main program is linked against libstdc++ as well, then the symbol definitions in the main program will interpose/preempt the definitions in the DSO when it is opened with dlopen.
However, because the main program is not linked against libpthread, the the system libstdc++ DSO in the process image saw that the libpthread symbols were unavailable (null), and thus disabled thread support. However, your DSO needs thread support, but can't get it from the system libstdc++.
As an immediate workaround, you can hide all the statically linked libstdc++ symbols in the DSO. Then no interposition will take place, and your DSO will actually use the libstdc++ copy in the DSO itself, which has already established that there should not be any thread support in the process.
But this will likely not solve all of your problems because late loading of libpthread via dlopen has its problems. We fixed one bug here:
Segfault after a binary without pthread dlopen()s a library linked with pthread
But your distribution may not have that fix, and I expect there will be other issues, one of them being: The second, statically linked copy of libstdc++ is actually needed here because the system libstdc++ has been loaded without thread support (because libpthread was not loaded when its symbols were bound, causing the crash you observed), so you cannot use it for creating threads. It also has activated optimizations which make the library not thread safe (avoid atomic instructions and things like that).

Linkage of standard libraries in C++ code called from ASM

as I am developing my "OsDev" project, where I am learning a new stuff (for somebody who did not code in C/C++ for a long time due to web development it is kinda "new"). I figured out in the other thread, that calling a C++ function from ASM needs to have a extern "C" prefix but now I have problem with the lining of standard libraries as a for example cstdio etc. I stuck with this message.
kc.o: In function `kmain':
kernel.cpp:(.text+0x3e4): undefined reference to `strlen`
C++
#include <string.h>
#include <cstdio>
#include "inc/screen.h"
extern "C" void kmain()
{
clearScreen();
kernel_print((char*)"Hello Github! :-)", 0x04);
}
and if I try to use strlen() it won't link. (BTW. including screen.h is working for some reason).
Compiling script
nasm -f elf32 kernel.asm -o kasm.o
g++ -c kernel.cpp -o kc.o -lgcc -m32 -Wall -Wextra -O2
ld -m elf_i386 -T link.ld -o kernel.bin kasm.o kc.o
link.ld
OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
SECTIONS
{
. = 0x100000;
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss) }
}
Thanks for any suggestions. :)
Your code cannot work as kernel's can't directly use shared libraries.
Why can't I use shared libraries directly in my kernel?
When an application is loaded by the operating system, all the required files are brought into its address space. This includes the executable file and any dynamic libraries (all ABI-conforming ELF applications will always link with a system library - the C Standard Library or just libc).
But while loading the kernel, only the original executable is loaded. Multiboot 2 (with GRUB bootloader) will allow you to load kernel-modules which can be dynamic libraries. But still, your kernel must know how to link itself and the kernel-modules in physical memory. To do so, you must implement a ELF parser and dynamic linker in your kernel.
Before implementing one, make sure your kernel is mature enough to systematically handle dynamic memory allocation, pagination, and other basic features.
How can I use the sweet features of libc?
Usually, you won't use all of the userspace functionality of libc. But things like memcpy, strlen, strcpyn and so on are absolutely necessary. You will have to implement these functions on your own, but the better part here is that, you can change the names of these functions. For example, if you prefere camelCase for function names, then you can also use function names like copyMemory, lengthOfString, etc.
https://github.com/SukantPal/Silcos-Kernel
I have built my own kernel, which has a few implementations of the required functions in KernelHost/Source/Util/CircuitPrimitive.cpp. You can look into that. Also, it has a full-fledged module linker. (KernelHost, ModuleFramework, etc. those parent folders contain separate kernel-module source code).
Make sure not to use the standard C headers in your kernel, as for now. Implement all required functions on your own, including printf

Barebones C++ without standard library?

Compilers such as GCC and Clang allow to compile C++ programs without the C++ standard library, e.g. using the -nostdlib command line flag. It seems that such often fail to link thou, for example:
void f() noexcept { throw 42; }
int main() { f(); }
Usually fails to link due to undefined symbols like __cxa_allocate_exception, typeinfo for int, __cxa_throw, __gxx_personality_v0, __clang_call_terminate, __cxa_begin_catch, std::terminate() etc.
Even a simple
int main() {}
Fails to link with
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
and is killed by the OS upon execution. Using -c the compiler still runs the linker which blatantly fails with:
ld: error in mytest(.eh_frame); no .eh_frame_hdr table will be created.
Is it a realistic goal to program and compile C++ applications or libraries without using and linking to the standard library? How can I compile my code using GCC or Clang on Linux? What core language features would one be unable to use without the standard library?
You will basically find all of your questions answered at osdev.org, but I'll give a brief summary anyway.
When you give GCC -nostdlib, you are saying "no startup or library files". This includes:
crti.o, crtbegin.o, crtend.o and crtn.o. Generally kernel developers only care about implementing crti.o and crtend.o and let GCC supply crtbegin.o and crtend.o by passing -print-file-name= to the linker. Generally these are just stubs that consist of .init and .fini respectively, leaving room for GCC to shove the contents of crtbegin.o and crtend.o respectively. These files are necessary for calling global constructors/destructors.
You can't avoid linking libgcc (the "low-level runtime library" (-lgcc) because even if you pass -nostdlib GCC will emit calls to its functions whenever you use it, leading to inexplicable linking errors for seemingly no reason. This is the case even when you're implementing/porting a C library.
You don't "need" libstdc++ no, but typically kernel developers want it. Porting a C library then implementing the C++ standard library from scratch is an extremely difficult task.
Since you only want to get rid of the "standard library", but keeping libc (on a Linux system) you're essentially programming C++ with just a C library. Of course, there's nothing wrong with this and you do you, but ultimately I don't see the point unless you plan on developing a kernel.
Required reading:
OSDev's C++ page - If you really care about RTTI/exception support, it's more annoying to implement than it sounds. Typically people just pass -fno-rtti or -fno-exceptions and then worry about it down the line or not at all.
"Standard" is a misnomer. In this context it doesn't mean "the library (set of functions, classes etc) as defined by the C++ standard" but "the usual set of libraries and objects (compiled files in a certain format) gcc links with by default". Some of those are necessary for most or even all programs to function.
If you use this flag, it's your responsibility to provide any missing functionality. There are several ways to do so:
Cherry-pick libraries and objects that your program really needs out of the default set. (Makes little sense as the result will most probably be exactly the same as with the default link flags).
Provide your own implementation of missing functionality.
Explicitly disable, through compiler flags, language features your program isn't using. I know of two such features: exceptions and RTTI. This is needed because the compiler needs to generate exceptions-related code and RTTI info even if these features are not explicitly used in this module.

Cannot Debug Shared Library - Symbols Not Loading Properly

I am currently writing a small library, and I want to check it for leaks (among other things); however, for some reason, gdb is not loading the library symbols. I have read many other posts on here (and various other places on the internet) about this, however, I cannot seem to find a solution. Here is what is going on:
I am compiling the shared library with the following flags (these are included in both the final shared library as well as all object files):
CFLAGS=-Wall -O0 -g -fPIC
Likewise, I am compiling the binary memtest (the client application for the library) to check for memory leaks and such with these flags
CFLAGS=-Wall -O0 -g
Now, I inserted a NULL pointer into the library to test if I could trace through it and "debug" the pointer (i.e. it's making it crash). So I try to run it through gdb, but it's a no go. The output of info sharedlibrary is the same for both the executable and the core:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
... Some libraries I am not worried about debugging...
0x00d37340 0x00d423a4 Yes (*) /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0 <--- My lib
.... and some more....
(*): Shared library is missing debugging information.
As you can see, it's not loading the debug information. I am uncertain as to why this is. I have built and linked everything with the -g flag, and I even try -ggdb and -g3 but nothing seems to work properly. When I load in a core dump, this is what I see:
...some libs...
Reading symbols from /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0...done.
Loaded symbols for /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
...some more libs...
Notice how my library does not give a (no debugging symbols found) error - anyone have any ideas why? As I said before, I am unable to debug this through running the program gdb ./memtest or through debugging the core file.
Thanks for your help.
EDIT It may also be important to note, that (if you didn't realize by path) this library is a local shared library (i.e. I'm using -Wl,-rpath to link/load it)
EDIT2 It seems my version of GDB was out-of-date. Now, I have updated to the latest version from the CVS server (I have also tried latest release version 7.2) and it can "load" symbols. My info sharedlibrary now reads this:
0x00e418b0 0x00e4be74 Yes /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
However, I am still unable to step through any functions (in the shared library) - anyone have any ideas?
EDIT3 I have also tried to step through linking against a static library (libMyLIB.a) but it still isn't working. My OS is CentOS 5.6; does anyone know of any issues with this system? Also, just another confirmation that my symbols are being loaded (it just can't step through any shared lib function for some reason)
(gdb) sharedlibrary MyLIB
Symbols already loaded for /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
I found the reason this wasn't working: I was calling an old function call to initialize a pointer in my test executable. Since the object was never being created, I could never step into the library. Once I updated the function call, all worked well.
That said, if anyone experiences similar issues while all symbols appear to be loaded, be sure to check that all pointers are initialized properly even if they have the correct type.