c++: runtime-linking shared object with host app, symbol table issue - c++

I have a hostapp.cpp that loads an object.so shared object at run-time. The shared object is compiled using only the needed .h files from the host app, but at run-time it needs to access the functions those headers declare (which are defined in the host app).
Compiling the host app with -rdynamic apparently solves this issue but it unnecessarily exposes the object to the full symbol table of the host app, even though it only needs to resolve a few of them.
How can I specify exactly what host-app symbols will be known by the shared object?
Edit: I'm building and running on GNU/Linux with the GNU toolchain.

Your question is under-specified: you never said what platform you are building for, what linker you use, etc.
Assuming you build for Linux, you can specify symbols to export from the main executable using one of the following methods:
If you are using gold (the GNU ELF linker), --export-dynamic-symbol will do what you need.
If you are using binutils linker, you can use linker version script to do the same (example).
You can mark symbols to be exported with __attribute__((visibility("default"))), compile with -fvisibility=hidden, and link with -rdynamic. That should hide most of the symbols, but will not work well if you link in libraries which you can't recompile.
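A minimal sketch of the third method, assuming GCC or Clang on Linux. The macro name HOSTAPP_EXPORT and all function names are made up for illustration; they are not from the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: marks a symbol as visible to dynamically loaded objects.
#define HOSTAPP_EXPORT __attribute__((visibility("default")))

// Exported: when the host is built with
//   g++ -fvisibility=hidden -rdynamic hostapp.cpp -o hostapp -ldl
// these symbols end up in the executable's dynamic symbol table,
// so a shared object loaded at run-time can resolve them.
extern "C" HOSTAPP_EXPORT int hostapp_api_version() { return 1; }

extern "C" HOSTAPP_EXPORT std::size_t hostapp_name_length(const char* name) {
    assert(name != nullptr);
    return std::strlen(name);
}

// Not annotated: under -fvisibility=hidden this stays hidden,
// so the shared object cannot bind to it.
int internal_helper(int x) { return x * 2; }
```

With this layout, only the annotated functions are meant to be reachable from object.so; everything else in the host stays out of the dynamic symbol table.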

Related

How does a linker produce a library? What are the contents of that library?

Referring to this answer: https://stackoverflow.com/a/6264256/5324086,
I found that a linker has even more functionality than just managing absolute addresses for object file symbols.
What does the library produced by linker contain? Is it something other than ... say a C Standard library?
Why does the linker even need to produce a library?
The exact details depend on the type of library (you can search for shared library formats) but the basic components will include the compiled code, plus a symbol table that tells the linker which address corresponds to each name. Note that this is very similar to an object file. Static libraries are basically archives of object files, and the linker links them in much the same way. With dynamic libraries, the OS can look this up whenever it loads a program, and link the symbols then. They won't generally have the same absolute addresses in every program's address space, so these addresses will be relative to where the OS loads the library.
The C standard library (MSVC runtime on Windows) is an example of a library.
Static libraries are just a collection of object files. You can think of them as a tar file containing all the relevant .o files (or, on Windows, as a zip file containing .obj files). The linking part of the linker is not involved in creating them (in fact, static libraries on Unix systems are traditionally built with the ar utility, which is loosely related to tar). They are completely resolved at link time, and they are simply a way to avoid rebuilding code that takes a long time to build or has a complex build procedure.
Dynamic libraries are a different beast. They are fully fledged executables that can be loaded by other processes, so the regular linker is needed for the same reasons it is used in normal executables. Instead of providing just a single entrypoint, they export a full symbols table that is used by the loader (or "runtime linker") to allow the host program to locate the required procedures. Generally they also contain relocation information to allow loading at any address in the target address space (or they are compiled in position independent code for this same reason).
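To make the "symbol table plus load-relative addresses" idea concrete, here is a toy model. It is only a sketch: ToyLibrary and resolve are invented names, and a real loader works on the library's binary format (ELF, PE, ...) rather than on a std::map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for a shared library's export table:
// symbol name -> offset from the start of the library image.
struct ToyLibrary {
    std::map<std::string, std::uintptr_t> exports;
};

// Toy stand-in for what the loader does for one symbol: look the name up
// and add the address at which the library happened to be loaded.
// Returns false if the library does not export the name.
bool resolve(const ToyLibrary& lib, std::uintptr_t load_base,
             const std::string& name, std::uintptr_t& address_out) {
    std::map<std::string, std::uintptr_t>::const_iterator it = lib.exports.find(name);
    if (it == lib.exports.end()) {
        return false;  // unresolved symbol
    }
    address_out = load_base + it->second;
    return true;
}
```

The same library loaded at two different base addresses yields two different absolute addresses for the same symbol, which is why the stored values have to be relative.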

Shared Library Object File Linkage

I'm interested in solutions for the question below for Linux and Windows, GCC, MinGW, and MSVC (if possible).
I have an application that I've written that supports user-defined shared library imports (add on modules). The application scans a directory, finds *.dll files or *.so files, and loads them dynamically at runtime.
So far, all the user modules have been completely composed of self sufficient code. That is, the object files that make up the DLL/SO yielded no incomplete references from the point of view of the linker.
Now I want to allow the modules to use functions that are compiled into the object files that make up the binary application importing these modules. In other words, I want to allow them to use some of my library code without it having to be compiled into the DLL/SO itself. Unfortunately, in the linker phase when building the DLL/SO, this fails with the complaint that there are unresolved symbols.
Is this possible?
Create a library with the code you want to share between the user module and your program.
Now the user program and your program can link with this new library.
Why not just make a DLL which is linked by both the main app and all the user libs? This is perfectly legal, safe and does what you want AFAICT.
As for Linux and other ELF platforms, this is perfectly possible. You just need to export the appropriate symbols from your executable, and they will be preferred over the same symbols at the dynamic library. See this question for details.
As suggested by one of the answers to that question, you could instead pass the functions you want to export as callbacks to some initialization function in the dynamic library.
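The callback approach could look roughly like this. It is a sketch under assumptions: HostApi, module_init, module_compute and load_module are hypothetical names, and the "module" half is kept in the same file here only so the example is self-contained. In a real setup it would live in the DLL/SO and module_init would be looked up with dlsym or GetProcAddress.

```cpp
#include <cassert>
#include <string>

// Table of host functions offered to modules (hypothetical layout).
struct HostApi {
    int  (*add)(int, int);
    void (*log)(const char* message);
};

extern "C" {
// Signature the host expects every module to export.
typedef int (*ModuleInitFn)(const HostApi* api);
}

// ---- host side ----
static std::string g_log;
static int  host_add(int a, int b) { return a + b; }
static void host_log(const char* message) { g_log += message; }

// Hands the table to a module's init function and returns its status.
int load_module(ModuleInitFn init) {
    static const HostApi api = { &host_add, &host_log };
    return init(&api);
}

// ---- module side (would normally be compiled into the .so/.dll) ----
static const HostApi* g_api = nullptr;

extern "C" int module_init(const HostApi* api) {
    if (api == nullptr || api->add == nullptr || api->log == nullptr) {
        return -1;  // refuse an incomplete table
    }
    g_api = api;
    g_api->log("module loaded");
    return 0;
}

extern "C" int module_compute() {
    assert(g_api != nullptr);  // module_init must have succeeded first
    return g_api->add(2, 3);   // calls back into the host
}
```

Because the module only ever calls through the pointers it was handed, it has no unresolved references to host symbols at link time, and the host does not need to export anything.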
My first thought is: figure out another way to do this. Require the add-ons to communicate via a known interface type, and then there will be no need to try to trick the linker.

How does linker know which symbols should be resolved at runtime?

How does the linker know which symbols should be resolved at runtime? In particular, I'm interested in what information shared object files carry that instructs the linker to resolve symbols at runtime. How does dynamic symbol resolution work at runtime, i.e. what does the executable do to find a symbol, and if multiple symbols with the same name are defined, which one is found?
What happens if the file was linked only statically, but the same code is then also linked dynamically at run-time as part of a shared library? Which symbol will be used by the executable? In other words, is it possible to override symbols in an executable by linking those symbols into a shared library?
The platform in question is SUN OS.
Try the below link. I hope it answers your question
http://www.linuxjournal.com/article/6463
Check out this article from Linux Journal. For more information, perhaps specifically related to Windows, AIX, OS X, etc., I would recommend the Wikipedia article on Linker (computing) and the references therein.
If a file is statically linked there is no run-time resolution to speak of. If a shared object links to that same library, either dynamically or statically, the version linked to the shared object will only affect code executed in the shared object. This can cause problems if you link to two different, incompatible versions of the same library and pass data back and forth between them.

Preventing symbols from being stripped in IBM Visual Age C/C++ for AIX

I'm building a shared library which I dynamically load (using dlopen) into my AIX application using IBM's VisualAge C/C++ compiler. Unfortunately, it appears to be stripping out necessary symbols:
rtld: 0712-002 fatal error: exiting.
rtld: 0712-001 Symbol setVersion__Q2_3CIF17VersionReporterFRCQ2_3std12basic_stringXTcTQ2_3std11char_traitsXTc_TQ2_3std9allocatorXTc__ was referenced
from module ./object/AIX-6.1-ppc/plugins/plugin.so(), but a runtime definition
of the symbol was not found.
Both the shared library and the application which loads the shared library compile/link against the static library which contains the VersionReporter mentioned in the error message.
To link the shared library I'm using these options: -bM:SRE -bnoentry -bexpall
To link the application, I'm using this option: -brtl
Is there an option I can use to prevent this symbol from being stripped in the application? I've tried using -nogc as stated in the IBM docs, but that causes the shared library to be in an invalid format or the application to fail to link (depending on which one I use it with).
Yes. This is not really connected to a particular language or compiler. The same general technique is used for gcc, for example. -bI:foo.exp is used to tell the linker that the symbols listed in foo.exp will come from the module named at the top of that file. Likewise, -bE:dog.exp is used to tell the linker that the symbols listed in dog.exp are exported and can be used by others.
You can use /bin/ldd and /bin/dump to review these symbols.
I figured this out. The trick is to use an export list so that symbols used in the plugin but not used in the binary aren't stripped out.
# version.exp:
setVersion__Q2_3CIF17VersionReporterFRCQ2_3std12basic_stringXTcTQ2_3std11char_traitsXTc_TQ2_3std9allocatorXTc__
And then when linking the application use: -brtl -bexpfull -bE:version.exp
There's more information here: Developing and Porting C and C++ Applications on AIX.

Why does the C++ linker require the library files during a build, even though I am dynamically linking?

I have a C++ executable and I'm dynamically linking against several libraries (Boost, Xerces-c and custom libs).
I understand why I would require the .lib/.a files if I choose to statically link against these libraries (relevant SO question here). However, why do I need to provide the corresponding .lib/.so library files when linking my executable if I'm dynamically linking against these external libraries?
The compiler isn't aware of dynamic linking, it just knows that a function exists via its prototype. The linker needs the lib files to resolve the symbol. The lib for a DLL contains additional information like what DLL the functions live in and how they are exported (by name, by ordinal, etc.) The lib files for DLLs contain much less information than lib files that contain the full object code - libcmt.lib on my system is 19.2 MB, but msvcrt.lib is "only" 2.6 MB.
Note that this compile/link model is nearly 40 years old at this point, and predates dynamic linking on most platforms. If it were designed today, dynamic linking would be a first class citizen (for instance, in .NET, each assembly has rich metadata describing exactly what it exports, so you don't need separate headers and libs.)
Raymond Chen wrote a couple blog entries about this specific to Windows. Start with The classical model for linking and then follow-up with Why do we have import libraries anyway?.
To summarize, history has defined the compiler as the component that knows about detailed type information, whereas the linker only knows about symbol names. So the linker ends up creating the .DLL without type information, and therefore programs that want to link with it need some sort of metadata to tell it about how the functions are exported and what parameter types they take and return.
The reason .DLLs don't have all the information you need to link with them directly is historic, and not a technical limitation.
For one thing, the linker inserts the versions of the libraries that exist at link time so that you have some chance of your program working if library versions are updated. Multiple versions of shared libraries can exist on a system.
The linker has the job of validating that all your undefined symbols are accounted for, either with static content or dynamic content.
By default, then, it insists on all your symbols being present.
However, that's just the default. See -z, and --allow-shlib-undefined, and friends.
Perhaps this dynamic linking is done via import libraries (the function has __declspec(dllimport) before its declaration).
If so, the compiler expects an __imp_symbol entry to exist, and calls are forwarded through that entry to the right dynamically loaded library.
Those entries are generated when symbols marked with the __declspec(dllimport) keyword are linked.
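A toy model of that indirection, written as ordinary portable C++ rather than real import-library output. All names here (dll_add, imp_add, toy_loader_bind) are invented stand-ins. The real slot is generated by the toolchain and filled in by the Windows loader.

```cpp
#include <cassert>

// Stand-in for the function that actually lives inside the DLL.
static int dll_add(int a, int b) { return a + b; }

typedef int (*AddFn)(int, int);

// Stand-in for the import-table slot (the role __imp_symbol plays):
// a pointer that is empty until the library has been loaded.
static AddFn imp_add = nullptr;

// Stand-in for the loader: once the DLL is mapped, store the function's
// real address in the slot.
void toy_loader_bind() { imp_add = &dll_add; }

// Stand-in for the stub the program calls: it just forwards
// through the slot to wherever the DLL ended up.
int add(int a, int b) {
    assert(imp_add != nullptr);  // the slot must be bound before the first call
    return imp_add(a, b);
}
```

The program is linked against add and the slot, not against dll_add itself, which is the information an import library provides at link time.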
Here is a very SIMPLIFIED description that may help. Static linking puts all of the code needed to run your program into the executable so everything is found. Dynamic linking means some of the required code does not get put into the executable and will be found at runtime. Where do I find it? Is function x() there? How do I make a call to function x()? That is what the library tells the linker when you are dynamically linking.