in gcc how to force symbol resolution at runtime - c++

My first post on this site, with high hopes:
I am trying to understand static linking, dynamic linking, shared libraries, static libraries, etc. with gcc. Every time I try to delve into this topic, there is something I don't quite understand.
Some hands-on work:
bash$ cat main.c
#include "printhello.h"
#include "printbye.h"
int main(void)
{
    PrintHello();
    PrintBye();
    return 0;
}
bash$ cat printhello.h
void PrintHello();
bash$ cat printbye.h
void PrintBye();
bash$ cat printbye.c
#include <stdio.h>
void PrintBye()
{
printf("Bye bye\n");
}
bash$ cat printhello.c
#include <stdio.h>
void PrintHello()
{
printf("Hello World\n");
}
gcc -Wall -fPIC -c *.c -I.
gcc -shared -Wl,-soname,libcgreet.so.1 -o libcgreet.so.1.0 *.o
ln -sf libcgreet.so.1.0 libcgreet.so
ln -sf libcgreet.so.1.0 libcgreet.so.1
So I have created a shared library.
Now I want to link this shared library with my main program to create an executable.
gcc -Wall -L. main.c -lcgreet -o greet
This works fine: if I set LD_LIBRARY_PATH before running greet (or link with the rpath option), the program runs.
My question is however different:
Since I am using a shared library anyway, is it not possible to force symbol resolution at runtime (I am not sure about the terminology, but perhaps this is what the book "Linkers and Loaders" calls dynamic linking)? I understand that we may not want to do this, because it makes the program slower and adds overhead every time the program runs, but I am trying to understand it to clear up my concepts.
Does the gcc linker provide any option to delay symbol resolution until runtime, so that it is done against the library the program will actually run with? (The library available at compile time may differ from the one available at runtime if the library has changed.)
I want to be able to do something like:
bash$ gcc main.c -I.
(what option needed here?)
so that I don't have to give the library name, and just tell it that I want to do symbol resolution at runtime, so headers are good enough for now, actual library names are not needed.
Thanks,
Learner For Ever.

Any linker (gcc, ld or any other) only resolves links at compile-time. That is because the ELF standard (like most others) does not define 'run-time' linkage as you describe. Libraries are linked either statically (i.e. lib.a) or at start-up time (lib.so, which must be present when the ELF is loaded). However, if you use dynamic linking, the linker only puts into the ELF the name of the library file and the symbols it must find; it does not link the file directly. So, if you want to upgrade the lib to a newer version later, you can do so, as long as the system can find the same filename (the path can actually be different) and the same symbol names.
The other option, to get symbols at run-time, is to use dlopen, which has nothing to do with gcc or ld. Simply put, dlopen opens a dynamic library, much as fopen opens a file, and returns a handle, which you then pass to dlsym together with the name of the symbol you want (a function name, for example). dlsym then returns a pointer to that symbol, which you can use to call the function or access the variable. This is how plugins are implemented.
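The dlopen/dlsym sequence described above could be sketched in C roughly as follows. This is a minimal sketch, not the asker's code: the helper name call_math_symbol is hypothetical, and the library name "libm.so.6" mentioned in the usage note assumes glibc on Linux.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Pointer type matching functions such as cos() from the math library. */
typedef double (*unary_math_fn)(double);

/*
 * Hypothetical helper: open the shared library `lib` at run time, look up
 * `symbol` in it, call it with `x`, and store the value in `*result`.
 * Returns 0 on success and -1 if the library or the symbol cannot be found.
 */
int call_math_symbol(const char *lib, const char *symbol, double x, double *result)
{
    void *handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* POSIX idiom for converting the void* from dlsym to a function pointer. */
    unary_math_fn fn;
    *(void **)(&fn) = sym;
    *result = fn(x);

    dlclose(handle);
    return 0;
}
```

A caller might use call_math_symbol("libm.so.6", "cos", 0.0, &r), assuming that library name exists on the system. Nothing about the library is known at link time; on older glibc versions the program itself still has to be linked with -ldl.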

I think you are looking for the ld option '--unresolved-symbols=ignore-all'; yes, it can actually do this (ignore the previous answer). Imagine a scenario where a shared library is loaded late (when the program is already running): it can use all the symbols that have already been resolved/loaded by the main process, with no need to resolve them again. By the way, this does not necessarily make it slow, at least on Linux.

Related

Linkage of standard libraries in C++ code called from ASM

I am developing my "OsDev" project, where I am learning new things (for somebody who has not coded in C/C++ for a long time because of web development, it is kind of "new"). I figured out in another thread that a C++ function called from ASM needs an extern "C" prefix, but now I have a problem with linking standard libraries such as cstdio. I am stuck with this message.
kc.o: In function `kmain':
kernel.cpp:(.text+0x3e4): undefined reference to `strlen`
C++
#include <string.h>
#include <cstdio>
#include "inc/screen.h"
extern "C" void kmain()
{
clearScreen();
kernel_print((char*)"Hello Github! :-)", 0x04);
}
and if I try to use strlen() it won't link. (BTW. including screen.h is working for some reason).
Compiling script
nasm -f elf32 kernel.asm -o kasm.o
g++ -c kernel.cpp -o kc.o -lgcc -m32 -Wall -Wextra -O2
ld -m elf_i386 -T link.ld -o kernel.bin kasm.o kc.o
link.ld
OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
SECTIONS
{
. = 0x100000;
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss) }
}
Thanks for any suggestions. :)
Your code cannot work, because kernels can't directly use shared libraries.
Why can't I use shared libraries directly in my kernel?
When an application is loaded by the operating system, all the required files are brought into its address space. This includes the executable file and any dynamic libraries (all ABI-conforming ELF applications will always link with a system library - the C Standard Library or just libc).
But while loading the kernel, only the original executable is loaded. Multiboot 2 (with the GRUB bootloader) will allow you to load kernel modules, which can be dynamic libraries. But still, your kernel must know how to link itself and the kernel modules in physical memory. To do so, you must implement an ELF parser and dynamic linker in your kernel.
Before implementing one, make sure your kernel is mature enough to systematically handle dynamic memory allocation, paging, and other basic features.
How can I use the sweet features of libc?
Usually, you won't use all of the userspace functionality of libc. But things like memcpy, strlen, strncpy and so on are absolutely necessary. You will have to implement these functions on your own; the upside is that you can choose their names. For example, if you prefer camelCase, you can use function names like copyMemory, lengthOfString, etc.
https://github.com/SukantPal/Silcos-Kernel
I have built my own kernel, which has a few implementations of the required functions in KernelHost/Source/Util/CircuitPrimitive.cpp. You can look into that. Also, it has a full-fledged module linker. (KernelHost, ModuleFramework, etc. those parent folders contain separate kernel-module source code).
Make sure not to use the standard C headers in your kernel for now. Implement all required functions on your own, including printf.
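As an illustration of implementing the required functions yourself, here is a minimal sketch of two of them. The names k_strlen and k_memcpy are hypothetical, and the only header used is <stddef.h>, which the compiler itself supplies even in a freestanding build (an assumption about the toolchain, not something the kernel linked above is known to do).

```c
#include <stddef.h>  /* size_t; provided by the compiler, not by libc */

/* Length of a NUL-terminated string, not counting the terminator. */
size_t k_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Copy n bytes from src to dst; the two regions must not overlap. */
void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

The explicit casts keep the code valid in both C and C++; in a C++ kernel the declarations would additionally need extern "C" if they are called from assembly. Building with -ffreestanding is advisable, and GCC may still emit calls to memcpy, memmove, memset and memcmp on its own, so those four usually have to exist under their standard names as well.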

Writing a plugin system?

After many hours of research I have turned up nothing, so I turn to you good folks in hopes of a solution. I am going to be writing a bot in C++, and at some point would like to make a plugin system for it. I know I could just write a scripting language for it; however, I know it's possible to write an API and have the program link to it dynamically at run time. My question is: how do I get that dynamic linkage (like what HexChat has for its plugins)? Are there any elegant solutions, or at least theories on the typical design?
On Linux and Posix systems, you want to use dlopen(3) & dlsym (or some libraries wrapping these functions, e.g. Glib from GTK, Qt, POCO, etc...). More precisely,
Build a position independent code shared library as your plugin:
gcc -fPIC -Wall -c plugin1.c -o plugin1.pic.o
gcc -fPIC -Wall -c plugin2.c -o plugin2.pic.o
Notice that if the plugin is coded in C++ you'll compile it with g++ and you should declare the plugin functions as extern "C" to avoid name mangling.
Then link your plugin as
gcc -shared -Wall plugin1.pic.o plugin2.pic.o -o plugin.so
You may add dynamic libraries (e.g. a -lreadline at end of command above if your plugin wants GNU readline).
At last, call dlopen with a full path in your main program, e.g.
void* dlh = dlopen("./plugin.so", RTLD_NOW);
if (!dlh) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    exit(EXIT_FAILURE);
}
(often dlh is a global data)
Then use dlsym to get the function pointers. So declare their signature in some header included both by program and plugin code like
typedef int readerfun_t (FILE*);
declare some (often) global function pointers
readerfun_t* readplugfun;
and use dlsym on the plugin handle dlh:
readplugfun = (readerfun_t*) dlsym(dlh, "plugin_reader");
if (!readplugfun) {
    fprintf(stderr, "dlsym failed: %s\n", dlerror());
    exit(EXIT_FAILURE);
}
Of course in your plugin source code (e.g. in plugin1.cc) you'll define
extern "C" int plugin_reader (FILE*inf) { // etc...
You might define some constructor (or destructor) functions in your plugin (see GCC function attributes); they would be called at dlopen (or dlclose) time. In C++ you should simply use static objects (their constructor is called at dlopen time and their destructor at dlclose time, hence the names of the function attributes).
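A minimal sketch of those function attributes in C is shown here. The names plugin_on_load, plugin_on_unload and plugin_is_ready are hypothetical. Inside a plugin the two hooks would run at dlopen and dlclose time; compiled into an ordinary executable, as assumed in this sketch, they run before main starts and after it returns.

```c
/* State that the plugin wants initialised before any of its functions run. */
static int plugin_ready = 0;

/* Runs automatically when the object is loaded (dlopen time for a plugin). */
__attribute__((constructor))
static void plugin_on_load(void)
{
    plugin_ready = 1;
}

/* Runs automatically when the object is unloaded (dlclose time for a plugin). */
__attribute__((destructor))
static void plugin_on_unload(void)
{
    plugin_ready = 0;
}

/* Exported query so the host can check that initialisation happened. */
int plugin_is_ready(void)
{
    return plugin_ready;
}
```

These attributes are a GCC extension (Clang accepts them too), so the sketch is not portable ISO C.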
At the end of your program call
dlclose(dlh), dlh = NULL;
In practice, you can do a lot (perhaps a million) of dlopen calls.
You generally want to link your main program with -rdynamic to let its symbols be visible from plugins.
gcc -rdynamic prog1.o prog2.o -o yourprog -ldl
Read Program Library HowTo & C++ dlopen mini HowTo & Drepper's paper: How to Write a Shared Library
The most important part is to define and document a plugin convention (i.e. "protocol"), that is a set (and API) of functions (to be dlsym-ed) required in your plugin and how to use them, in which order they are called, what is the memory ownership policy, etc. If you allow several similar plugins, you might have some well documented hooks in your main program which calls all the dlsym-ed functions of relevant dlopen-ed plugins. Examples: GCC plugins conventions, GNU make modules, Gedit plugins, ...
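One way such a convention could be written down is a descriptor structure that every plugin exports and that the host validates after dlsym. This is only a sketch under assumptions: struct plugin_descriptor, HOST_PLUGIN_API_VERSION, host_accepts_plugin and example_plugin are hypothetical names, and only the reader signature is taken from the readerfun_t typedef above.

```c
#include <stddef.h>
#include <stdio.h>

/* Version of the (hypothetical) plugin protocol that this host implements. */
#define HOST_PLUGIN_API_VERSION 1

/* What every plugin would export, e.g. under one well-known symbol name. */
struct plugin_descriptor {
    int api_version;            /* protocol version the plugin was built for */
    const char *name;           /* human-readable plugin name */
    int (*reader)(FILE *in);    /* same signature as readerfun_t above */
};

/* Host-side check: returns 1 if the descriptor can be used, 0 otherwise. */
int host_accepts_plugin(const struct plugin_descriptor *d)
{
    if (d == NULL)
        return 0;
    if (d->api_version != HOST_PLUGIN_API_VERSION)
        return 0;
    if (d->name == NULL || d->reader == NULL)
        return 0;
    return 1;
}

/* Plugin-side example: a reader that counts the bytes available on a stream. */
static int example_reader(FILE *in)
{
    int count = 0;
    while (fgetc(in) != EOF)
        count++;
    return count;
}

const struct plugin_descriptor example_plugin = {
    HOST_PLUGIN_API_VERSION, "example", example_reader
};
```

With this layout the host needs a single dlsym call per plugin, and the version field lets it reject plugins built against an older protocol instead of crashing inside them.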

Is there a way to "statically" interpose a shared .so (or .o) library into an executable?

First of all, consider the following case.
Below is a program:
// test.cpp
extern "C" void printf(const char*, ...);
int main() {
printf("Hello");
}
Below is a library:
// ext.cpp (the external library)
#include <iostream>
extern "C" void printf(const char* p, ...);
void printf(const char* p, ...) {
std::cout << p << " World!\n";
}
Now I can compile the above program and library in two different ways.
The first way is to compile the program without linking the external library:
$ g++ test.cpp -o test
$ ldd test
linux-gate.so.1 => (0xb76e8000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7518000)
/lib/ld-linux.so.2 (0xb76e9000)
If I run the above program, it will print:
$ ./test
Hello
The second way is to compile the program with a link to the external library:
$ g++ -shared -fPIC ext.cpp -o libext.so
$ g++ test.cpp -L./ -lext -o test
$ export LD_LIBRARY_PATH=./
$ ldd test
linux-gate.so.1 => (0xb773e000)
libext.so => ./libext.so (0xb7738000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb756b000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xb7481000)
/lib/ld-linux.so.2 (0xb773f000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xb743e000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xb7421000)
$ ./test
Hello World!
As you can see, in the first case the program uses printf from libc.so, while in the second case it uses printf from libext.so.
My question is: from the executable obtained as in the first case and the object code of libext (either as .so or .o), is it possible to obtain an executable like in the second case? In other words, is it possible to replace the link to libc.so with a link to libext.so for all symbols defined in the latter?
Note that interposition via LD_PRELOAD is not what I want. I want to obtain an executable which is directly linked to the libraries I need. I stress again that I only have access to the first binary and to the external object I want to "statically" interpose.
It is possible. Learn about shared library interposition:
When a program that uses dynamic libraries is compiled, a list of undefined symbols is included in the binary, along with a list of libraries the program is linked with. There is no correspondence between the symbols and the libraries; the two lists just tell the loader which libraries to load and which symbols need to be resolved. At runtime, each symbol is resolved using the first library that provides it. This means that if we can get a library containing our wrapper functions to load before other libraries, the undefined symbols in the program will be resolved to our wrappers instead of the real functions.
What you ask for is traditionally NOT possible. This has already been discussed here and here.
The crux of your question being -
How to statically link a dynamic shared object?
This cannot be done. The reason being the fact that statically linking a library is effectively the same as taking the compilation results of that library, unpacking them in your current project, and using them as if they were your own objects. *.a files are just archives of a bunch of *.o files with all the info intact within them. On the other hand, dynamic libraries are already linked; the symbol re-location info already being discarded and hence cannot be statically linked into an executable.
However you DO have other alternatives to work around this technical limitation.
So what are your options?
1. Use LD_PRELOAD on target system
Shared library interposition is well described in Maxim's answer.
2. Prepare a pre-linked stand-alone executable
elf-statifier is tool for creating portable, self-contained Linux executables.
It attempts to package together a dynamically-linked executable and all of its dynamically-linked libraries into a single stand-alone executable file. This file can be copied and run on another machine independently.
So now on your development machine, you can set LD_PRELOAD, run the original executable, and verify that it works properly. At this point elf-statifier creates a snapshot of the process memory image. This snapshot is saved as an ELF executable, with all the required shared libraries (including your custom libext.so) inside. Hence there is no need to make any modifications (e.g. to LD_PRELOAD) on the target system running the newly generated standalone executable.
However, this approach is not guaranteed to work in all scenarios. This is due to the fact that recent Linux kernels introduced VDSO and ASLR.
A commercial alternative to this is ermine. It can work around VDSO and ASLR limitations.
You are going to have to modify the binary. Take a look at patchelf http://nixos.org/patchelf.html
It will let you set or modify either the RPATH or even the "interpreter" i.e. ld-linux-x86-64.so to something else.
From the description of the utility:
Dynamically linked ELF executables always specify a dynamic linker or
interpreter, which is a program that actually loads the executable
along with all its dynamically linked libraries. (The kernel just
loads the interpreter, not the executable.) For example, on a
Linux/x86 system the ELF interpreter is typically the file
/lib/ld-linux.so.2.
So what you could do is run patchelf on the binary in question (i.e. test) with your own interpreter that then loads your library... This may be difficult, but the source code to ld-linux.so is available...
Option 2 would be to modify the list of libraries yourself. At least patchelf gives you a starting point in that the code iterates over the list of libraries (see DT_NEEDED in the code).
The elf specification documentation does indicate that the order is indeed important:
DT_NEEDED: This element holds the string table offset of a null-terminated
string, giving the name of a needed library. The offset is an index
into the table recorded in the DT_STRTAB entry. See ‘‘Shared Object
Dependencies’’ for more information about these names. The dynamic
array may contain multiple entries with this type. These entries’
relative order is significant, though their relation to entries of
other types is not.
The nature of your question indicates you are familiar with programming :-) Might be a good time to contribute an addition to patchelf... Modifying library dependencies in a binary.
Or maybe your intention is to do exactly what patchelf was created to do... Anyway, hope this helps!
Statifier probably does what you want. It takes an executable and all shared libraries and outputs a static executable.
It's possible. You just need to edit the ELF file and add your library to the dynamic section.
You can check the contents of the dynamic section using readelf -d <executable>. Also, readelf -S <executable> will tell you the offsets of .dynamic and .dynstr. In .dynamic you will find an array of Elf32_Dyn or Elf64_Dyn structures; the entry you add should have d_tag set to DT_NEEDED and d_un.d_val set to the offset, within the .dynstr section, of the string "libext.so".
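To make the layout concrete, here is a small sketch in C that walks an in-memory copy of the dynamic array and resolves DT_NEEDED entries against the string table. The function names count_needed and needed_name are hypothetical, the sketch assumes a 64-bit ELF (Elf64_Dyn from <elf.h>), and it only reads the tables; it does not patch a file.

```c
#include <elf.h>
#include <stddef.h>

/* Count the DT_NEEDED entries in a dynamic array terminated by DT_NULL. */
size_t count_needed(const Elf64_Dyn *dyn)
{
    size_t n = 0;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_NEEDED)
            n++;
    }
    return n;
}

/*
 * Return the library name of the index-th DT_NEEDED entry, or NULL if there
 * are fewer entries. `strtab` points at the contents of .dynstr; d_un.d_val
 * of a DT_NEEDED entry is an offset into that table.
 */
const char *needed_name(const Elf64_Dyn *dyn, const char *strtab, size_t index)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag != DT_NEEDED)
            continue;
        if (index == 0)
            return strtab + dyn->d_un.d_val;
        index--;
    }
    return NULL;
}
```

Because entries are returned in array order, the same walk shows where a new DT_NEEDED entry would have to go for libext.so to be searched before libc.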
ELF headers are described in /usr/include/elf.h.
It might be possible to do what you're asking by dynamically loading the library using dlopen(), accessing the symbol for the function as a function pointer using dlsym(), and then invoking it via the function pointer. There's a good example of what to do on this website.
I tailored that example to your example above:
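// test.cpp
#include <stdio.h>
#include <dlfcn.h>   // dlopen, dlsym, dlerror, dlclose
#include <iostream>  // std::cerr
// Matches the real printf; the replacement in ext.cpp was declared as returning void.
typedef int (*printf_t)(const char *p, ...);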
int main() {
// Call the standard library printf
printf_t my_printf = &printf;
my_printf("Hello"); // should print "Hello"
// Now dynamically load the "overloaded" printf and call it instead
void* handle = dlopen("./libext.so", RTLD_LAZY);
if (!handle) {
std::cerr << "Cannot open library: " << dlerror() << std::endl;
return 1;
}
// reset errors
dlerror();
my_printf = (printf_t) dlsym(handle, "printf");
const char *dlsym_error = dlerror();
if (dlsym_error) {
std::cerr << "Cannot load symbol 'printf': " << dlsym_error << std::endl;
dlclose(handle);
return 1;
}
my_printf("Hello"); // should print "Hello World!"
// close the library
dlclose(handle);
}
The man page for dlopen and dlsym should provide some more insight. You'll need to try this out, as it is unclear how dlsym will handle the conflicting symbol (in your example, printf) - if it replaces the existing symbol, you may need to "undo" your action later. It really depends on the context of your program, and what you're trying to do overall.
It is possible to change the binary.
For example with a tool like ghex you can change the hexadecimal code of the binary, you search in the code for each instance of libc.so and you replace it by libext.so
Not statically, but you can redirect dynamically loaded symbols in a shared library to your own functions using the elf-hook utility created by Anthony Shoumikhin.
The typical usage is to redirect certain function calls from within a 3rd-party shared library which you can't edit.
Let's say your 3rd party library is located at /tmp/libtest.so, and you want to redirect printf calls made from within the library, but leave calls to printf from other locations unaffected.
Exemplar app:
lib.h
#pragma once
void test();
lib.cpp
#include "lib.h"
#include <cstdio>
void test()
{
printf("hello from libtest");
}
In this example, the above 2 files are compiled into a shared library libtest.so and stored in /tmp
main.cpp
#include <iostream>
#include <dlfcn.h>
#include <elf_hook.h>
#include "lib.h"
int hooked_printf(const char* p, ...)
{
std::cout << p << " [[ captured! ]]\n";
return 0;
}
int main()
{
// load the 3rd party shared library
const char* fn = "/tmp/libtest.so";
void* h = dlopen(fn, RTLD_LAZY);
// redirect printf calls made from within libtest.so
elf_hook(fn, LIBRARY_ADDRESS_BY_HANDLE(h), "printf", (void*)hooked_printf);
printf("hello from my app\n"); // printf in my app is unaffected
test(); // test is the entry point to the 3rd party library
dlclose(h);
return 0;
}
Output
hello from my app
hello from libtest [[ captured! ]]
So as you can see it is possible to interpose your own functions without setting LD_PRELOAD, with the added benefit that you have finer-grained control of which functions are intercepted.
However, the functions are not statically interposed, but rather dynamically redirected.
GitHub source for the elf-hook library is here, and a full codeproject article written by Anthony Shoumikhin is here

Linking with .so files (webkit)

I'm trying to create a program that uses some of the code from WebKit/GTK+. Specifically, I want to load a string, use WebKit's parser to construct a DOM tree and then iterate over that tree.
I'm trying to use a class called HTMLDocument. WebKit/GTK+ doesn't expose this as part of its API and I'm running into some trouble linking against it.
I'm able to build WebKit/GTK+ normally, which gives me a file called: libwebkit-1.0.so. My program is:
#include <iostream>
#include <WebCore/config.h>
#include <WebCore/html/HTMLDocument.h>
using namespace WebCore;
int main() {
String title = "test";
RefPtr<HTMLDocument> d = HTMLDocument::create(0);
d->open();
d->write("<!doctype html><html><head><title>" + title + "</title></head><body></body></html>");
}
This compiles fine (I'm using the same include directives used by webkit to build), but results in linking errors.
...test_doc.cpp:18: undefined reference to `WebCore::String::String(char const*)'
...test_doc.cpp:21: undefined reference to `WebCore::Document::open(WebCore::Document*)'
...(similar for every function I use)
If I run:
nm -C .libs/libwebkit-1.0.so | grep 'WebCore::Document::open'
I see:
003b1830 T WebCore::Document::open(WebCore::Document*)
which seems to indicate that the function is available. I have a reasonable amount of C++ experience, but not much experience with linking files under Linux.
I'm not expecting this exact problem to be solved, but I'm hoping someone can correct me if I have conceptual problems. My main question is why I see "undefined reference" errors when I'm linking with an .so file that lists that function as being defined. Is another file or build step needed?
Thank you very much.
Using:
Ubuntu 9.10
g++ 4.4.1
g++ is invoked with:
g++ --debug -DHAVE_CONFIG_H -I. `pkg-config --cflags libsoup-2.4` \
-DBUILDING_CAIRO__=1 -DBUILDING_GTK__=1 -DWTF_CHANGES -DWTF_USE_ICU_UNICODE=1 \
-DNDEBUG -I./WebCore -I./WebCore/accessibility -I./WebCore/bindings/js \
-I./WebCore/bridge -I./WebCore/bridge/c -I./WebCore/css -I./WebCore/dom \
...many more webkit include directories...
-DDATA_DIR=\"/usr/local/share\" \
test_doc.cpp -o test_doc.out \
./webkit-1.1.15.3/.libs/libwebkit-1.0.so
(I get the same result with -L/path/to/lib -lwebkit-1.0)
I think you might be running into an ordering problem: man g++ specifies that the order of the -l option is significant, and from memory the linker will only look for symbols in objects which have preceded the current file on the command line.
I suspect what is happening is that the linker is trying to link test_doc before it's seen libwebkit-1.0.so, so it hasn't seen any of those symbols yet and bails.
You should use the -L/path/to/web and -lwebkit-1.0.
Also, I would compile your .cpp file in to a .o and then build your executable separately to make sure things are isolated.
Anyway, you may need to set your $LD_LIBRARY_PATH environment variable to include the path where that .so is stored. If you link to a shared library, you will need that library at run time. Therefore, you do not want to leave your WebKit .so in its build directory (build/.libs); you want to install it. If you are not root, run ./configure with --prefix=/some/path to install it to a local directory. Alternatively, you can link against the static library. One way to do this is to pass the linker's -Bstatic flag (with g++, -Wl,-Bstatic) before your -lwebkit-1.0.
This is a good resource for Linux library creation and use.
I think your issue is that the symbols you need are not exported. You can run objdump --dynamic-syms libwebkit-1.0.so to see which symbols are available. The WebKit GTK build files use the -fvisibility=hidden flag to restrict the exported symbols. Check your generated GNUMakefile and you'll see SYMBOL_VISIBILITY = -fvisibility=hidden. You should be able to modify the build files to get what you need.

Easy check for unresolved symbols in shared libraries?

I am writing a fairly large C++ shared-object library, and have run into a small issue that makes debugging a pain:
If I define a function/method in a header file, and forget to create a stub for it (during development), since I am building as a shared object library rather than an executable, no errors appear at compile-time telling me I have forgotten to implement that function. The only way I find out something is wrong is at runtime, when eventually an application linking against this library falls over with an 'undefined symbol' error.
I am looking for an easy way to check if I have all the symbols I need at compile time, perhaps something I can add to my Makefile.
One solution I did come up with is to run the compiled library through nm -C -u to get a demangled list of all undefined references. The problem is that this also lists all references that are satisfied by other libraries, such as glibc, which will of course be linked in along with this library when the final application is put together. It would be possible to grep through all my header files for the names in nm's output, but this seems insane. Surely this is not an uncommon issue and there is a better way of solving it?
Check out the linker option -z defs / --no-undefined. When creating a shared object, it will cause the link to fail if there are unresolved symbols.
If you are using gcc to invoke the linker, you'll use the compiler -Wl option to pass the option to the linker:
gcc -shared ... -Wl,-z,defs
As an example, consider the following file:
#include <stdio.h>
void forgot_to_define(FILE *fp);
void doit(const char *filename)
{
FILE *fp = fopen(filename, "r");
if (fp != NULL)
{
forgot_to_define(fp);
fclose(fp);
}
}
Now, if you build that into a shared object, it will succeed:
> gcc -shared -fPIC -o libsilly.so silly.c && echo succeeded || echo failed
succeeded
But if you add -z defs, the link will fail and tell you about your missing symbol:
> gcc -shared -fPIC -o libsilly.so silly.c -Wl,-z,defs && echo succeeded || echo failed
/tmp/cccIwwbn.o: In function `doit':
silly.c:(.text+0x2c): undefined reference to `forgot_to_define'
collect2: ld returned 1 exit status
failed
On Linux (which you appear to be using) ldd -r a.out should give you exactly the answer you are looking for.
UPDATE: a trivial way to create a.out against which to check:
echo "int main() { return 0; }" | g++ -xc++ - ./libMySharedLib.so
ldd -r ./a.out
What about a test suite? You create mock executables that link against the symbols you need. If the linking fails, it means that your library interface is incomplete.
I had the same problem once. I was developing a component model in C++, and, of course, components should load at runtime dynamically. Three solutions come to mind, that were the ones I applied:
Take some time to define a build system that is able to compile statically. You'll lose some time engineering it, but it will save you much time catching these annoying runtime errors.
Group your functions into well-known and well-understood sections, so that you can check groups of functions/stubs and be sure that each declared function has its stub. If you take the time to document them well, you could perhaps write a script that reads the declarations (via, for example, their doxygen comments) and checks the corresponding .cpp file for each definition.
Do several test executables that load the same set of libraries and specify the RTLD_NOW flag to dlopen (if you're under *NIX). They will signal the missing symbols.
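The RTLD_NOW check from the last item could look roughly like this in C. It is a sketch only: the function name library_resolves is hypothetical, and the library paths passed to it are up to the caller.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Try to load `path` with RTLD_NOW, which makes the dynamic loader resolve
 * every undefined symbol immediately instead of lazily on first call.
 * Returns 1 if the library loads cleanly, 0 otherwise (the loader's message,
 * which names the missing symbol or file, is printed to stderr).
 */
int library_resolves(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "%s: %s\n", path, dlerror());
        return 0;
    }
    dlclose(handle);
    return 1;
}
```

A small test program that calls this once per library and exits non-zero on failure can be run from the Makefile right after the link step; on older glibc versions it needs to be linked with -ldl.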
Hope that helps.