Question: Are dynamically linked C++ programs on ELF platforms always on the brink of producing undefined behavior by violating the one definition rule?
More specifically: by simply writing a shared library that exposes one function
#include <string>
int __attribute__((visibility("default"))) combined_length(const char *s,
                                                           const char *t)
{
    const std::string t1(t);
    const std::string u(s + t1);
    return u.length();
}
and compiling it with GCC 7.3.0 via
$ g++ -Wall -g -fPIC -shared \
-fvisibility=hidden -fvisibility-inlines-hidden \
-o liblibrary.so library.cpp
I create a binary which defines a weak symbol for the operator+() of a pointer to a character array and a string:
$ readelf -sW liblibrary.so | grep "_ZStpl"
24: 0000000000000ee2 202 FUNC WEAK DEFAULT 12 _ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_RKS8_
...
But looking at the standard library binary I got
$ readelf -sW /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep "_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_RKS8_"
2829: 000000000012b1c0 169 FUNC WEAK DEFAULT 13 _ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_RKS8_@@GLIBCXX_3.4.21
That's the point where I say: Oh my gosh, the symbol inside my library ought to have a version attached to it too!
In the current state I'm fine, because I can assume that the standard library binary was built from the same headers as my library. But what happens if the implementers of libstdc++-v3 decide to define a new version of this function and tag it with GLIBCXX_3.4.22? Since the symbol is weak, the runtime linker is free to pick either the unversioned symbol from my library or the versioned symbol from libstdc++-v3. If I ship my library to such a system, I provoke undefined behavior there, which is exactly what symbol versions should have prevented.
Related
I am trying to re-compile an existing C++ application.
Unfortunately, I must rely on a proprietary library I only have a pre-compiled static archive of.
I use g++ version 7.3.0 and ld version 2.30.
Whatever GCC version it was compiled with, it is ancient.
The header file defines the method:
class foo {
    int bar(int & i);
};
As nm lib.a shows, the library archive contains the corresponding exported function:
T bar__4fooRi
nm app.o shows my recent compiler employing a different kind of name mangling:
U _ZN4foo9barERi
Hence the linker cannot resolve the symbols provided by the library.
Is there any option to choose the name mangling algorithm?
Can I introduce a map or define the mangled names explicitly?
@Botje's suggestion led me to write a linker script like this (the spaces in the PROVIDE stanza are significant):
EXTERN(bar__4fooRi);
PROVIDE(_ZN4foo9barERi = bar__4fooRi);
As far as I understood, this will regard bar__4fooRi as an externally defined symbol (which it is). If _ZN4foo9barERi is searched for, but not defined, bar__4fooRi will take its place.
I am calling the linker from the GNU toolchain like this (mind the order: the script needs to come after the dependent object but before the defining library):
g++ -o application application.o script.ld -lfoo
It looks like this could work.
At least in theory.
The linker now pulls in other parts of the library, which in turn depend on further unresolvable symbols, including (but not limited to) __throw, __cp_pop_exception, and __builtin_delete. I have no idea where these functions are defined nowadays. Joxean Koret shows some locations in this blog post based on guesswork (__builtin_new probably is malloc), but I am not that confident.
These findings lead me to the conclusion that the library relies on a different style of exception handling and probably memory management, too.
EDIT: Although the result may be purely academic due to ABI changes, as pointed out by @eukaryota, a linker script can indeed be used to "alias" symbols. Here is a complete minimal example:
foo.h:
class Foo {
public:
    int bar(int);
};
foo.cpp:
#include "foo.h"
int Foo::bar(int i) {
    return i+21;
}
main.cpp:
class Foo {
public:
    int baa(int); // use in-place "header" to simulate different name mangling algorithm
};
int main(int, char**) {
    Foo f;
    return f.baa(21);
}
script.ld:
EXTERN(_ZN3Foo3barEi);
PROVIDE(_ZN3Foo3baaEi = _ZN3Foo3barEi); /* declare "alias" */
Build process:
g++ -o libfoo.o -c foo.cpp
ar rvs libfoo.a libfoo.o # simulate building a library
g++ -o main.o -c main.cpp
g++ -o app main.o -L. script.ld -lfoo
app is compiled, can be executed and returns expected result.
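To double-check which declarations the two mangled names in script.ld stand for, they can be demangled programmatically. This is a minimal sketch, assuming GCC or Clang with the Itanium C++ ABI; the helper name demangle is hypothetical, while abi::__cxa_demangle is declared in <cxxabi.h>. Old-style names such as bar__4fooRi are not in this format, so the helper is expected to return them unchanged.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : mangled;
    std::free(text);  // free(nullptr) is a no-op
    return result;
}
```

With this helper, demangle("_ZN3Foo3barEi") yields Foo::bar(int) and demangle("_ZN3Foo3baaEi") yields Foo::baa(int), which matches the alias declared in the linker script.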
I am using Apple LLVM version 8.0.0 (clang-800.0.42.1) to compile. It's about 1200 files, but I have used them before. I compile them all: no problems. Then I build my static library (ar rcs libblib.a *.o): no problems. But when I try to use my brand-new library, I run into a problem:
gcc main.c -L. -lblib
Undefined symbols for architecture x86_64:
"_N_method", referenced from:
_main in main-7fc584.o
ld: symbol(s) not found for architecture x86_64
But, I know this is defined. I check to see that the file is included (ar -t libblib.a | grep N_METHOD.o) and it is in there. Check the source file, and there is the method, exactly named as it is in the header file. What is the problem I am having here? I am at a complete loss and I am hoping I am missing something simple.
I did nm -g N_METHOD.o and got back:
0000000000000000 T __Z8N_methodP6stacks
Transferring comments into an answer.
Based on the question content, I asked:
Have you checked that N_METHOD.o is a 64-bit object file (or a fat object file with both 32-bit and 64-bit code in it)? If it is a 32-bit object file, then it is no use for a 64-bit program. However, that's a little unlikely; you have to go out of your way to create a 32-bit object file on Mac.
Have you run nm -g N_METHOD.o to see whether _N_method is defined in the object file?
I did nm -g N_METHOD.o and got back:
0000000000000000 T __Z8N_methodP6stacks
Don't compile C code with a C++ compiler. Or don't try to compile C++ code with a C compiler. The mangled name (__Z8N_methodP6stacks) is for C++. Maybe you simply need to link with g++ instead of gcc? They are different languages — this is the property of 'type-safe linkage' that is characteristic of C++ and completely unknown to C.
First step — compile and link with:
g++ main.c -L. -lblib
Assuming that the source is in the common subset of C and C++, the chances are that this will work. At least, if the code contains N_method(&xyz) where xyz is a variable of type stacks, then there's a chance it will call __Z8N_methodP6stacks.
The following code:
typedef struct stacks stacks;
extern int N_method(stacks*);
extern int relay(stacks *r);
int relay(stacks *r) { return N_method(r); }
compiles with a C++ compiler to produce the nm -g output:
0000000000000000 T __Z5relayP6stacks
U __Z8N_methodP6stacks
It also compiles with a C compiler to produce the nm -g output:
0000000000000038 s EH_frame1
U _N_method
0000000000000000 T _relay
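If the library has to stay compiled as C++ and the caller has to stay C, one possible bridge is a thin shim with C linkage. This is only a sketch under assumptions: the body of stacks, the stand-in definition of N_method, and the shim name N_method_c are all hypothetical and are here just to make the example self-contained.

```cpp
// Stand-in for the library's type; the real definition would come from the
// library's own header.
struct stacks { int depth; };

// Stand-in for the library function. Compiled as C++, it gets a mangled
// symbol name (__Z8N_methodP6stacks on macOS).
int N_method(stacks* s) { return s ? s->depth : -1; }

// Hypothetical shim: extern "C" suppresses name mangling, so a C caller can
// declare `int N_method_c(stacks*);` and link against the plain symbol name.
extern "C" int N_method_c(stacks* s) { return N_method(s); }
```

The shim has to be compiled as C++ and linked into the final program together with the library.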
Background:
Part of my application is used as a library by other, independent applications. They link against that library (let's say lib.so) at link time. The problem with this approach is that we all have to use the same external libraries (Boost, ACE, etc.); otherwise we get duplicated symbols, which eventually cause a crash. We want to resolve this problem.
I know two techniques: one is to hide all symbols (I am not sure about the order of the global/local scopes for a shared library), and the other is to load the library dynamically. We chose the second option (dynamic loading via dlopen), as it gives clients the opportunity to test easily against a stubbed lib.so, and we have a very simple API.
Below is a small example of an application that loads an example shared library and crashes (I want to understand why it crashes and how it should be written).
The crash happens inside dlopen, specifically during initialization of a global variable, on the assignment to a std::string (in the constructor of Aclass). From our testing, it looks like any access to the standard library while the shared library is being initialized results in a crash.
We managed to get rid of the crash by adding the -fPIC flag to the EXECUTABLE. Why does this resolve our problem? I thought the flag only needed to be set for the shared library; could anyone explain this more precisely? As I understand it, this flag is undesirable because it slows the application down, which is quite a problem in my case (low-latency applications).
To summarize:
1. Why does this crash occur?
2. Why is the -fPIC flag enough to resolve this crash?
3. Why is it enough to set the -fPIC flag on the executable?
4. Is it possible to solve my problem in another way, so that the shared library and the client application can use different versions of libraries such as Boost and ACE (the compiler, Linux version, and standard libraries are guaranteed to be the same)?
5. Removing the RTLD_DEEPBIND flag also fixes the crash, but from the dlopen man page it looks like I should use this flag, as it changes the symbol scope order for the shared library: symbols are looked up first in the local scope and then in the global one. That looks like a must-have for me, as the shared library will use different external libraries than the executable (and dynamic loading will protect the executable from having its symbol scope polluted). Why does removing this flag fix the crash in this simple case?
Shared library dynLib.cpp:
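#include <string>
class Aclass
{
public:
    Aclass() { s = "123"; } // the assignment that crashes during dlopen
private:
    std::string s;
};
Aclass a; // global object, constructed while the library is loaded
Executable main.cpp: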
#include <stdlib.h>
#include <dlfcn.h>
#include <string>
#include <unistd.h>
#include <iostream>
int main()
{
    std::string dummyCrasher;
    dlerror();
    void* handle = dlopen("./libdynLib.so", RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND);
    if(!handle)
    {
        std::cout << "handle is null" << dlerror();
    }
    usleep(1000 * 1000 * 10);
}
Makefile:
CXXFLAGS=-m32 -march=x86-64 -Wl,-v -g -O3 -Wformat -Werror=format -c
CLINKFLAGS=-Wl,-Bstatic -Wl,-Bdynamic -ldl -m32 -march=x86-64
all: dynLib.so dynamiclinking
dynLib.so: dynLib.o
	g++44 $(CLINKFLAGS) -shared -o libdynLib.so dynLib.o
dynLib.o: dynLib.cpp
	g++44 $(CXXFLAGS) dynLib.cpp
dynamiclinking: main.o
	g++44 $(CLINKFLAGS) -o dynamiclinking main.o -ldl
main.o: main.cpp
	g++44 $(CXXFLAGS) main.cpp
.PHONY: clean
clean:
	rm dynLib.o main.o dynamiclinking libdynLib.so
PS. I wrote this code by hand, so it may contain typos.
PS 2. With the -fPIC flag it works:
main.o: main.cpp
	g++44 $(CXXFLAGS) main.cpp -fPIC
UPDATE
It is possible to work around this problem by linking libstdc++ statically. But my questions are still unanswered :( Maybe someone has time to look at it?
UPDATE2 The same problem occurs with GCC 4.4.6 and 4.8.1.
I think you are facing the same problem as in "When we are supposed to use RTLD_DEEPBIND?", where the executable gets its own copy of global variables:
well that's a wonderful feature of you building the main application without the -fPIC option.
[...]
This means that when the symbol is found in libdep.so, it gets copied into the initial data segment of the main executable at that address. Then the reference to duplicate in libdep.so is looked up and it points to the copy of the symbol that's in the main executable.
Due to RTLD_DEEPBIND, dynLib.so sees a different set of global variables (the originals in libstdc++ rather than the copies in the executable) when initializing std::string, and thus crashes.
As for why the linker has such behavior, this article has a detailed explanation (emphasis mine):
Recall that the program/executable is not relocatable, and thus its data addresses have to bound at link time. Therefore, the linker has to create a copy of the variable in the program's address space, and the dynamic loader will use that as the relocation address. This is similar to the discussion in the previous section - in a sense, myglob in the main program overrides the one in the shared library, and according to the global symbol lookup rules, it's being used instead.
One final note: this behavior is platform-specific, at least on PowerPC there is no such additional copy of global variables in main executable.
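Since the question reports that dropping RTLD_DEEPBIND makes the crash go away, one way to experiment is to make the flag opt-in and to report dlopen failures properly. This is a minimal sketch, assuming a POSIX system with <dlfcn.h>; the helper name open_library is hypothetical, and on older glibc versions the program has to be linked with -ldl.

```cpp
#include <dlfcn.h>   // dlopen, dlerror
#include <iostream>

// Hypothetical helper: opens a shared library and reports failures.
// deep_bind is opt-in because, as described above, RTLD_DEEPBIND can make the
// library see different copies of globals than a non-PIC executable does.
void* open_library(const char* path, bool deep_bind) {
    int flags = RTLD_LAZY | RTLD_LOCAL;
#ifdef RTLD_DEEPBIND               // glibc extension, not available everywhere
    if (deep_bind) flags |= RTLD_DEEPBIND;
#else
    (void)deep_bind;               // flag unsupported on this platform
#endif
    dlerror();                     // clear any stale error state
    void* handle = dlopen(path, flags);
    if (!handle) {
        const char* msg = dlerror();
        std::cerr << "dlopen(" << path << ") failed: "
                  << (msg ? msg : "unknown error") << '\n';
    }
    return handle;
}
```

Calling open_library("./libdynLib.so", false) corresponds to the variant of the question's main.cpp without RTLD_DEEPBIND.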
Edit: the comments below the accepted answer show that it might be an issue with the Android dynamic loader.
I have a header for a template class with a static member. At runtime the address of the static member is used in the library and in the client code. The template is implicitly instantiated both in the library and in the client code. It works fine on Linux and OSX, the symbol is duplicated but marked as "uniqued" as shown by nm (see below).
However when I compile for ARM (Android), the symbol is marked weak in both the DSO and the executable. The loader does not unify and the symbol is effectively duplicated at runtime!
I read these:
two instances of a static member, how could that be?
Static template data members storage
and especially this answer:
https://stackoverflow.com/a/2505528/2077394
and:
http://gcc.gnu.org/wiki/Visibility
but I am still a little bit puzzled. I understand that the visibility attributes help to optimize, but I thought it should work by default. I know the C++ standard does not care about shared libraries, but does that mean that using shared libraries breaks the standard (or at least that this implementation does not conform to the C++ standard)?
Bonus: how can I fix it? (and not using template is not an acceptable answer:))
Header:
template<class T>
struct TemplatedClassWithStatic {
    static int value;
};
template<class T>
int TemplatedClassWithStatic<T>::value = 0;
shared.cpp:
#include "TemplateWithStatic.hpp"
int *addressFromShared() {
    return &TemplatedClassWithStatic<int>::value;
}
main.cpp:
#include "TemplateWithStatic.hpp"
#include <cstdio>
int *addressFromShared();
int main() {
    // %p expects void*, so cast explicitly
    printf("%p %p\n", static_cast<void*>(addressFromShared()),
           static_cast<void*>(&TemplatedClassWithStatic<int>::value));
}
And building, looking at the symbols definitions:
producing .so:
g++-4.8 -shared src/shared.cpp -o libshared.so -I include/ -fPIC
compiling and linking main:
g++-4.8 src/main.cpp -I include/ -lshared -L.
symbols are marked as "unique":
nm -C -A *.so a.out | grep 'TemplatedClassWithStatic<int>::value'
libshared.so:0000000000200a70 u TemplatedClassWithStatic<int>::value
a.out:00000000006012b0 u TemplatedClassWithStatic<int>::value
producing .so
~/project/android-ndk-r9/toolchains/arm-linux-androideabi-4.8/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-g++ -o libshared.so src/shared.cpp -I include/ --sysroot=/Users/amini/project/android-ndk-r9/platforms/android-14/arch-arm/ -shared
compiling and linking main
~/project/android-ndk-r9/toolchains/arm-linux-androideabi-4.8/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-g++ src/main.cpp libshared.so -I include/ --sysroot=${HOME}/project/android-ndk-r9/platforms/android-14/arch-arm/ -I ~/project/android-ndk-r9/sources/cxx-stl/gnu-libstdc++/4.8/include -I ~/project/android-ndk-r9/sources/cxx-stl/gnu-libstdc++/4.8/libs/armeabi-v7a/include -I ~/project/android-ndk-r9/sources/cxx-stl/gnu-libstdc++/4.8/include/backward -I ~/project/android-ndk-r9/platforms/android-14/arch-arm/usr/include ~/project/android-ndk-r9/sources/cxx-stl/gnu-libstdc++/4.8/libs/armeabi-v7a/libgnustl_static.a -lgcc
symbols are weak!
nm -C -A *.so a.out | grep 'TemplatedClassWithStatic<int>::value'
libshared.so:00002004 V TemplatedClassWithStatic<int>::value
a.out:00068000 V TemplatedClassWithStatic<int>::value
Edit, note for the context: I was playing with OOLua, a library that helps bind C++ to Lua, and my unit tests started failing when I began to target Android. I don't "own" the code and I would rather not modify it deeply.
Edit, to run it on Android:
adb push libshared.so data/local/tmp/
adb push a.out data/local/tmp/
adb shell "cd data/local/tmp/ ; LD_LIBRARY_PATH=./ ./a.out"
0xb6fd7004 0xb004
Android does not support unique symbols. It is a GNU extension of ELF format that only works with GLIBC 2.11 and above. Android does not use GLIBC at all, it employs a different C runtime called Bionic.
(update) If weak symbols don't work for you (end update) I'm afraid you would have to modify the code such that it does not rely on static data.
There may be some compiler/linker settings that you can tweak to enable this (have you looked at the -fvisibility flag?).
Possibly a GCC attribute modifier may be worth trying (explicitly set __attribute__ ((visibility ("default"))) on the variable).
Failing that, the only workarounds I could suggest are (all are somewhat ugly):
1. Explicitly instantiate all forms of the template that are created in the shared library and provide the initializers in its implementation (not in the header). This may or may not work.
2. Like (1), but use a shim function as a Meyers singleton for the shared variable (example below).
3. Allocate a variable in a map for the class based upon RTTI (which might also fail across a shared library boundary).
e.g.
template<class T>
struct TemplatedClassWithStatic {
    static int& getValue() { return TemplatedClassWithStatic_getValue((T const*)0); }
};
// types used by the shared library.. can be forward declarations here but you run the risk of violating ODR.
int& TemplatedClassWithStatic_getValue(TypeA*);
int& TemplatedClassWithStatic_getValue(TypeB*);
int& TemplatedClassWithStatic_getValue(TypeC*);
shared.cpp
int& TemplatedClassWithStatic_getValue(TypeA*) {
    static int v = 0;
    return v;
}
int& TemplatedClassWithStatic_getValue(TypeB*) {
    static int v = 0;
    return v;
}
int& TemplatedClassWithStatic_getValue(TypeC*) {
    static int v = 0;
    return v;
}
The executable would also have to provide implementations for any types that it uses to instantiate the template.
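The third workaround (a map keyed on RTTI) could look roughly like the following. This is only a sketch: the function name valueForType is hypothetical, the registry is not thread-safe, and, as noted above, it relies on type_info comparing equal across the shared library boundary, which may not hold on every platform.

```cpp
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical registry. It would be defined in exactly one binary (for
// example shared.cpp) and only declared in the header, so that a single map
// exists at runtime.
int& valueForType(std::type_index key) {
    // operator[] value-initializes missing entries, so each type starts at 0.
    // References into an unordered_map stay valid when the table rehashes.
    static std::unordered_map<std::type_index, int> values;
    return values[key];
}

template <class T>
struct TemplatedClassWithStatic {
    // Header-only part: forwards to the single registry above.
    static int& getValue() { return valueForType(std::type_index(typeid(T))); }
};
```

Client code would then use TemplatedClassWithStatic<int>::getValue() wherever it previously used TemplatedClassWithStatic<int>::value.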
I have seen gcc link against a C++ shared library, but I am not able to reproduce this on my own. So first I create a C++ library with a test function:
g++ -shared -o libtest.so test.c
Then I have a test main function which calls the library function and compile it like this
gcc -o prog.out main.c -L. -ltest
Then I receive the error
undefined reference to 'testfunc'
which I think is caused by a different symbol name in the library: C names the function testfunc, whereas C++ names it [some stuff]__testfunc[maybe again some stuff].
I have also tried to use
gcc -o prog.out main.c -l:libtest.so
but this results in the same error.
Therefore, my question is: how is it possible to link a C++ library to a C file using gcc?
Update: I know I can use extern "C", but that's not the way it was solved. Maybe there are some parameters for the linker instead?
Update2: I just thought it could also be that the first part is only compiled with the C++ compiler and then linked with gcc. I also tried this:
g++ -c testlib.c -o testlib.o
gcc -shared -o libtest.so testlib.o
gcc -o prog.out -l:libtest.so
still doesn't work. Is there something wrong with the flags?
Yes, the problem has nothing to do with shared libraries (I think...) and everything to do with name mangling.
In your header, you must declare the function like this:
#ifdef __cplusplus
extern "C" {
#endif
void testfunc(void);
#ifdef __cplusplus
}
#endif
This will cause testfunc to have the same symbol and calling conventions for both C and C++.
On the system I'm using right now, the C symbol name will be _testfunc and the C++ symbol name (assuming you don't use extern "C") will be __Z8testfuncv, which encodes information about the parameter types so overloading will work correctly. For example, void testfunc(int x) becomes __Z8testfunci, which doesn't collide with __Z8testfuncv.
When you use g++, it compiles ALL source files as C++. This means all functions use the C++ ABI (which includes name mangling). When you use gcc, it compiles *.c files using the C ABI (no name mangling).
Thus the same function compiled with the two different compilers will produce different results (in a lot of ways). That's because they are different languages.
To force g++ to compile a function using the C ABI, prefix it with extern "C":
extern "C" void testfunc(char*);
Alternatively, use the block version:
extern "C" {
    <multiple Functions>
}
To be honest, I never compile anything with gcc anymore (unless there is some hard requirement to do so, in which case I usually fix the code so that it works as C++). Compiling all files with g++ just makes the process simpler.
If you are sure it's not because of name mangling, then it means gcc could not find the library; try giving the full path of the library unless the .so file is in a standard location. If you are not sure, recheck for any conflict in variable or function names. Use namespaces to group classes, and define the functions inside the classes, to avoid naming conflicts.