undefined symbol when calling dlopen (name mangling problem?) - c++

I am trying to load a dynamic library with dlopen. The code in the library should call a function inside the executable (built with the -rdynamic flag).
dlopen gives this error:
undefined symbol:
_Z10vGnssTraceNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEhS4_
If I look at the exported symbols of my executable with nm, I see this:
0004b779 T _Z10vGnssTraceNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEhS4_
And the result of nm for the lib is this:
U _Z10vGnssTraceNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEhS4_
This looks like it should match. How can I make the library see the function in the executable?
According to this https://stackoverflow.com/a/17083153/10551203 it should work.
Any help would be highly appreciated
Ralf

Hello, and thanks to everybody who looked at this. After a few days I know more about the topic, and my real problem turned out to be completely different from what I expected.
I inspected my executable with nm to see the exported symbols, but I didn't use the -D option. Even after I learned that it exists, I wasn't sure whether I needed it or whether it only applies to the dynamic library. In fact, the symbols of the executable were not exported at all.
I use autotools and added -rdynamic to AM_LDFLAGS. But the Makefile.am is long, and it contains another *_LDFLAGS variable specifically for the executable that I wasn't aware of. Because of this, AM_LDFLAGS was not applied when linking, which is the documented behavior; see the AM_LDFLAGS entry at https://www.gnu.org/software/automake/manual/html_node/Program-Variables.html :
In some situations, this is not used, in preference to the per-executable (or per-library) _LDFLAGS.
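A quick way to catch this kind of build mistake from inside the program is to ask the dynamic loader whether a symbol is reachable from the main program at all. The following is only a sketch: visible_to_plugins is a hypothetical helper name, it assumes a POSIX system with dynamic linking, and on older glibc versions it has to be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>   // dlopen, dlsym, dlclose (POSIX)

// Hypothetical helper: returns true if `name` can be resolved at run time
// starting from the main program, which is what a dlopen()ed library needs.
bool visible_to_plugins(const char* name) {
    assert(name != nullptr);
    // A null filename yields a handle for the main program; lookups through
    // it also search the shared libraries that were loaded at startup.
    void* self = dlopen(nullptr, RTLD_NOW);
    if (self == nullptr) {
        return false;
    }
    const bool found = dlsym(self, name) != nullptr;
    dlclose(self);
    return found;
}
```

Called early in main with the mangled name from the error message (here _Z10vGnssTraceNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEhS4_), it should return false when -rdynamic did not reach the link line of the executable, unless some linked shared library already references the symbol.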
Best regards
Ralf

Related

Loading classes - dynamical link library returned undefined symbol error

I'm using C++ dlopen() to link a shared library named as lib*.so (in directory A) in my main program (in directory B).
I experimented with loading some simple functions, and everything worked well. However, I ran into trouble when I tried to load a class and factory functions that return a pointer to a class object. (I'm using the terms from the tutorial below.)
The methodology I used was based on the examples in chapter 3.3 of this tutorial https://www.tldp.org/HOWTO/C++-dlopen/thesolution.html#externC.
There is a bit of polymorphism here: lib*.so contains a child class that inherits from an abstract parent class in the main program's directory (directory B). When the main program calls dlopen() on lib*.so, it fails with "undefined symbol".
I used the nm command to examine the symbol tables of lib*.so and the main program binary. The symbols in these binaries are:
lib*.so : U _ZTI7ParentBox
main program binary: V _ZTI7ParentBox
ParentBox is the name of the parent class inherited by ChildBox in lib*.so. Note that the parent class's header file is in another project in directory B.
Although there is name mangling, the symbol names are exactly the same.
I'm wondering why the dynamic linker cannot link them and instead gives me an undefined symbol error for dlopen().
Am I missing some key concept here?
P.S. More strangely, it was able to resolve the member-function symbols between the child class in lib*.so (U type symbols) and the parent class (T type symbols). Why can it do this but not resolve the undefined symbol for the parent class name?
(I've been searching around for a long time and tried -rdynamic and -ldl, though I don't fully understand what they do, but nothing worked.)
Update 04 April 2019:
This is the g++ command line I used to make the main program binary.
g++ -fvisibility=hidden -pthread -static-libgcc -static-libstdc++ \
-m64 -fpic -ggdb3 -fno-var-tracking-assignments -std=c++14 \
-rdynamic \
-o ./build/main-prog \
/some_absolute_path/ParentBox.o \
/some_other_pathen/Triangle.o \
/some_other_pathen/Circle.o \
/some_other_pathen/<lots_of_depending_obj> \
/some_absolute_path/librandom.a \
-lz -ldl -lrt -lbz2
I looked up every argument of this command line in https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html (this seems to be a good reference for fellow programmers working on large projects with a complicated g++ command line :) )
Thanks to @Employed Russian. With his instructions, the problem narrowed down to exporting the symbols in the main program binary.
However, the main program binary has lots of dependencies as you can see from the above command, Circle, Triangle and lots of other object files.
We also need to add "-rdynamic" to the compilation of Circle, Triangle and other dependency object files. Otherwise it does not work.
In my case, I added "-rdynamic" to all files in my project to export all symbols. Not sure about "-fvisibility=hidden" doing anything good. I removed all of them in my Makefile anyway... I know this is not the best way but I will worry about speed later when everything is functionally correct. :)
More Updates:
The correct solution is in @Employed Russian's update in the answer.
My previous solution happened to work because I also removed "-fvisibility=hidden". It is not necessary (and probably wrong) to add -rdynamic to all objects used in the final link.
Please refer to @Employed Russian's explanation, which addresses the core issue.
Final Update:
For fellow programmers who are interested in how C/C++ program is executed and how library can be linked, here is a good reference web course (Life of Binary) by Xeno Kovah: http://opensecuritytraining.info/LifeOfBinaries.html
You can also find a playlist on youtube. Just search "Life of Binary"
Although there is name mangling the symbol names are exactly the same. I'm just wondering why the dynamic linker cannot link them?
Most likely explanation: the symbol is not exported from the main binary.
Repeat your command with nm -D:
nm -AD lib*.so main-prog | grep ' _ZTI7ParentBox$'
Chances are, you'll see lib*.so: U _ZTI7ParentBox and nothing from main-prog.
This happens because the linker normally does not export any symbol from main-prog that is not referenced by some shared library participating in the link (and your lib*.so isn't linked with main-prog, or else you wouldn't need to dlopen it).
To change that behavior, you could add the -Wl,--export-dynamic linker flag when linking main-prog. That instructs the linker to export everything that is linked into main-prog.
tried -rdynamic
That is equivalent to -Wl,--export-dynamic, and should have worked (assuming you added it to the main-prog link line, and not somewhere else).
Update:
Everything works now! Since main-prog also depends on some other objects, it appears that simply add -rdynamic to the final main-prog linking does not resolve the problem. We need to add "-rdynamic" to the compilation of those depending objects.
That is the wrong solution. Your problem is that -fvisibility=hidden tells the compiler to mark all symbols that go into main-prog as not exported, and -rdynamic doesn't export any hidden symbols.
The correct solution is to remove -fvisibility=hidden from any objects that define symbols you do want to export, and add -rdynamic to the final link.
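If removing -fvisibility=hidden from whole object files is too coarse, GCC and Clang also accept a per-declaration visibility attribute, so only the types a plugin needs are exported. The sketch below assumes that approach; BOX_EXPORT, ChildBox::name, create_box and destroy_box are hypothetical names, and both sides are shown in one file only so that it compiles on its own.

```cpp
#include <cassert>
#include <string>

// BOX_EXPORT is a hypothetical macro name. On GCC/Clang it overrides
// -fvisibility=hidden for the declaration it is attached to.
#if defined(__GNUC__) || defined(__clang__)
#define BOX_EXPORT __attribute__((visibility("default")))
#else
#define BOX_EXPORT
#endif

// Main-program side: the attribute on the class keeps its vtable and
// typeinfo (_ZTI7ParentBox) at default visibility, so -rdynamic on the
// final link can export them.
class BOX_EXPORT ParentBox {
public:
    virtual ~ParentBox() = default;
    virtual std::string name() const = 0;
};

// Plugin side (this part would be compiled into lib*.so).
class ChildBox : public ParentBox {
public:
    std::string name() const override { return "ChildBox"; }
};

// Factory functions with C linkage, so dlsym() can find them by plain name.
extern "C" BOX_EXPORT ParentBox* create_box() { return new ChildBox; }

extern "C" BOX_EXPORT void destroy_box(ParentBox* box) {
    assert(box != nullptr);
    delete box;
}
```

The final link of main-prog still needs -rdynamic (or -Wl,--export-dynamic); the attribute only stops the compiler from marking the symbols as hidden.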

Get dlopen to ignore undefined symbols

I am compiling a dynamically generated C++ file as a shared object that contains references to symbols available only in its full build.
g++ -o tmp_form.so -fPIC -shared -lsomelib -std=gnu99 tmp_form.cc
I don't need the missing symbols for my current program, only those from the linked library. But dlopen does require them to be available or fails otherwise. The missing symbols are all variables which are being referenced in structs.
One option would be to add the weak reference attribute to the missing symbols in the generated code. But I would like to avoid making changes to the code generator if possible.
Any advice is appreciated.
Your link command is incorrect:
... -lsomelib ... tmp_form.cc
should be
... tmp_form.cc -lsomelib
The order of sources/objects and libraries on the link line does matter.
If you are using an ELF platform and a very recent build of the Gold linker, you can "downgrade" unresolved symbols to weak with the --weak-unresolved-symbols option, without modifying the source.
Otherwise you'll have to modify the sources; there is no other way.
P.S. Function references would not be a problem with RTLD_LAZY because of lazy binding. For data references, lazy binding is not possible, so weak unresolved symbols are your only choice.
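For reference, this is roughly what the weak reference attribute mentioned in the question looks like in generated code. It is a sketch under assumptions: GCC or Clang on an ELF target, with optional_setting and read_optional_setting as hypothetical names standing in for one of the missing variables.

```cpp
#include <cassert>

// `optional_setting` is a hypothetical variable that no object file here
// defines. The weak attribute (GCC/Clang, ELF targets) lets linking and
// loading succeed anyway; an unresolved weak symbol has a null address.
extern int optional_setting __attribute__((weak));

// Stores the variable's value in *out and returns true if some module
// defined it; returns false and leaves *out untouched otherwise.
bool read_optional_setting(int* out) {
    assert(out != nullptr);
    if (&optional_setting == nullptr) {
        return false;  // nobody defined the symbol
    }
    *out = optional_setting;
    return true;
}
```

Code that dereferences such a variable without the address check crashes when the symbol is absent, so this only helps when the references to the missing data are never actually followed.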
Try dlopen("/path/to/the/library", RTLD_LAZY);

How can I get gcc to add a prefix to all symbol names

I know that in the past there was an option, -fprefix-function-name, that would add a prefix to all generated symbols, but it doesn't seem to be part of gcc anymore. Is there any other way to do this?
I believe this answer will give you the solution.
In short, you can 'prefix' symbols in an existing library using objcopy like this:
objcopy --prefix-symbols=foo_ foo.o
*EDIT: George Skoptsov's solution's better than mine :) The nm trick might come in handy though.
This is not exactly what you are looking for, but I have had to do something similar in the past (renaming the symbols exported by a library)
If you know the names of the symbols you want to redefine, you can try objcopy --redefine-sym old=new (or --redefine-syms=filename to read a list of old/new pairs from a file). See the objcopy man page for more details on the input. By default objcopy overwrites the input file when no output file is given, so be careful with that.
If you do not know the names of the symbols, you can try using nm to get a list of them. Again, since I am not sure what kind of symbols you are looking for, the man pages will probably be your best bet.

How to handle linker errors in C++/GNU toolchain?

Given a C++/GNU toolchain, what's a good method or tool or strategy to puzzle out linker errors?
Not sure exactly what you mean but if you are talking about cryptic linker symbols like:
mylib.so: undefined symbol: _ZN5CandyD2Ev
you can use c++filt to do the puzzling for you.
c++filt _ZN5CandyD2Ev
will return Candy::~Candy() so somehow Candy's destructor didn't get linked.
With gcc toolchain, I use:
nm: to find the symbols in object files
ld: to find how a library links
c++filt: to find the C++ name of a symbol from its mangled name
Check this for details.
Well the first thing would be RTFM. No, seriously, read the documentation.
If you don't want to do that, try a search on the error that comes up.
Here are a few other things to remember:
"missing" symbols are often an indication that you haven't included the appropriate source or library
"missing" symbols are sometimes an indication that you're attempting to link a library created with a different mangling convention (or a different compiler)
make sure that you have extern "C" where appropriate
declaring and defining aren't the same thing
if your compiler doesn't support "export", make sure your template code is available for when you instantiate objects
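Two of these points, extern "C" and declaration versus definition, can be illustrated in a few lines. The function names plugin_version and divide are hypothetical, and the mangled name in the comment assumes the Itanium C++ ABI used by GCC and Clang.

```cpp
#include <cassert>

// Declarations only: they promise the linker that a definition exists
// somewhere, but they do not produce one.
extern "C" int plugin_version();         // symbol name: plugin_version
int divide(int dividend, int divisor);   // symbol name: _Z6divideii

// Definitions: if these were missing from every linked object and library,
// each call to the functions above would end in an undefined-symbol error.
extern "C" int plugin_version() { return 3; }

int divide(int dividend, int divisor) {
    assert(divisor != 0);
    return dividend / divisor;
}
```

Running nm on the resulting object file should list both names as T symbols; a file that only calls the functions would list them as U.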
Look at the names of the symbols that are reported to be problematic.
If there are missing symbols reported, find out in which source files or libraries those function/... should be defined in. Inspect the compilation/linker settings to find out why these files aren't compiled or linked.
If there are multiply defined symbols the linker usually mentions which object files or libraries contain them. Look at those files/their sources to find out why the offending functions/... are included in both of them.

How to make gcc or ld report undefined symbols but not fail?

If you compile a shared library with GCC and pass the "-z defs" flag (which I think just gets passed blindly on to ld) then you get a nice report of what symbols are not defined, and ld fails (no .so file is created). On the other hand, if you don't specify "-z defs" or explicitly specify "-z nodefs" (the default), then a .so will be produced even if symbols are missing, but you get no report of what symbols were missing if any.
I'd like both! I want the .so to be created, but I'd also like any missing symbols to be reported. The only way I know of to do this so far is to run it twice, once with "-z defs" and once without. This means the potentially long linking stage is done twice though, which will make the compile/test cycle even worse.
In case you're wondering about my ultimate goal: when compiling a library, undefined symbols in a local object file indicate that a dependency that should have been specified in my build environment wasn't, whereas a missing symbol in a library that you're linking against is not an error (-l flags are only given for immediate dependencies, not dependencies of dependencies, under this system). I need the report for the part where it lists "referenced in file" so I can see whether the symbol was referenced by a local object or by a library being linked. The --allow-shlib-undefined option almost fixes this, but it doesn't work when linking against static libraries.
Preference to solutions that will work with both the GNU and Solaris linkers.
Instead of making ld report the undefined symbols during linking, you could use nm on the resulting .so file. For example:
nm --dynamic --undefined-only foo.so
EDIT: Though I guess that doesn't give you which source files the symbols are used in. Sorry I missed that part of your question.
You could still use nm for an approximate solution, along with grep:
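# The second whitespace-separated field of each nm line is the symbol name
# (a fixed cut field such as -f11 only matches 32-bit nm output).
for sym in $(nm --dynamic --undefined-only foo.so | awk '{print $2}' | c++filt -p); do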
grep -o -e "\\<$sym\\>" *.cpp *.c *.h
done
This might have problems with local symbols of the same name, etc.
From the GNU ld 2.15 NEWS file:
Improved linker's handling of unresolved symbols. The switch
--unresolved-symbols= has been added to tell the linker when it
should report them and the switch --warn-unresolved-symbols has been added to
make reports be issued as warning messages rather than errors.