Why is partial linking making all symbols non-global?

I'm trying to reduce the on-disk size of a big static library, libIn.a, which contains a lot (~3000) of small object files (.o), by combining all of them into a single .o file; as far as I understand, this procedure is called "partial linking". The size reduction would come from collapsing the per-object sections (one set per .o file) into a single one.
The problem is that the procedure I'm using does not preserve the global-ness of the symbols in the .o files contained in libIn.a, i.e., all symbols become local after partial linking, which causes "undefined symbol" errors later in the linking process.
As an example, this is what the version.o object file (originally included in libIn.a; you can download it at https://github.com/giovanniberi93/problematic_object_file) looks like before partial linking:
└─ nm -Ca version.o
0000000000000000 T webrtc::LoadWebRTCVersionInRegister()
0000000000000024 s l_.str
0000000000000000 t ltmp0
0000000000000024 s ltmp1
So now the symbol webrtc::LoadWebRTCVersionInRegister() is global (T).
But when performing partial linking, the same symbol becomes local (t):
└─ ld -r version.o -o why_is_local.o
└─ nm -Ca why_is_local.o
0000000000000024 s LC1
0000000000000000 t webrtc::LoadWebRTCVersionInRegister()
Things get even weirder: when trying to replicate the same scenario with a sample .o file, the global symbols are not converted into local symbols (!); e.g., with input C++ file:
int function1() {
return 1;
}
Its global symbol function1() is not converted into a local symbol by partial linking, i.e., it stays global (T) before and after partial linking:
└─ clang -c file1.cc
└─ nm -Ca file1.o
0000000000000000 T function1()
0000000000000000 t ltmp0
0000000000000008 s ltmp1
└─ ld -r file1.o -o relocated.o
└─ nm -Ca relocated.o
0000000000000000 T function1()
There must be some difference between version.o and file1.o that causes the global symbols to become local, but I've not been able to pinpoint it. Any input would be greatly appreciated.
My env (MacOS 12.6, arm64):
└─ clang -v
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Users/giober/Desktop/XCodes/13.1/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
└─ ld -v
@(#)PROGRAM:ld PROJECT:ld64-711
BUILD 18:11:19 Aug 3 2021
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
LTO support using: LLVM version 13.0.0, (clang-1300.0.29.3) (static support for 27, runtime is 27)
TAPI support using: Apple TAPI version 13.0.0 (tapi-1300.0.6.5)
└─ nm --version
Apple LLVM version 13.0.0 (clang-1300.0.29.3)
Optimized build.
Default target: arm64-apple-darwin21.6.0
Host CPU: vortex

The symbol in question is marked as "private external":
% nm -m version.o
0000000000000000 (__TEXT,__text) private external __ZN6webrtc27LoadWebRTCVersionInRegisterEv
0000000000000024 (__TEXT,__cstring) non-external l_.str
0000000000000000 (__TEXT,__text) non-external ltmp0
0000000000000024 (__TEXT,__cstring) non-external ltmp1
Not sure where that's coming from, but you can preserve this state with -keep_private_externs:
ld -keep_private_externs -r version.o -o output.o
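One common source of the "private external" flag on Mach-O is hidden symbol visibility, set either per declaration or for a whole build with -fvisibility=hidden; by default, ld -r turns such symbols into local ones unless -keep_private_externs is given. The sketch below is only an illustration and is not taken from the WebRTC sources; the file name and the names demo::hidden_version and demo::exported_version are made up for this example.

```cpp
// visibility_demo.cc (hypothetical file and names, for illustration only)
namespace demo {

// Hidden visibility: on Mach-O, `nm -m` is expected to report this symbol
// as "private external". A plain `ld -r` may then demote it to a local symbol.
__attribute__((visibility("hidden"))) int hidden_version() { return 1; }

// Default visibility: expected to be reported as "external" and to stay
// global across `ld -r`.
__attribute__((visibility("default"))) int exported_version() { return 2; }

}  // namespace demo
```

Compiling this with clang++ -c and running nm -m on the result should show the two symbols with different visibility. Building with -fvisibility=hidden has the same effect on every symbol not explicitly marked default, which is a plausible explanation for version.o if the library was built that way.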

Related

Static Library Linking Issue "Undefined symbols" for symbols that are defined

I am using Apple LLVM version 8.0.0 (clang-800.0.42.1) to compile. It's about 1200 files, but I have used them before. I go and compile them all, no problems. Then I make my static library (ar rcs libblib.a *.o), no problems. So when I try to use my brand new library, I have my problem.
gcc main.c -L. -lblib
Undefined symbols for architecture x86_64:
"_N_method", referenced from:
_main in main-7fc584.o
ld: symbol(s) not found for architecture x86_64
But, I know this is defined. I check to see that the file is included (ar -t libblib.a | grep N_METHOD.o) and it is in there. Check the source file, and there is the method, exactly named as it is in the header file. What is the problem I am having here? I am at a complete loss and I am hoping I am missing something simple.
I did nm -g N_METHOD.o and got back:
0000000000000000 T __Z8N_methodP6stacks
Transferring comments into an answer.
Based on the question content, I asked:
Have you checked that N_METHOD.o is a 64-bit object file (or a fat object file with both 32-bit and 64-bit code in it)? If it is a 32-bit object file, then it is no use for a 64-bit program. However, that's a little unlikely; you have to go out of your way to create a 32-bit object file on Mac.
Have you run nm -g N_METHOD.o to see whether _N_method is defined in the object file?
I did nm -g N_METHOD.o and got back:
0000000000000000 T __Z8N_methodP6stacks
Don't compile C code with a C++ compiler. Or don't try to compile C++ code with a C compiler. The mangled name (__Z8N_methodP6stacks) is for C++. Maybe you simply need to link with g++ instead of gcc? They are different languages — this is the property of 'type-safe linkage' that is characteristic of C++ and completely unknown to C.
First step — compile and link with:
g++ main.c -L. -lblib
Assuming that the source is in the C++ subset of C (or C subset of C++), then the chances are that should work. At least, if the code contains N_method(&xyz) where xyz is a variable of type stacks, then there's a chance it will call __Z8N_methodP6stacks.
The following code:
typedef struct stacks stacks;
extern int N_method(stacks*);
extern int relay(stacks *r);
int relay(stacks *r) { return N_method(r); }
compiles with a C++ compiler to produce the nm -g output:
0000000000000000 T __Z5relayP6stacks
U __Z8N_methodP6stacks
It also compiles with a C compiler to produce the nm -g output:
0000000000000038 s EH_frame1
U _N_method
0000000000000000 T _relay
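If the library really has to be built as C++ while main.c stays C, the usual alternative is to give the function C linkage so that its name is not mangled. A minimal sketch, assuming the header is under your control; the layout of struct stacks and the body of N_method below are placeholders, not the asker's code.

```cpp
// n_method.cc: compiled as C++, but callable from C.

// Header part: the guard lets the same declarations be included from C and C++.
#ifdef __cplusplus
extern "C" {
#endif
typedef struct stacks stacks;
int N_method(stacks* s);
#ifdef __cplusplus
}
#endif

// Placeholder definition of the struct, for illustration only.
struct stacks { int depth; };

// extern "C" disables C++ name mangling for this function, so the object
// file exports the C-style name rather than __Z8N_methodP6stacks.
extern "C" int N_method(stacks* s) {
    return s ? s->depth : -1;  // placeholder body
}
```

With this in place, nm -g on the object file should list the unmangled name that the C caller is looking for.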

ld fails to link a static library to a dynamic library even when all files are compiled with fPIC

On a CentOS 7 x64 system, I've built the latest Boost 1.61.0 with -fPIC enabled. I'm trying to link libboost_log.a into the dynamic library I'm building so that users of my library don't need to have Boost installed. That succeeds with the stock GCC 4.8.5 shipped with CentOS 7, but fails with GCC 5.2.1 from devtoolset-4.
Here's the error:
/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.2.1/ld: /opt/boost/lib/libboost_log.a(attribute_name.o): relocation R_X86_64_32 against `_ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43' can not be used when making a shared object; recompile with -fPIC
/opt/boost/lib/libboost_log.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
How I built Boost:
./b2 -j6 -q -d+2 cxxflags=-fPIC cflags=-fPIC variant=release
Boost uses -O3 by default. My program thus used -O3 as well.
Command for building my library:
/opt/rh/devtoolset-4/root/usr/bin/c++ -fPIC -O3 -g -DNDEBUG -shared -Wl,-soname,libfoobar.so.0 -o libfoobar.so.0.5 foobar.cc.o -L/opt/boost/lib /opt/boost/lib/libboost_filesystem.a /opt/boost/lib/libboost_log.a /opt/boost/lib/libboost_program_options.a -lpthread -Wl,-rpath,/opt/boost/lib
Proof that libboost_log.a is built with fPIC:
$ objdump -r /opt/boost/lib/libboost_log.a | grep _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
0000000000000002 R_X86_64_32 _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
0000000000000011 R_X86_64_32S _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
00000000000001a3 R_X86_64_32 _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
00000000000001be R_X86_64_32S _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
000000000000001c R_X86_64_32S _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
0000000000000021 R_X86_64_32 _ZZN5boost3log12v2s_mt_posix3aux14lazy_singletonINS1_14attribute_name10repositoryENS_10shared_ptrIS5_EEE3getEvE29_boost_log_once_block_flag_43
Thoughts:
As you can see, libboost_filesystem.a seems to link fine; only libboost_log.a cannot be linked in. What can I check now? Any hints are welcome. Thanks!
The compiler used, /opt/rh/devtoolset-4/root/usr/bin/c++, is different from /usr/bin/c++. If those two compilers ship different versions of the C++ headers, they may produce libraries with different symbols for methods (name mangling). There may also be other reasons why linking fails when the compiler versions differ.
I would recommend compiling every C++ library with at least the same major version of gcc.

Extract and link a 64bit gnu c++ lib from a 64bit msvc++ lib

I have built SpiderMonkey on Windows. They provide an MSVC++ toolchain, and I couldn't build it for mingw. I've built it for 64bit.
It is a DLL, I need to convert its lib to gnu C++ format (.lib to .a).
After looking on the web, I've found here how to do this, roughly:
gendef mozjs-45.dll
dlltool --as-flags=--64 -m i386:x86-64 -k --output-lib mozjs-45.a --input-def mozjs-45.def
I use TDM-GCC-64 under Code::Blocks. At link time it throws errors like:
undefined reference to `__imp__Z13JS_GetPrivateP8JSObject'
I have checked the lib content using:
nm libmozjs-45.a > libmozjs-45.nm
I see the same entries as those exported in the def file, but they look different from what the linker expects (I presume):
?JS_GetPrivate@@YAPEAXPEAVJSObject@@@Z
Edit 1
I have managed to build SpiderMonkey with mingw-w64. Now, at linking time I get the following error:
undefined reference to `__imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObject'
Looking with nm at the lib, I have:
d000536.o:
0000000000000000 i .idata$4
0000000000000000 i .idata$5
0000000000000000 i .idata$6
0000000000000000 i .idata$7
0000000000000000 t .text
0000000000000000 I __imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObjectON7mozilla6detail19GuardObjectNotifierE
U _head_mozjs_45_dll
0000000000000000 T _ZN17JSAutoCompartmentC1EP9JSContextP8JSObjectON7mozilla6detail19GuardObjectNotifierE
Indeed, the definition of the class is:
class MOZ_RAII JS_PUBLIC_API(JSAutoCompartment)
{
JSContext* cx_;
JSCompartment* oldCompartment_;
public:
JSAutoCompartment(JSContext* cx, JSObject* target
MOZ_GUARD_OBJECT_NOTIFIER_PARAM);
JSAutoCompartment(JSContext* cx, JSScript* target
MOZ_GUARD_OBJECT_NOTIFIER_PARAM);
~JSAutoCompartment();
MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER
};
Why does the same compiler export this as __imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObjectON7mozilla6detail19GuardObjectNotifierE but, when referencing it, expect it as __imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObject?
Answer: I had missed a symbol definition that excludes MOZ_GUARD_OBJECT_NOTIFIER_PARAM from the build.
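This kind of mismatch is what you would expect when a parameter-injecting macro expands differently in the library build and in the client build. The sketch below uses a hypothetical DEMO_GUARD_PARAM macro, DEMO_DEBUG switch, and Guard class (these are not Mozilla's actual definitions) to show how one constructor declaration can yield two different signatures, and hence two different mangled names.

```cpp
// guard_param_demo.cc
// DEMO_DEBUG, DEMO_GUARD_PARAM, Notifier and Guard are hypothetical stand-ins.
struct Notifier {};

#ifdef DEMO_DEBUG
// Debug-style build: the macro appends an extra defaulted parameter.
#define DEMO_GUARD_PARAM , Notifier&& = Notifier()
#else
// Release-style build: the macro expands to nothing.
#define DEMO_GUARD_PARAM
#endif

class Guard {
public:
    // With DEMO_DEBUG:    Guard(int*, Notifier&&)
    // Without DEMO_DEBUG: Guard(int*)
    // The two signatures mangle to different symbol names.
    explicit Guard(int* target DEMO_GUARD_PARAM) : target_(target) {}

    int* target() const { return target_; }

private:
    int* target_;
};
```

If the library is compiled with -DDEMO_DEBUG and the client without it, the client references the one-parameter constructor while the library only exports the two-parameter one, which produces an undefined reference of the same shape as the one above.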

building the ta-lib library fails with undefined references from libm.so

Trying to make the ta-lib library (ta-lib-0.4.0-src.tar.gz) I get the following error:
/home/me/ta-lib/src/.libs/libta_lib.so: undefined reference to `sinh'
/home/me/ta-lib/src/.libs/libta_lib.so: undefined reference to `sincos'
/home/me/ta-lib/src/.libs/libta_lib.so: undefined reference to `ceil'
...
for a large number of maths functions.
The failing command looks like this:
gcc -g -O2 -o .libs/ta_regtest (... .o files) -L/home/me/ta-lib/src \
/home/me/ta-lib/src/.libs/libta_lib.so -lm -lpthread -ldl
The offending library (ta_lib) looks like this:
objdump -TC libta_lib.so | grep " D \*UND\*"
0000000000000000 D *UND* 0000000000000000 sinh
0000000000000000 D *UND* 0000000000000000 sincos
0000000000000000 D *UND* 0000000000000000 ceil
...
For the same maths functions (the grep excludes defined functions and those that have a "w" (presumably weak) flag)
A map lists the libraries included, among them:
LOAD /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libm.so
and a list of the symbols (objdump -TC) defined in libm.so includes:
000000000001a320 w iD .text 0000000000000020 GLIBC_2.2.5 ceil
which was one of the undefined references (they are all there). I cannot determine the meaning of GLIBC_2.2.5.
Why is the loader not finding these functions?
My system looks like this:
$ uname -a
Linux mynode 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

no debugging symbols found when using gdb

GNU gdb Fedora (6.8-37.el5)
Kernel 2.6.18-164.el5
I am trying to debug my application. However, every time I pass the binary to gdb it says:
(no debugging symbols found)
Here is the file output of the binary, and as you can see it is not stripped:
vid: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
I am compiling with the following CFLAGS:
CFLAGS = -Wall -Wextra -ggdb -O0 -Wunreachable-code
Can anyone tell me if I am missing something simple here?
The most frequent cause of "no debugging symbols found" when -g is present is that there is some "stray" -s or -S argument somewhere on the link line.
From man ld:
-s
--strip-all
Omit all symbol information from the output file.
-S
--strip-debug
Omit debugger symbol information (but not all symbols) from the output file.
The application has to be both compiled and linked with the -g option, i.e. you need to put -g in both CFLAGS (or CXXFLAGS for C++) and LDFLAGS.
Some Linux distributions don't use the gdb style debugging symbols. (IIRC they prefer dwarf2.)
In general, gcc and gdb will be in sync as to what kind of debugging symbols they use, and forcing a particular style will just cause problems; unless you know that you need something else, use just -g.
You should also try -ggdb instead of -g if you're compiling for Android!
Replace -ggdb with -g and make sure you aren't stripping the binary with the strip command.
I know this was answered a long time ago, but I've recently spent hours trying to solve a similar problem. The setup is local PC running Debian 8 using Eclipse CDT Neon.2, remote ARM7 board (Olimex) running Debian 7. Tool chain is Linaro 4.9 using gdbserver on the remote board and the Linaro GDB on the local PC. My issue was that the debug session would start and the program would execute, but breakpoints did not work and when manually paused "no source could be found" would result. My compile line options (Linaro gcc) included -ggdb -O0 as many have suggested but still the same problem. Ultimately I tried gdb proper on the remote board and it complained of no symbols. The curious thing was that 'file' reported debug not stripped on the target executable.
I ultimately solved the problem by adding -g to the linker options. I won't claim to fully understand why this helped, but I wanted to pass this on for others just in case it helps. In this case Linux did indeed need -g on the linker options.
Make sure the system you compiled on and the system you are debugging on have the same architecture. I ran into an issue where the debugging symbols of a 32-bit binary refused to load on my 64-bit machine. Switching to a 32-bit system worked for me.
Bazel can strip binaries by default without warning, if that's your build manager. I had to add --strip=never to my bazel build command to get gdb to work, --compilation_mode=dbg may also work.
$ bazel build -s :mithral_wrapped
...
#even with -s option, no '-s' was printed in gcc command
...
$ file bazel-bin/mithral_wrapped.so
../cpp/bazel-bin/mithral_wrapped.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=4528622fb089b579627507876ff14991179a1138, not stripped
$ objdump -h bazel-bin/mithral_wrapped.so | grep debug
$ bazel build -s :mithral_wrapped --strip=never
...
$ file bazel-bin/mithral_wrapped.so
bazel-bin/mithral_wrapped.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=28bd192b145477c2a7d9b058f1e722a29e92a545, not stripped
$ objdump -h bazel-bin/mithral_wrapped.so | grep debug
30 .debug_info 002c8e0e 0000000000000000 0000000000000000 0006b11e 2**0
31 .debug_abbrev 000030f6 0000000000000000 0000000000000000 00333f2c 2**0
32 .debug_loc 0013cfc3 0000000000000000 0000000000000000 00337022 2**0
33 .debug_aranges 00002950 0000000000000000 0000000000000000 00473fe5 2**0
34 .debug_ranges 00011c80 0000000000000000 0000000000000000 00476935 2**0
35 .debug_line 0001e523 0000000000000000 0000000000000000 004885b5 2**0
36 .debug_str 0033dd10 0000000000000000 0000000000000000 004a6ad8 2**0
For those that came here with this question and who are using Qt: in the release config there is a step where the binary is stripped as part of doing the make install. You can pass the configuration option CONFIG+=nostrip to tell it not to:
Instead of:
qmake <your options here, e.g. CONFIG=whatever>
you add CONFIG+=nostrip, so:
qmake <your options here, e.g. CONFIG=whatever> CONFIG+=nostrip
The solutions I've seen so far are good:
must compile with the -g debugging flag to tell the compiler to generate debugging symbols
make sure there is no stray -s in the compiler flags, which strips the output of all symbols.
Just adding on here, since the solution that worked for me wasn't listed anywhere. The order of the compiler flags matters. I was including multiple header files from many locations (-I/usr/local/include -Iutil -I.), and I was compiling with all warnings on (-Wall).
The correct recipe for me was:
gcc -I/usr/local/include -Iutil -I. -Wall -g -c main.c -o main.o
Notice:
include flags are at the beginning
-Wall is after include flags and before -g
-g is at the end
Any other ordering of the flags would cause no debug symbols to be generated.
I'm using gcc version 11.3.0 on Ubuntu 22.04 on WSL2.