I want to identify unused object files in a large C application with many libraries. The project has grown a lot over time and now I want to search for libraries that are not used anymore, so that I can remove them from the dependency file. Is it possible with the gcc linker to identify any object that is not used?
For example, if I compile an application with gcc and let's say none of the symbols/functions of library2 are used. Is there any way to get the info about which objects are not linked in?
gcc library1.o library2.o main.o -o main.elf
I know that gcc has the compiler and linker flags to remove unused symbols:
-fdata-sections -ffunction-sections -Wl,--gc-sections
However this way I don't know which of the objects were removed by gcc. It would be perfect if gcc has an option to get a list of objects which were not linked into the application.
Just to mention: I need it on object file basis not on function/symbol basis!
Does anyone know such an option for gcc?
For example, if I compile an application with gcc and let's say none of the symbols/functions of library2 are used. Is there any way to get the info about which objects are not linked in?
gcc library1.o library2.o main.o -o main.elf
With the above command, library2.o will be linked in even if none of its code is ever used: the linker always includes every object file named explicitly on the command line, and only pulls archive members in on demand.
It is true that if you compile code in library2.o with -ffunction-sections -fdata-sections and link with -Wl,-gc-sections, then all of the code and data from library2.o will be GC'd out, but that is not the command you gave.
Presumably, you are more interested in what happens if you use libraries as libraries:
gcc main.o -o main.elf -lrary1 -lrary2
In that case, if none of the code from library2 is referenced, the linker will not pull it into the link.
There is no way to ask the linker for a list of things it didn't use, but (if you are using GNU ld) there is a way to ask it for a list of objects it did use: the -M or -Map option (from gcc, -Wl,-M or -Wl,-Map=FILE). Once you know which objects are used, it's a simple matter of subtracting them from all the objects passed to the linker to get the list that is not used.
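The subtraction can be sketched at archive level as follows. This is only a sketch, assuming a GNU ld map file in which every input archive appears on a LOAD line and every member that was pulled in appears as path(member.o); the function name unused_archives and the file name main.map are made up for illustration.

```shell
#!/bin/sh
# Print every archive that the linker loaded but took no member from.
# Assumes a GNU ld map file, e.g. written with -Wl,-Map=main.map.
unused_archives() {
    map=$1
    # Every archive the linker opened appears on a "LOAD <path>" line.
    sed -n 's/^LOAD \(.*\.a\)$/\1/p' "$map" | sort -u |
    while IFS= read -r archive; do
        # A used archive shows up somewhere as "<archive>(member.o)".
        grep -qF -- "$archive(" "$map" || printf '%s\n' "$archive"
    done
}
```

Typical use would be something like `gcc main.o -o main.elf -L. -lrary1 -lrary2 -Wl,-Map=main.map` followed by `unused_archives main.map`. System archives that happen to be unused may show up in the output as well.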
Update:
The gold linker supports the --print-symbol-counts FILENAME option, which can also be helpful here. It prints defined and used symbol counts. For library2.a, it should print $num_defined 0, the 0 indicating that none of the symbols defined in library2.a were actually used.
Take a look at callcatcher
This compiles your program into assembly and extracts obvious references from the assembly output. I guess that is exactly what you are looking for. (Note: because it analyzes assembler output, it only works on x86 platforms.)
Note callcatcher ignores virtual functions (for some good reasons), so it will not directly allow you to analyse those.
I'm using C++ dlopen() to link a shared library named as lib*.so (in directory A) in my main program (in directory B).
I experimented with loading some simple functions, and everything works very well. However, it gave me a headache when I tried to load a class and factory functions that return a pointer to the class object. (I'm using the terms from the tutorial below.)
The methodology I used was based on the examples in chapter 3.3 of this tutorial https://www.tldp.org/HOWTO/C++-dlopen/thesolution.html#externC.
There is a bit of polymorphism here: lib*.so contains a child class that inherits from an abstract parent class in the main program directory (directory B). When dlopen() tries to load lib*.so in the main program, it fails with "undefined symbol".
I used nm command to examine the symbol tables in lib*.so and main program binary. The symbols in these binaries are:
lib*.so : U _ZTI7ParentBox
main program binary: V _ZTI7ParentBox
ParentBox is the name of the parent class inherited by ChildBox in lib*.so. Note that parent class header file is in another project in directory B.
Although there is name mangling the symbol names are exactly the same.
I'm just wondering why the dynamic linker cannot link them and gives me an undefined symbol error for dlopen()?
Am I missing some key concepts here?
P.S. More strangely, it was able to resolve the member function symbols between the child class in lib*.so (U type symbols) and the parent class (T type symbols). Why is it able to do this but not able to resolve the undefined symbol for the parent class name?
(I've been searching around for a long time and tried -rdynamic and -ldl, though I don't fully understand what they do, but nothing worked.)
Update 04 April 2019:
This is the g++ command line I used to make the main program binary.
g++ -fvisibility=hidden -pthread -static-libgcc -static-libstdc++ \
-m64 -fpic -ggdb3 -fno-var-tracking-assignments -std=c++14 \
-rdynamic \
-o ./build/main-prog \
/some_absolute_path/ParentBox.o \
/some_other_pathen/Triangle.o \
/some_other_pathen/Circle.o \
/some_other_pathen/<lots_of_depending_obj> \
/some_absolute_path/librandom.a \
-lz -ldl -lrt -lbz2
I searched every argument of this command line in https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html (This seems to be a good reference site for all fellow programmers working with large projects with complicated g++ line :) )
Thanks to @Employed Russian. With his instructions, the problem narrows down to exporting the symbols from the main program binary.
However, the main program binary has lots of dependencies as you can see from the above command, Circle, Triangle and lots of other object files.
We also need to add "-rdynamic" to the compilation of Circle, Triangle and other dependency object files. Otherwise it does not work.
In my case, I added "-rdynamic" to all files in my project to export all symbols. Not sure about "-fvisibility=hidden" doing anything good. I removed all of them in my Makefile anyway... I know this is not the best way but I will worry about speed later when everything is functionally correct. :)
More Updates:
The correct solution is in @Employed Russian's update in the answer.
My previous solution happened to work because I also removed "-fvisibility=hidden". It is not necessary (and probably wrong) to add -rdynamic to all objects used in the final link.
Please refer to @Employed Russian's explanation which addresses the core issue.
Final Update:
For fellow programmers who are interested in how a C/C++ program is executed and how libraries are linked, here is a good reference web course (The Life of Binaries) by Xeno Kovah: http://opensecuritytraining.info/LifeOfBinaries.html
You can also find a playlist on YouTube. Just search for "Life of Binaries".
Although there is name mangling the symbol names are exactly the same. I'm just wondering why the dynamic linker cannot link them?
Most likely explanation: the symbol is not exported from the main binary.
Repeat your command with nm -D:
nm -AD lib*.so main-prog | grep ' _ZTI7ParentBox$'
Chances are, you'll see lib*.so: U _ZTI7ParentBox and nothing from main-prog.
This happens because the linker normally does not export a symbol from main-prog unless it is referenced by some shared library participating in the link (and your lib*.so isn't linked with main-prog, or else you wouldn't need to dlopen it).
To change that behavior, you could add -Wl,--export-dynamic linker flag when linking main-prog. That instructs the linker to export everything that is linked into main-prog.
tried -rdynamic
That is equivalent to -Wl,--export-dynamic, and should have worked (assuming you added it to the main-prog link line, and not somewhere else).
Update:
Everything works now! Since main-prog also depends on some other objects, it appears that simply add -rdynamic to the final main-prog linking does not resolve the problem. We need to add "-rdynamic" to the compilation of those depending objects.
That is the wrong solution. Your problem is that -fvisibility=hidden tells the compiler to mark all symbols that go into main-prog as not exported, and -rdynamic doesn't export any hidden symbols.
The correct solution is to remove -fvisibility=hidden from any objects that define symbols you do want to export, and add -rdynamic to the final link.
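The interaction between -fvisibility=hidden and -rdynamic can be checked with a small experiment. The sketch below is hypothetical: the function name demo_visibility and the one-file program are made up, and it assumes g++ and GNU nm are installed. It builds the same source twice and reports whether the ParentBox typeinfo symbol ends up in the dynamic symbol table.

```shell
#!/bin/sh
# Hypothetical demo: compare a hidden-visibility build with a default one.
demo_visibility() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/parent.cpp" <<'EOF'
struct ParentBox {
    virtual ~ParentBox();
    virtual int area() const = 0;
};
ParentBox::~ParentBox() {}
int main() { return 0; }
EOF
    # Case 1: hidden visibility; -rdynamic does not export hidden symbols.
    g++ -fvisibility=hidden -rdynamic "$dir/parent.cpp" -o "$dir/hidden" || return 1
    # Case 2: default visibility plus -rdynamic on the final link.
    g++ -rdynamic "$dir/parent.cpp" -o "$dir/exported" || return 1
    for bin in hidden exported; do
        if nm -D --defined-only "$dir/$bin" | grep -q ' _ZTI7ParentBox$'; then
            echo "$bin: _ZTI7ParentBox exported"
        else
            echo "$bin: _ZTI7ParentBox not exported"
        fi
    done
    rm -rf "$dir"
}
```

Calling demo_visibility should report the symbol as exported only for the second build, which is the situation a dlopen()ed plugin needs.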
I want to hide as much information as I can from ldd, so I'm learning how to statically link in libraries instead of dynamically linking them. I've read from another stackoverflow post that the correct syntax is
g++ -ldynamiclib -o exe files.cpp staticlib.a
Thus, my current compilation code looks like this:
STATIC_LIB=""
STATIC_LIB="$STATIC_LIB ${PATH}/libcrypto.a"
STATIC_LIB="$STATIC_LIB ${PATH}/libdl-2.5.so" # I couldn't find the .a version for this, so I tried doing it this way, and have also tried doing just -ldl
STATIC_LIB="$STATIC_LIB ${PATH}/libstdc++.a"
STATIC_LIB="$STATIC_LIB ${PATH}/libgcc.a"
STATIC_LIB="$STATIC_LIB ${PATH}/libc.a"
g++ -g -I${INCLUDE_PATH} -o executable file1.cpp file2.cpp $STATIC_LIB
I've confirmed with ldd that this way works for libcrypto, as it is an external library that I brought in. However, this does not work at all for everything else, and I can still see them being listed when I use ldd. Does anyone know the correct way of doing this?
P.S. I've also tried several other alternatives such as including -static, or using -Wl,-Bstatic, and I couldn't get either of those to work. Not sure if it's my syntax or if it's just not possible.
Those libraries libstdc++, libgcc and libc are special in that they're very fundamental to the running of any program compiled with gcc. Special gcc options exist if you want to link them statically, namely -static-libstdc++ and -static-libgcc.
Note that you should really know what you're doing if you choose these options. It can create portability problems for your program, many of which express themselves in unintuitive ways.
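A minimal sketch of those two options in use, assuming g++, a static libstdc++ and ldd are available; the function name demo_static_runtime and the hello program are made up for illustration. libc itself stays dynamically linked here, so it will still be listed by ldd.

```shell
#!/bin/sh
# Hypothetical demo: link the C++ runtime statically and list what is left.
demo_static_runtime() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/hello.cpp" <<'EOF'
#include <iostream>
int main() { std::cout << "ok\n"; }
EOF
    # libstdc++ and libgcc are linked statically; everything else is unchanged.
    g++ -static-libstdc++ -static-libgcc "$dir/hello.cpp" -o "$dir/hello" || return 1
    # Show the shared libraries the executable still depends on.
    ldd "$dir/hello"
    rm -rf "$dir"
}
```

After demo_static_runtime, the ldd listing should no longer mention libstdc++ or libgcc_s.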
I have an executable which links to a big .a archive that contains lots of functions. The executable only uses a small fraction of the functions in this archive, but for some reason it pulls everything from it and ends up being very big.
My suspicion is that some of the functionality that the executable is using somehow references something it shouldn't and that causes everything else to be pulled.
Is it possible to make gcc tell me what reference causes a specific symbol to be added in the executable? Why else can this happen?
I've tried using --gc-sections with no effect.
I've tried using --version-script to make all the symbols in the executable local with no effect
I'm not interested in -ffunction-sections and -fdata-sections, since it is whole object files I want to discard, not functions.
Other answers mention -why_live, but that seems to be implemented only for Darwin, and I am on Linux x86_64.
Use -Wl,-M to pass -M to the linker, causing it to print a link trace. This will show you the reasons (or at least the first-found reason) for every object file that gets linked from an archive.
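The part of the trace that answers the question is the section headed "Archive member included to satisfy reference by file (symbol)". As a rough sketch, assuming a GNU ld map saved to a file (for example with -Wl,-Map=main.map), that section can be condensed to one line per member; the function name why_linked is made up for illustration.

```shell
#!/bin/sh
# Print "<archive>(member.o) <- <referencing file> (<symbol>)" for each
# archive member listed in a GNU ld map file.
why_linked() {
    awk '
        /^Archive member included/ { in_sec = 1; next }
        # Any of these headings ends the section we care about.
        /^(As-needed library|Allocating common symbols|Discarded input sections|Memory [Cc]onfiguration|Linker script and memory map)/ { in_sec = 0 }
        # Member line; the reason may follow on the same line...
        in_sec && /^[^ \t]/ {
            member = $1
            if (NF > 1) { $1 = ""; sub(/^ +/, ""); print member " <- " $0 }
            next
        }
        # ...or on an indented continuation line when the name is long.
        in_sec && NF { sub(/^[ \t]+/, ""); print member " <- " $0 }
    ' "$1"
}
```

Following the chain of lines upward from an unexpected member shows which reference first dragged it in.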
I have an executable compiled with g++ that links in about 50 static libraries (on top of a bunch of system libraries). I'd like to know which methods in those libraries are being used, or even more important which methods will never be called.
Is there a tool and/or compiler flag that will provide this?
You can use the nm tool on Linux/UNIX (at least when compiled with -g).
Since you are using static libs, only the archive members (object files) that satisfy a reference are added to your executable, so the symbols nm lists are the ones that were actually linked in.
usage like:
nm <your executable with debug info>
You can also read the man page:
man nm
Not sure what you exactly mean but if you want to get the functions that are not referenced, there are some compiler options.
-ffunction-sections
would tell the compiler to place each function into its own section in the obj file.
Then, at link time, -Wl,--gc-sections garbage-collects the unused sections (functions) and -Wl,--print-gc-sections lists what was removed.
You may want to rebuild all your static libraries with -ffunction-sections to get a complete list.
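A small end-to-end sketch of these flags, assuming gcc with a linker that supports --print-gc-sections; the function name demo_gc_sections and the three-function C file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo: let the linker report the sections it throws away.
demo_gc_sections() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/demo.c" <<'EOF'
int used_fn(void)   { return 1; }
int unused_fn(void) { return 2; }
int main(void)      { return used_fn() - 1; }
EOF
    # One section per function and per data object.
    gcc -ffunction-sections -fdata-sections -c "$dir/demo.c" -o "$dir/demo.o" || return 1
    # The linker reports each discarded section on stderr.
    gcc "$dir/demo.o" -o "$dir/demo" -Wl,--gc-sections -Wl,--print-gc-sections 2>&1
    rm -rf "$dir"
}
```

The output of demo_gc_sections should mention the section .text.unused_fn, possibly alongside sections discarded from the startup files.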
I'm attempting to design a shared library of shared libraries using g++ with hopes of simplifying my compile scripts and easing my update process in the future, but I'm still novice at best with GNU tools and writing libraries, at that. Can anyone provide advice on whether the following idea is possible with g++?
For convenience, consider the following file system layout:
main.cpp
libraryX/
libraryX/libX.so
libraryX/libraryY/
libraryX/libraryY/libY.so
libraryX/libraryZ/
libraryX/libraryZ/libZ.so
My goal is to be able to link indirectly using cascading relative paths. For instance, main.cpp links to libraryX/libX.so, which links to libraryY/libY.so and libraryZ/libZ.so. Is it possible to only link main.cpp to libX.so and use functions defined in libY.so and libZ.so?
If so, could you provide an example of the flags one would need to do so? I've been trying variations of the following command using various sources from Google to no avail:
g++ -shared -fPIC -Wl-rpath=libraryX -LlibraryX -lX.so main.o -o executable
Any guidance or references are greatly appreciated.
Don't do this (even if you can figure out how).
When you link against -lX, the static linker must know all other shared libraries that are "part of this link". Since -lY is not on the link line, the static linker will either give you an error, or it must somehow figure out where libY.so is coming from. For the latter, it has to replicate the RPATH search that the runtime loader will perform. This replication is error prone (the static linker may not use the exact same algorithm) and best avoided.
Finally, your command line is totally wrong: -shared means you ask the linker for a shared library, but you are clearly trying to link an executable. You generally should not use -fPIC when linking an executable. Also, -Wl-rpath=... should be -Wl,-rpath=... (the comma is important), and -lX.so should be -lX (the linker adds the lib prefix and the .so suffix itself).
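For comparison, here is a sketch of a layout like the one in the question with every library named explicitly on the link line. It is an illustration under assumptions, not the only way to do it: the function name demo_nested_libs and the one-line C sources are made up, libZ is left out because it would be handled exactly like libY, and each binary gets its own rpath relative to $ORIGIN (which the dynamic loader expands to the directory containing that binary).

```shell
#!/bin/sh
# Hypothetical demo: main -> libX.so -> libY.so, all named at link time.
demo_nested_libs() {
    dir=$(mktemp -d) || return 1
    mkdir -p "$dir/libraryX/libraryY"
    echo 'int y_value(void) { return 42; }' > "$dir/y.c"
    echo 'int y_value(void); int x_value(void) { return y_value(); }' > "$dir/x.c"
    echo 'int x_value(void); int main(void) { return x_value() == 42 ? 0 : 1; }' > "$dir/main.c"
    # -shared and -fPIC belong to the libraries, not to the executable.
    gcc -shared -fPIC "$dir/y.c" -o "$dir/libraryX/libraryY/libY.so" || return 1
    gcc -shared -fPIC "$dir/x.c" -o "$dir/libraryX/libX.so" \
        -L"$dir/libraryX/libraryY" -lY -Wl,-rpath,'$ORIGIN/libraryY' || return 1
    # The executable lists both libraries and both run-time search paths.
    gcc "$dir/main.c" -o "$dir/executable" \
        -L"$dir/libraryX" -L"$dir/libraryX/libraryY" -lX -lY \
        -Wl,-rpath,'$ORIGIN/libraryX:$ORIGIN/libraryX/libraryY' || return 1
    "$dir/executable"
    status=$?
    rm -rf "$dir"
    return $status
}
```

demo_nested_libs returns 0 when the executable finds both libraries at run time and the call chain returns the expected value.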