What methods does my c++ executable actually use - c++

I have an executable compiled with g++ that links in about 50 static libraries (on top of a bunch of system libraries). I'd like to know which methods in those libraries are being used, or even more important which methods will never be called.
Is there a tool and/or compiler flag that will provide this?

You can use nm tool in Linux\UNIX (at least when compiled with -g)
Since you are using static libs ONLY THE REFERENCED METHODS from the libraries will be added to your executable
usage like:
nm <your executable with debug info>
you can also try to read man page;
man nm

Not sure what you exactly mean but if you want to get the functions that are not referenced, there are some compiler options.
-ffunction-sections
would tell the compiler to place each function into its own section in the obj file.
then at link time --gc-sections and --print-gc-sections would do garbage collection of unused sections(functions) and also list the result.
You may want to build all your static libraries to have a complete list.

Related

gcc linker get list of unused objects

I want to identify unused object files in a large C application with many libraries. The project has grown a lot over time and now I want to search for libraries that are not used anymore, so that I can remove them from the dependency file. Is it possible with the gcc linker to identify any object that is not used?
For example, if I compile an application with gcc and let's say none of the symbols/functions of library2 are used. Is there any way to get the info about which objects are not linked in?
gcc library1.o library2.o main.o -o main.elf
I know that gcc has the compiler and linker flags to remove unused symbols:
-fdata-sections -ffunction-sections -Wl,--gc-sections
However this way I don't know which of the objects were removed by gcc. It would be perfect if gcc has an option to get a list of objects which were not linked into the application.
Just to mention: I need it on object file basis not on function/symbol basis!
Does anyone know such an option for gcc?
For example, if I compile an application with gcc and let's say none of the symbols/functions of library2 are used. Is there any way to get the info about which objects are not linked in?
gcc library1.o library2.o main.o -o main.elf
With above command, library2.o will be linked in even if none of the code from it is ever used. To understand why, read this or this.
It is true that if you compile code in library2.o with -ffunction-sections -fdata-sections and link with -Wl,-gc-sections, then all of the code and data from library2.o will be GC'd out, but that is not the command you gave.
Presumably, you are more interested in what happens if you use libraries as libraries:
gcc main.o -o main.elf -lrary1 -lrary2
In that case, if none of the code from library2 is referenced, the linker will not pull it into the link.
There is no way to ask the linker for list of things it didn't use, but (if you are using GNU-ld) there is a way to ask it for a list of objects it did use: the -M or -Map option. Once you know what objects are used, it's a simple matter of subtracting used objects from all objects used while linking to get the list that is not used.
Update:
Gold linker supports --print-symbol-counts FILENAME option, which can also be helpful here. It prints defined and used symbol counts. For library2.a, it should print $num_defined 0, the 0 indicating that none of the objects from library2.a were actually used.
Take a look at callcatcher
This compiles your program into assembly and extracts obvious references from the assembly output. I guess that is exactly what you are searching for. (Note due to the fact it analyzes assembler output it will only work on x86 platforms)
Note callcatcher ignores virtual functions (for some good reasons), so it will not directly allow you to analyse those.

How dynamic linking works, its usage and how and why you would make a dylib

I have read several posts on stack overflow and read about dynamic linking online. And this is what I have taken away from all those readings -
Dynamic linking is an optimization technique that was employed to take full advantage of the virtual memory of the system. One process can share its pages with other processes. For example the libc++ needs to be linked with all C++ programs but instead of copying over the executable to every process, it can be linked dynamically with many processes via shared virtual pages.
However this leads me to the following questions
When a C++ program is compiled. It needs to have references to the C++ library functions and code (say for example the code of the thread library). How does the compiler make the executable have these references? Does this not result in a circular dependency between the compiler and the operating system? Since the compiler has to make a reference to the dynamic library in the executable.
How and when would you use a dynamic library? How do you make one? What is the specific compiling command that is used to produce such a file from a standard *.cpp file?
Usually when I install a library, there is a lib/ directory with *.a files and *.dylib (on mac-OSX) files. How do I know which ones to link to statically as I would with a regular *.o file and which ones are supposed to be dynamically linked with? I am assuming the *.dylib files are dynamic libraries. Which compiler flag would one use to link to these?
What are the -L and -l flags for? What does it mean to specify for example a -lusb flag on the command line?
If you feel like this question is asking too many things at once, please let me know. I would be completely ok with splitting this question up into multiple ones. I just ask them together because I feel like the answer to one question leads to another.
When a C++ program is compiled. It needs to have references to the C++
library functions and code (say for example the code for the library).
Assume we have a hypothetical shared library called libdyno.so. You'll eventually be able to peek inside it using using objdump or nm.
objdump --syms libdyno.so
You can do this today on your system with any shared library. objdump on a MAC is called gobjdump and comes with brew in the binutils package. Try this on a mac...
gobjdump --syms /usr/lib/libz.dylib
You can now see that the symbols are contained in the shared object. When you link with the shared object you typically use something like
g++ -Wall -g -pedantic -ldyno DynoLib_main.cpp -o dyno_main
Note the -ldyno in that command. This is telling the compiler (really the linker ld) to look for a shared object file called libdyno.so wherever it normally looks for them. Once it finds that object it can then find the symbols it needs. There's no circular dependency because you the developer asked for the dynamic library to be loaded by specifying the -l flag.
How and when would you use a dynamic library? How do you make one? As in what
is the specific compiling command that is used to produce such a file from a
standard .cpp file
Create a file called DynoLib.cpp
#include "DynoLib.h"
DynamicLib::DynamicLib() {}
int DynamicLib::square(int a) {
return a * a;
}
Create a file called DynoLib.h
#ifndef DYNOLIB_H
#define DYNOLIB_H
class DynamicLib {
public:
DynamicLib();
int square(int a);
};
#endif
Compile them to be a shared library as follows. This is linux specific...
g++ -Wall -g -pedantic -shared -std=c++11 DynoLib.cpp -o libdyno.so
You can now inspect this object using the command I gave earlier ie
objdump --syms libdyno.so
Now create a file called DynoLib_main.cpp that will be linked with libdyno.so and use the function we just defined in it.
#include "DynoLib.h"
#include <iostream>
using namespace std;
int main(void) {
DynamicLib *lib = new DynamicLib();
std::cout << "Square " << lib->square(1729) << std::endl;
return 1;
}
Compile it as follows
g++ -Wall -g -pedantic -L. -ldyno DynoLib_main.cpp -o dyno_main
./dyno_main
Square 2989441
You can also have a look at the main binary using nm. In the following I'm seeing if there is anything with the string square in it ie is the symbol I need from libdyno.so in any way referenced in my binary.
nm dyno_runner |grep square
U _ZN10DynamicLib6squareEi
The answer is yes. The uppercase U means undefined but this is the symbol name for our square method in the DynamicLib Class that we created earlier. The odd looking name is due to name mangling which is it's own topic.
How do I know which ones to link to statically as I would with a regular
.o file and which ones are supposed to be dynamically linked with?
You don't need to know. You specify what you want to link with and let the compiler (and linker etc) do the work. Note the -l flag names the library and the -L tells it where to look. There's a decent write up on how the compiler finds thing here
gcc Linkage option -L: Alternative ways how to specify the path to the dynamic library
Or have a look at man ld.
What are the -L and -l flags for? What does it mean to specify
for example a -lusb flag on the command line?
See the above link. This is from man ld..
-L searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. -L options do not affect how ld searches for a linker
script unless -T option is specified.`
If you managed to get here it pays dividends to learn about the linker ie ld. It plays an important job and is the source of a ton of confusion because most people start out dealing with a compiler and think that compiler == linker and this is not true.
The main difference is that you include static linked libraries with your app. They are linked when you build your app. Dynamic libraries are linked at run time, so you do not need to include them with your app. These days dynamic libraries are used to reduce the size of apps by having many dynamic libraries on everyone's computer.
Dynamic libraries also allow users to update libraries without re-building the client apps. If a bug is found in a library that you use in your app and it is statically linked, you will have to rebuild your app and re-issue it to all your users. If a bug is found in a dynamically linked library, all your users just need to update their libraries and your app does not need an update.

When look up symbol, the program doesn't search from the correct library

I'm adding two classes and libraries to a system, parent.so and child.so deriving from it.
The problem is when the program is loading child.so it cannot find parent's virtual function's definition from parent.so.
What happens,
nm -D child.so will gives something like (I just changed the names)
U _ZN12PARENT15virtualFunctionEv
The program will crash running
_handle = dlopen(filename, RTLD_NOW|RTLD_GLOBAL); //filename is child.so
it'll give an error with LD_DEBUG = libs
symbol lookup error: undefined symbol: _ZN12PARENT15virtualFunctionEv (fatal)
The thing I cannot explain is, I tried LD_DEBUG = symbols using GDB, when running dlopen, the log shows it tried to look up basically in all libaries in the system except parent.so, where the symbol is defined. But from libs log parent.so is already loaded and code is run, and it is at the same path of all other libraries.
......
27510: symbol=_ZN12PARENT15virtualFunctionEv; lookup in file=/lib/tls/libm.so.6
27510: symbol=_ZN12PARENT15virtualFunctionEv; lookup in file=/lib/tls/libc.so.6
27510: symbol=_ZN12PARENT15virtualFunctionEv; lookup in file=/lib/ld-linux.so.2
27510: child.so: error: symbol lookup error: undefined symbol: _ZN12PARENT15virtualFunctionEv(fatal)
How the program or system is managing which library to look for a symbol's definition?
I'm new to Linux, can anybody point me some directions to work on?
Thanks.
EDIT
The command used to generate parent.so file is
c++ -shared -o parent.so parent.o
Similar for child.so. Is any information missing for linking here? Looks like child is only including parent's header file.
EDIT2
After another test, calling
_handle = dlopen("parent.so", RTLD_NOW|RTLD_GLOBAL);
before the crashing line will solve the problem, which I think means originally parent.so was not loaded. But I'm still not very clear about the cause.
You need to tell the linker that your library libchild.so uses functionality in libparent.so. You do this when you are creating the child library:
g++ -shared -o libchild.so child_file1.o child_file2.o -Lparent_directory -lparent
Note that order is important. Specify the -lparent after all of your object files. You might also need to pass additional options to the linker via the -Wl option to g++.
That still might not be good enough. You might need to add the library that contains libparent.so to the LD_LIBRARY_PATH environment variable.
A couple of gotchas: If you aren't naming those libraries with a lib prefix you will confuse the linker big time. If you aren't compiling your source files with either -fPIC or -fpic you will not have relocatable objects.
Addendum
There's a big potential problem with libraries that depend on other libraries. Suppose you use version 1.5 of the parent package when your compile your child library source files. You manage to get past all of the library dependencies problems. You've specified that your libchild.so depends on libparent.so. Your stuff just works. That is until version 2.0 of the parent package comes out. Now your stuff breaks everywhere it's used, and you haven't changed one line of code.
The way to overcome this problem is to specify at the time you build your child library that the resultant shared library depends specifically on version 1.5 of libparent.so`.
To do this you will need to pass options from g++/gcc to the linker via the -Wl option. Use -Wl,<linker_option>,<linker_option>,... If those linker options need spaces you'll need to backslash-escape them in the command to g++. A couple of key options are -rpath and -soname. For example, -rpath=/path/to/lib,-soname=libparent.so.1.5.
Note very well: You need to use the -soname=libparent.so.1.5 option when you are building libparent.so. This is what lets the system denote that your libchild.so (version 1.0) depends on libparent.so (version 1.5). And you don't build libparent.so. You build libparent.so.1.5. What about libparent.so? That needs to exist to, but it should be a symbolic link to some numbered numbered version (preferably the most recent version) of libparent.so.
Now suppose non-backward compatible parent version 2.0 is compiled and built into a shiny new libparent.so.2.0 and libparent.so is symbolically linked to this shiny new version. An application that uses your clunky old libchild.so (version 1.0) will happily use the clunky old version of libparent.so instead of the shiny new one that breaks everything.
It looks like you're not telling the linker that child.so needs parent.so, use something like the following:
g++ -shared -o libparent.so parent.o
g++ -shared -o libchild.so -lparent child.o
When you build your main program, you have to tell the compiler that it links with those libraries; that way, when it starts, linux will load them for it.
Change their names to libparent.so and libchild.so.
Then compile with something like this:
g++ <your files and flags> -L<folder where the .so's are> -lparent -lchild
EDIT:
Maybe it would be a smaller change to try loading parent.so before child.so. Did you try that already?

Compile a Standalone Static Executable

I'm trying to compile an executable (ELF file) that does not use a dynamic loader. I built a cross compiler that compiles mips from linux to be used on a simulator I made. I asserted the flag -static-libgcc on compilation of my hello.cpp file (hello world program). Apparently this is not enough though. Because there is still a segment in my executable which contains the name/path of the dynamic loader. What flags do I use to generate an executable which contains EVERYTHING needed to be run? Do I need to rebuild my cross compiler?
Use the following flags for linking
-static -static-libgcc -static-libstdc++
Use these three flags to link against the static versions of all dependencies (assuming gcc). Note, that in certain situation you don't necessarily need all three flags, but they don't "hurt" either. Therefore just turn on all three.
Check if it actually worked
Make sure that there is really no dynamic linkage
ldd yourexecutable
should return "not a dynamic executable" or something equivalent.
Make sure that there are no unresolved symbols left
nm yourexecutable | grep " U "
The list should be empty or should contain only some special kernel-space symbols like
U __tls_get_addr
Finally, check if you can actually execute your executable
Try using the -static flag?

Shared libraries and .h files

I have some doubt about how do programs use shared library.
When I build a shared library ( with -shared -fPIC switches) I make some functions available from an external program.
Usually I do a dlopen() to load the library and then dlsym() to link the said functions to some function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just including the .h of the shared library?
I guess this may be how c++ programs uses code stored in system shared library. ie just including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between headers files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header the compiler does not insert the code for the function you call into program.o. It merely inserts a reference to them. The reason is to avoid duplication of code: The code is only going to need to be accessed once by every part of your program, so if you needed further files (module1.c, module2.c and so on), even if they all used malloc you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either it's shared or static form (libc.so or libc.a) but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
You can link shared libraries like static one. They are then searched for when launching the program. As a matter of fact, by default -lXXX will prefer libXXX.so to libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are object files where the actual source code of function/class/... are stored (in binary)
Header files (.h) are files indicating (the reference) where the compiler can find function/class/... (in .so) that are required by the main code
Therefore, you need both of them.