Say I have two .cpp files and in one of them I wrote
extern int i;
and in another one I define the i variable.
Now, how does the linker know that the i in the first file should be bound to the address of i in the second file? I ask because, as far as I know, the object file has no information about variable names (it knows only addresses) (see this link).
I am really confused by this.
Some light reading: Beginner's Guide to Linkers.
The object code has symbol definitions in it. The linker uses these to resolve references to symbols. The symbols are not part of the executable code, and cannot be read by code that is contained within the object file (hence the answer to the question you linked to).
The linked executable may also have symbols in it (e.g. for use by a debugger), or may have symbols removed at link stage (or later) since they are of no use to the code contained within the executable.
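As a rough illustration of the declaration/definition split, here is a minimal single-file sketch. The split into first.cpp and second.cpp, the function name read_i and the value 42 are assumptions made for the example; in a real project the two halves would be separate translation units and the linker, not the compiler, would connect them.

```cpp
// --- what would live in first.cpp (hypothetical file name) ---
extern int i;   // declaration only: no storage is allocated here

int read_i() {
    // In a separate translation unit this compiles to a reference to an
    // undefined symbol named "i", which the linker later fills in.
    return i;
}

// --- what would live in second.cpp (hypothetical file name) ---
int i = 42;     // definition: this object file provides the symbol "i"
```

On a typical GNU toolchain, running nm on the two object files should list i as undefined (U) in the first and as defined in the second, which is the information the linker matches up.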
I am working on a large existing C++ code base (more than 1 million lines of code). I need to remove some parts of the code deemed not useful. However, when I simply exclude those parts from the build (i.e. do not compile them), I eventually get "undefined reference" errors at link time for some of the symbols (class member function names) I removed.
The problem arose when I tried to find out where the rest of the code references them. Using Cscope or OpenGrok, I can find a few explicit references, but removing those does not really help. There are many other cases that refer indirectly to the symbols I removed, for example:
virtual functions overridden in a child class
a typedef that defines another name referring to the missing symbol.
My question is: is there any gcc/g++ option I can turn on to get an output of all references (that gcc/g++ is aware of), direct or indirect, to the symbol I removed?
If there is no such gcc/g++ option, is there any other tool that can produce such output?
Thanks.
Removing the compilation units (c or cpp files) from your project does not completely remove them. Those are typically just the definitions of functions and classes. The declarations of those functions and classes still exist in headers which are likely still being included in other compilation units.
Track down where these things are declared (typically in header files) and either comment them out in the headers or stop including the headers entirely if you don't need anything within them for your project.
For example:
If you are removing foo.c from a project, make sure any instance of #include "foo.h" has been removed from all other c/cpp files
You can instruct ld to emit a linker map containing a cross reference table using the flags -Map=path/to/my_mapfile.map and --cref (when linking through gcc/g++, pass them as -Wl,-Map=path/to/my_mapfile.map,--cref). More info here:
https://sourceware.org/binutils/docs/ld/Options.html
The map file is very long and terse, but it usually has enough information to help you pinpoint exactly why a given symbol is still being referenced.
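One reason a text search can miss references, as the question notes for overridden virtual functions, is that the reference is generated by the compiler and never appears in the source. The sketch below uses made-up class names (Base, Derived) and a made-up function call_through_base. On common ABIs, constructing a Derived makes the object file refer to Derived's vtable, which in turn refers to Derived::id, even though no source line names Derived::id.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

struct Derived : Base {
    // Never called by name anywhere below.
    int id() const override { return 2; }
};

// Only Base's interface is named in the call, yet creating a Derived object
// pulls in Derived's vtable and therefore Derived::id.
int call_through_base() {
    Derived d;
    const Base& b = d;
    return b.id();
}
```

If Derived::id were defined in a .cpp file that has been dropped from the build, the link would be expected to fail even though grep finds no caller. A cross reference table is one way to see which object file holds such a reference.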
For Example
#include <iostream>
int add(int x, int y);
int main()
{
    std::cout << add(5, 5) << std::endl;
}
This would compile but not link. I understand the problem, I just don't understand why it compiles fine but doesn't link.
Because the compiler doesn't know whether that function is provided by a library (or other translation unit). There's nothing in your prototype that tells the compiler the function is defined locally (you could use static for that).
The input to a C or C++ compiler is one translation unit - more or less one source code file. Once the compiler is finished with that one source code, it has done its job.
If you call/use a symbol, such as a function, which is not part of that translation unit, the compiler assumes it's defined somewhere else.
Later on, when you link together all the object files and possibly the libraries you want to use, all references are tied together. It is only at this point, when everything that is supposed to form the executable is pulled together, that one can know something is missing.
When a compiler compiles, it generates the output (object file) with a table of defined symbols (T) and undefined symbols (U) (see the man page of nm). Hence there is no requirement that every reference be defined within each translation unit. When all the object files are linked (with any libraries etc.), the final binary should have all its symbols defined (unless the target is itself a library). This is the job of the linker. Hence, depending on the requested target type (library or not), the linker may or may not give an error for undefined functions.
Even if the target is not a library, if it is not statically linked it may still refer to shared libraries (.so or .dll). If the shared libraries, or any symbols in them, are missing on the target machine when the binary is run, you may get an error from the loader. So between them, the compiler, linker and loader each try to find the definition of every symbol needed. Here, by declaring add, you pacify the compiler, which hopes that the linker or loader will do the rest. Since you didn't pacify the linker (say, by providing it with a shared library that defines add), it stops and complains. Had you pacified the linker as well but not supplied the library at run time, you would have got the error from the loader.
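The defined/undefined bookkeeping described above can be modelled in a few lines. This is a toy sketch, not how any real linker is implemented: the ObjectFile struct and the unresolved function are invented for illustration, and real symbol names would be mangled (add(int, int) would not literally appear as "add").

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of an object file's symbol table.
struct ObjectFile {
    std::set<std::string> defined;    // roughly what nm marks as T
    std::set<std::string> undefined;  // roughly what nm marks as U
};

// Returns the symbols that are still undefined once all inputs are combined.
std::set<std::string> unresolved(const std::vector<ObjectFile>& inputs) {
    std::set<std::string> all_defined;
    for (const auto& obj : inputs)
        all_defined.insert(obj.defined.begin(), obj.defined.end());

    std::set<std::string> missing;
    for (const auto& obj : inputs)
        for (const auto& name : obj.undefined)
            if (all_defined.count(name) == 0)
                missing.insert(name);
    return missing;
}
```

With only an object that defines main and needs add, the result contains add, which corresponds to the "undefined reference" error. Adding a second object that defines add leaves the result empty.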
I found one question about compiling and linking in C++ and I don't know which answer is correct. It was discussed with my friends and opinions are divided. Here is a question:
In order to run a program written in C++, its source code is:
(A) compiled to machine code,
(B) compiled and linked to machine code
In my opinion the correct answer is A but I don't have any source to prove it.
Google, first hit.
Linkage is needed as well to create a standalone executable.
You need to link the code you have produced to make it into an executable file. For simple programs, the compiler does this for you, by calling the linker at the end of the compilation process.
The compiler proper simply translates C code either to assembler (classic C compiler), which is then assembled with an assembler, or directly to machine code (many modern compilers). The machine code is usually produced as "object files", which are not "executable", because they refer to external units - such as when you call printf(). It is possible to write C code that is completely standalone, but you still typically need to combine more than one object file, and the result certainly needs to be "formatted" in the right way to make an executable file - which is a different file format from an object file [although typically fairly SIMILAR].
Compilation only creates object files, i.e. it converts C/C++ source code to machine code.
Linking is the creation of an executable file from one or more object files. So to run an application/executable you also have to link it.
During compilation, the compiler doesn't complain about a function that is declared but not defined, because it assumes the definition may be in another object (source code file). The linker checks that every referenced function actually exists, so if a definition is missing, you'll get an error in the linking step.
Compiling: Takes input C/C++ code and produces machine code (object file)
gcc -c MyProgram.c
Note that the external references in the object file are not yet resolved!
Linking: Combines object file with external references into an executable file
gcc MyProgram.o -o MyProgram
Note that no unresolved references remain!
Here libc.a is the standard C library, which gcc links into your programs automatically.
I've just noticed that your question was about C++; the same concept applies there, so if you understand this, you'll understand how it works in C++ too.
Strictly speaking, answer A.
But to see the whole picture, let's say you have defined some function. The compiler writes the machine code of that function at some address and puts that address and the name of the function in the object (.o) file, where the linker can find it. The linker then takes this machine code and resolves the symbols, as you may have seen mentioned in some earlier error message.
I have an ARM project that I'm building with make. I'm creating the list of object files to link based on the names of all of the .c and .cpp files in my source directory. However, I would like to exclude objects from being linked if they are never used. Will the linker exclude these objects from the .elf file automatically even if I include them in the list of objects to link? If not, is there a way to generate a list of only the objects that need to be linked?
You have to compile your code differently to strip out functions and data that aren't used. Usually all the functions in an object file are compiled into the same section, so they can't be individually omitted if they're not used.
Add the two following switches to your compiler line:
-ffunction-sections -fdata-sections
When you compile, the compiler will now put individual functions and data into their own sections instead of lumping them all in one module section.
Then, on your linker command line, specify the following (or -Wl,--gc-sections if you link through the gcc driver):
--gc-sections
This instructs the linker to remove unused sections ("gc" is for garbage collection). It will garbage collect parts of files and entire files. For example, if you're compiling an object, but only use 1 function of 100 in the object, it will toss out the other 99 you're not using.
If you run into issues with functions not being found (it happens for various reasons, such as externs between libraries), you can use KEEP() directives in your linker script (*.ld) to prevent garbage collection of those individual sections.
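To make the effect of these switches concrete, here is a small sketch of a source file with one used and one unused function. The file name util.cpp, the function names and the exact command lines in the comments are assumptions for illustration; an ARM build would use the cross compiler in place of g++.

```cpp
// Hypothetical file util.cpp, compiled for example with
//   g++ -ffunction-sections -fdata-sections -c util.cpp
// and linked with
//   g++ -Wl,--gc-sections main.o util.o -o app

// Assumed to be called from main.o, so its section is reachable and stays.
int used_helper(int x) {
    return x * 2;
}

// Assumed not to be referenced anywhere. With -ffunction-sections it is
// placed in its own section (.text.<mangled name> on typical ELF targets),
// which --gc-sections lets the linker discard.
int unused_helper(int x) {
    return x * 3;
}
```

Without -ffunction-sections both functions would normally share one .text section in util.o, so the linker could only keep or drop them together. A linker map file is a reasonable way to check what was actually removed.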
If you are using RealView, it seems that it is possible. This section discusses it:
3.3.3 Unused section elimination
Unused section elimination removes code that is never executed, or data that is not
referred to by the code, from the final image. This optimization can be controlled by the
--remove, --no_remove, --first, --last, and --keep linker options. Use the --info unused
linker option to instruct the linker to generate a list of the unused sections that have been
eliminated.
As many people have said, the answer is "it depends". In my experience, RVCT is very good at dead code stripping. Unused code and data will almost always be removed in the final link stage. GCC, on the other hand (at least without the LLVM back end), is rather poor at whole image static analysis and will not do a very good job of removing unused code (and woe be it to you if your code is in different sections requiring long jumps). You can take some steps to mitigate it, such as using -ffunction-sections, which creates a separate section for each function and enables somewhat better dead code stripping.
Have your linker generate a map file of your binary so you can see what made it in there and what got stripped out.
Depending on the sophistication of the compiler/linker and the optimization level, the linker may not link in code that isn't being called.
What compiler/linker are you using? Some linkers do this automatically, and some provide the feature as a command-line option.
In my experience, many compilers will not include unused code on an object file basis. Some may not have this resolution and will include entire libraries ("because this makes the build process faster").
For example, take a file junk.c that has three functions: Func1, Func2 and Func3. The build process creates an object file, junk.o, which has all three functions in it. If function Func2 is not used, it will be included anyway because such a linker can't exclude one function from an object file.
On the other hand, take the files Func1.c, Func2.c and Func3.c, with the functions above, one per file. If Func2 in Func2.c is not used, the linker will not include it.
Some linkers are intelligent enough to exclude files out of libraries. However, each linker is different on its granularity of file inclusion (and thus file exclusion). Read your linker's manual or contact their customer support for exact information.
I suggest moving the suspect functions into a separate file (one function per file) and rebuild. Measure the code size before and after. Also, there may be a difference between Debug and Release linking. The Debug linking could be lazy and just throw everything in while the Release linking puts more effort into removing unused code.
Just my thoughts and experience, Your Mileage May Vary (YMMV).
Traditionally, linkers link in all object files that are explicitly specified on the command line, even if they could be left out and the program would not have any unresolved symbols. This means that you can deliberately change the behaviour of a program by including an object file that does something triggered from static initialization but is not called directly or indirectly from main.
Typically if you place most of your object files in a static library and link this library with a single object file containing your entry point the linker will only pick out members of the library (iteratively) that help resolve an unresolved symbol reference in the original object file or one included subsequently because it resolved a previous unresolved symbol.
In short, place most of your object files in a library and just link this with one object containing your entry point.
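The static initialization case mentioned above can be sketched as follows. All names here (registry, Registrar, plugin.cpp) are hypothetical, and the whole thing is shown in one file for brevity. The point is that nothing reachable from main ever names registrar_instance, yet its constructor runs at startup if the object file is part of the link.

```cpp
#include <string>
#include <vector>

// Function-local static, so the registry exists before any registrar uses it
// regardless of initialization order across translation units.
std::vector<std::string>& registry() {
    static std::vector<std::string> names;
    return names;
}

// --- what would live in plugin.cpp, which main never references ---
namespace {
struct Registrar {
    Registrar() { registry().push_back("plugin"); }  // side effect at startup
};
Registrar registrar_instance;  // constructed during static initialization
}  // namespace
```

If plugin.o is named explicitly on the link line, the registration is expected to happen. If plugin.o is only a member of a static library and no symbol in it is needed, the linker typically leaves it out and the registration silently does not happen, which is the flip side of the library approach recommended above.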
I am writing a hello world C++ application. The #include directive helps the compiler or linker bring in the C++ library, and my cout << "hello world"; uses cout from that library. After compiling, the generated exe is about 96k in size. What is actually contained in this exe file? Does it also contain the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argc, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
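The search loop described above can be sketched as a toy model. It is only an approximation under stated assumptions: the Member struct, the pull_members function and the member and symbol names used with it are invented for illustration, and real linkers differ in details such as the order in which libraries are scanned.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one member (object file) inside a static library.
struct Member {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Pulls in library members until no remaining member satisfies an unresolved
// symbol. 'defined' holds symbols already provided by the explicit objects.
// Returns the pulled member names; 'unresolved' ends up holding what is
// still missing.
std::vector<std::string> pull_members(const std::vector<Member>& library,
                                      std::set<std::string> defined,
                                      std::set<std::string>& unresolved) {
    std::vector<std::string> pulled;
    std::vector<bool> taken(library.size(), false);
    bool progress = true;
    while (progress && !unresolved.empty()) {
        progress = false;
        for (std::size_t k = 0; k < library.size(); ++k) {
            if (taken[k]) continue;
            const Member& m = library[k];
            bool needed = false;
            for (const auto& s : m.defined)
                if (unresolved.count(s) != 0) { needed = true; break; }
            if (!needed) continue;          // this member stays out of the link
            taken[k] = true;
            progress = true;
            pulled.push_back(m.name);
            for (const auto& s : m.defined) {
                defined.insert(s);
                unresolved.erase(s);
            }
            // The new member may itself need further symbols.
            for (const auto& s : m.undefined)
                if (defined.count(s) == 0) unresolved.insert(s);
        }
    }
    return pulled;
}
```

A member that defines nothing the program needs is never pulled in, so how the library was split into object files decides how much of it ends up in the executable.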
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so, that if you wanted to just use one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
On Windows, the libraries, or the parts of them that are used, are also usually included in the .exe; the case is different on Linux. However, there are optimization options.
I guess this Wiki link will be useful : Static Libraries