redefinition happend in which procedure? compile or link time? - c++

If I defined a function twice, I'll get a redefinition error message, but
I'm confused that redefinition happened in compile or link time?
and why you can override malloc in libc without a redefinition error?

You get a function redefinition error when you have two function with the same prototype or signature (function signature is made of the function name number of parameters and parameter types, does NOT include the return type).
This is a compile time error if the compiler see two functions with the same signature:
int foo(int a);
double foo(int b);
Why you can override function calls in libraries? Let's look at how the code is build into an executable:
the compiler is called for each source file and outputs an object file: any function call which cannot be resolved (i.e. calling a function in a different file) is an external symbol which the linker will have to resolve.
the linker take all the object files and tries to resolve all the symbols; but it does this on a first come first served manner. For a external symbol it will consider the first definition it finds and not worry about the fact that there may be more definitions of the same symbol available.
So, the linker actually allows you to override a function's behavior. And it all depends on the order the files are linked - the first function definition it finds is the one used to resolve the symbol.
Hope this sheds some light on the matter.

Either or both. It can also result from the programmer editing source code or modifying build scripts.
"Redefinition" errors are emitted when the linker find two things (symbols) with the same name.
There are many reasons the linker might find two symbols with the same name. Some possibilities (there are many permutations) include;
An object file is specified twice in the link command. This usually results from an error in a build script.
Two object files that contain the same function definition. This results from code duplication - for example, a function definition being copied into different source files, which are then compiled and linked. It can also result from monkey business with the preprocessor (e.g. #includeing a file that contains the definition of a global variable by two source files).
The causes of things like the above are generally programmer error (e.g. supplying a bad linker command in a build script, misuse of the preprocessor, copying and pasting code between projects.
The reason functions in libraries like libc can often be "overridden" is that the linker typically only looks for symbols in libraries if it can't find them in object files. So, if an object file defines malloc() the linker will resolve all calls to that, and not attempt to resolve using the version in a library. This sort of thing is quite dangerous, because some other functions within libraries (e.g. even within libc) may resolve directly to the original malloc() (e.g. some calls may be inlined) which can unpredictable behaviours. This sort of behaviour is also linker specific: although this sort of thing is common with unix/linux variants, there are systems where the linkers do things differently.

Related

How does a function caller use a header file to determine what to do with a compiled binary?

My understanding is that C++ (and C, I guess) header files are never compiled, and simply act as an explanation of the interface of the C++ file they describe.
So if my header file describes a hello() function, some program that includes the header will know about hello() and how to call it and what arguments to give it, etc.
However, after compilation (and before linking, I guess? I'm not sure), when the hello.c file is binary machine code, and hello.h is still C++, how does the compiler/linker know how to call a function in the binary blob based on the presence of its declaration in the header file?
I understand concepts such as symbol tables, abstract syntax trees, etc (i.e., I have taken a compiler class in the past), but this is a gap in my knowledge).
The implementation of hello() assumes a certain calling convention (where are the parameters on the stack, who cleans up the stack the caller or the callee, etc).
The compiler generates code with the correct calling convention. It may use information from the header file to do this (e.g. the function is marked __stdcall in Windows program) or it may use it's default calling convention. The compiler will also use the header file to make sure your are calling the routine with the right number and types of parameters. Once the code is generated by the compiler the header file is not used again.
The linker is not concerned with calling convention it's primary responsibility is to patch together the binaries you've compiled by fixing up references among your modules and any libraries it calls.
A C/C++ compilation unit (cpp file / c file) includes all the header files (as text) and the code.
The header file helps explain how to produce the call instruction
push arg1
push arg2
call _some_function
If the compilation unit includes _some_function then this will be resolved at compile time.
Otherwise it becomes an undefined symbol. If so, when the linker comes along, it looks through all the object files and libraries to resolve all the undefined symbols.
So the header file helps code the assembly correctly.
Object and library files provide implementations.
The library files are optional. When a linker looks in a library file, it only gets added if it satisfies some symbol, otherwise it is not added to the binary.
Object files (ignoring optimization) will get added to the binary completely.
Building a C++ program is a two-step process: compile and link.
The header is for compilation of the module you are writing. The binary is for linking: it contains the compiled code for the method defined in the correspnding header. The header has to match what's already been compiled. At link time you will learn if your header has a method signature that matches what was compiled in the binary.

How can I find out the references to a C++ symbol

I am working on an existing big C++ code base (more than 1 million line of code). I need to remove some part of the code deemed not useful. However, when I just exclude that part of code from the build process (i.e. not to compile them), eventually I got "undefined references" error in linking for some symbols (class function names) I removed.
A problem rose when I tried to find out where in other code have the references. Using Cscope or OpenGrok, I can find out a few explicit references but does not really help after removing such references. There are lots of other cases indirectly referring to the symbol I removed, for example:
virtual functions overridden in child class
"typedef" defined other symbol to refer to this missing symbol.
My question is: is there any gcc/g++ option I can turn on to have a output of all references (that gcc/g++ is aware of) direct or indirect to the symbol I removed?
If no such gcc/g++ option, is there any other tool that can produce such output?
Thanks.
Removing the compilation units (c or cpp files) from your project does not completely remove them. Those are typically just the definitions of functions and classes. The declarations of those functions and classes still exist in headers which are likely still being included in other compilation units.
Track down where these things are declared (typically in header files) and either comment them out in the headers or stop including the headers entirely if you don't need anything within them for your project.
For example:
If you are removing foo.c from a project, make sure any instance of #include "foo.h" has been removed from all other c/cpp files
You can instruct LD to emit a linker map containing a cross reference table using the flags -Map=path/to/my_mapfile.map and --cref. More info here:
https://sourceware.org/binutils/docs/ld/Options.html
The map file is very long and terse, but it usually has enough information to help you pinpoint exactly why a given symbol is still being referenced.

Why is it that when you don't define a function you get a linker error and not a compiler?

For Example
#include <iostream>
int add(int x, int y);
int main()
{
cout << add(5, 5) << endl;
}
This would compile but not link. I understand the problem, I just don't understand why it compiles fine but doesn't link.
Because the compiler doesn't know whether that function is provided by a library (or other translation unit). There's nothing in your prototype that tells the compiler the function is defined locally (you could use static for that).
The input to a C or C++ compiler is one translation unit - more or less one source code file. Once the compiler is finished with that one source code, it has done its job.
If you call/use a symbol, such as a function, which is not part of that translation unit, the compiler assumes it's defined somewhere else.
Later on, you link together all the object files and possibly the libraries you want to use, all references are tied together - it's only at this point, when pulling together everything that's supposed to create an executable, one can know that something is missing.
When a compiler compiles, it generates the output (object file) with the table of defined symbols (T) and undefined symbols (U) (see man page of nm). Hence there is no requirement that all the references are defined in every translation unit. When all the object files are linked (with any libraries etc), the final binary should have all the symbols defined (unless the target in itself is a library). This is the job of the linker. Hence based on the requested target type (library or not), the linker might not or might give an error for undefined functions. Even if the target is a non-library, if it is not statically linked, it still might refer to shared libraries (.so or .dll), hence if on the target machine while the binary is run, if the shared libraries are missing or if any symbols missing, you might even get a linker error. Hence between compiler, linker and loader, every one is trying to best provide you with the definition of every symbol needed. Here by giving declaring add, you are pacifying the compiler, which hopes that the linker or loader would do the required job. Since you didnt pacify the linker (by say providing it with a shared library reference), it stops and cribs. If you have even pacified the linker, you would have got the error in the loader.

C++ compilation static variable and shared objects

Description :
a. Class X contains a static private data member ptr and static public function member getptr()/setptr().
In X.cpp, the ptr is set to NULL.
b. libXYZ.so (shared object) contains the object of class X (i.e libXYZ.so contains X.o).
c. libVWX.so (shared object) contains the object of class X (i.e libVWX.so contains X.o).
d. Executable a.exe contains X.cpp as part of translation units and finally is linked to libXYZ.so, libVWX.so
PS:
1. There are no user namespaces involved in any of the classes.
2. The libraries and executable contain many other classes also.
3. no dlopen() has been done. All libraries are linked during compile time using -L and -l flags.
Problem Statement:
When compiling and linking a.exe with other libraries (i.e libXYZ.so and libVWX.so), I expected a linker error (conflict/occurance of same symbol multiple times) but did not get one.
When the program was executed - the behavior was strange in SUSE 10 Linux and HP-UX 11 IA64.
In Linux, when execution flow was pushed across all the objects in different libraries, the effect was registered in only one copy of X.
In HPUX, when execution flow was pushed across all the objects in different libraries, the effect was registered in 3 differnt copies of X (2 belonging to each libraries and 1 for executable)
PS : I mean during running the program, the flow did passed thourgh multiple objects belonging to a.exe, libXYZ.so and libVWX.so) which interacted with static pointer belonging to X.
Question:
Is Expecting linker error not correct? Since two compilers passed through compilation silently, May be there is a standard rule in case of this type of scenario which I am missing. If so, Please let me know the same.
How does the compiler (gcc in Linux and aCC in HPUX) decide how many copies of X to keep in the final executable and refer them in such scenarios.
Is there any flag supported by gcc and aCC which will warn/stop compilation to the users in these kind of scenarios?
Thanks for your help in advance.
I'm not too sure that I've completely understood the scenario. However,
the default behavior on loading dynamic objects under Linux (and other
Unices) is to make all symbols in the library available, and to only use
the first encountered. Thus, if you both libXYZ.so and libVWX.so
contain a symbol X::ourData, it is not an error; if you load them in
that order, libVWX.so will use the X::ourData from libXYZ.so,
instead of its own. Logically, this is a lot like a template definition
in a header: the compiler chooses one, more or less by chance, and if
any of the definitions is not the same as all of the others, it's
undefined behavior. This behavior can be
overridden by passing the flag RTLD_LOCAL to dlopen.
With regards to your questions:
The linker is simply implementing the default behavior of dlopen (that which you get when the system loads the library implicitely). Thus, no error (but the logical equivalent of undefined behavior if any of the definitions isn't the same).
The compiler doesn't decide. The decision is made when the .so is loaded, depending on whether you specify RTLD_GLOBAL or RTLD_LOCAL when calling dlopen. When the runtime calls dlopen implicitly, to resolve a dependency, it will use RTLD_GLOBAL if this occurs when loading the main executable, and what ever was used to load the library when the dependency comes from a library. (This means, of course, that RTLD_GLOBAL will propagate until you invoke dlopen explicitly.)
The function is "public static", so I assume it's OOP-meaning of "static" (does not need instance), not C meaning of static (file-static; local to compilation unit). Therefore the functions are extern.
Now in Linux you have explicit right to override library symbols, both using another library or in the executable. All extern symbols in libraries are resolved using the global offset table, even the one the library actually defines itself. And while functions defined in the executable are normally not resolved like this, but the linker notices the symbols will get to the symbol table from the libraries and puts the reference to the executable-defined one there. So the libraries will see the symbol defined in the executable, if you generated it.
This is explicit feature, designed so you can do things like replace memory allocation functions or wrap filesystem operations. HP-UX probably does not have the feature, so each library ends up calling it's own implementation, while any other object that would have the symbol undefined will see one of them.
There is a difference beetween "extern" symbols (which is the default in c++) and "shared libary extern". By default symbols are only "extern" which means the scope of one "link unit" e.g. an executable or a library.
So the expected behaviour would be: no compiler error and every module works with its own copy.
That leads to problems of course in case of inline compiling etc...etc...
To declare a symbol "shared library extern" you have to use a ".def" file or an compiler declaration.
e.g. in visual c++ that would be "_declspec(dllexport)" and "_declspec(dllimport)".
I do not know the declarations for gcc at the moment but I am sure someone does :-)

Preventing objects from being linked if they are not needed?

I have an ARM project that I'm building with make. I'm creating the list of object files to link based on the names of all of the .c and .cpp files in my source directory. However, I would like to exclude objects from being linked if they are never used. Will the linker exclude these objects from the .elf file automatically even if I include them in the list of objects to link? If not, is there a way to generate a list of only the objects that need to be linked?
You have to compile your code differently to strip out function and data that isn't used. Usually all the objects are compiled into the same symbol, so they can't be individually omitted if they're not used.
Add the two following switches to your compiler line:
-ffunction-sections -fdata-sections
When you compile, the compiler will now put individual functions and data into their own sections instead of lumping them all in one module section.
Then, in your linker, specify the following:
--gc-sections
This instructs the linker to remove unused sections ("gc" is for garbage collection). It will garbage collect parts of files and entire files. For example, if you're compiling an object, but only use 1 function of 100 in the object, it will toss out the other 99 you're not using.
If you run into issues with functions not found (it happens due to various reasons like externs between libraries), you can use .keep directives in your linker file (*.ld) in order to prevent garbage collection on those individual functions.
If you are using RealView, it seems that it is possible. This section discusses it:
3.3.3 Unused section elimination
Unused section elimination removes code that is never executed, or data that is not
referred to by the code, from the final image. This optimization can be controlled by the
--remove, --no_remove, --first, --last, and --keep linker options. Use the --info unused
linker option to instruct the linker to generate a list of the unused sections that have been
eliminated.
Like many people said, the answer is "depends". In my experience, RVCT is very good about dead code stripping. Unused code and data will almost always be removed in the final link stage. GCC, on the other hand (at least without the LLVM back end), is rather poor at whole image static analysis and will not do a very good job at removing unused code (and woe be it to you if your code is in different sections requiring long jumps). You can take some steps to mitigate it, such as using function-sections, which creates a separate section for each function and enables some better dead code stripping.
Have your linker generate a map file of your binary so you can see what made it in there and what got stripped out.
Depending on the sophistication of the compiler/linker and optimization level, the linker will not link in code that isn't being called.
What compiler/linker are you using? Some linkers do this automatically, and some provide the feature as a command-line option.
In my experience, many compilers will not include unused code on an object file basis. Some may not have this resolution and will include entire libraries ("because this makes the build process faster").
For example, given a file junk.c and it has three functions: Func1, Func2 and Func3. The build process creates an object file, junk.o, which has all three functions in it. If function Func2 is not used, it will be included anyway because the linker can't exclude one function out of an object file.
On the other hand, given files: Func1.c, Func2.c, and Func3.c, with the functions above, one per file. If Func2 in Func2.c is not used, the linker will not include it.
Some linkers are intelligent enough to exclude files out of libraries. However, each linker is different on its granularity of file inclusion (and thus file exclusion). Read your linker's manual or contact their customer support for exact information.
I suggest moving the suspect functions into a separate file (one function per file) and rebuild. Measure the code size before and after. Also, there may be a difference between Debug and Release linking. The Debug linking could be lazy and just throw everything in while the Release linking puts more effort into removing unused code.
Just my thoughts and experience, Your Mileage May Vary (YMMV).
Traditionally linkers link in all object files that are explicity specified in the command line, even if they could be left out and the program would not have any unresolved symbols. This means that you can deliberately change the behaviour of a program by including an object file that does something triggered from static initialization but is not called directly or indirectly from main.
Typically if you place most of your object files in a static library and link this library with a single object file containing your entry point the linker will only pick out members of the library (iteratively) that help resolve an unresolved symbol reference in the original object file or one included subsequently because it resolved a previous unresolved symbol.
In short, place most of your object files in a library and just link this with one object containing your entry point.