I'm a bit confused. I haven't written C in years and I'm just getting back into it. One thing I'm not sure about is the relationship between two files that call each other's functions, for example:
testa.c:
int main (void)
{
callTheOtherFunction();
return 0;
}
and the other file
testb.c
callTheOtherFunction(){
//do some stuff
}
now my "makefile" looks like
gcc -o test ./testa.c ./testb.c
What does it mean? Is callTheOtherFunction now part of testa.c, as if both files had been merged? Or does it have something to do with inheritance? Is callTheOtherFunction now a global function, or what would you call it?
I need to draw a UML diagram of it, which is why I need the right term for this case.
The source files are never "merged". During the compilation phase, two object files are produced, one for each source file. Later, during the linking phase, the two object files are linked (along with some implicit system libraries) to produce an executable.
The two files will be compiled to object code by your compiler, then the linker will generate a single executable from the object files. It is, as you say, like the two files have been merged.
callTheOtherFunction will be accessible from anywhere (I suppose you would call that a global function) as you did not mark its definition static.
As a side note, you should probably get a compiler warning from that compilation as you do not have a declaration of callTheOtherFunction in testa.c.
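To make the linkage terms concrete, here is a minimal sketch of the two files collapsed into one so that it compiles on its own. The header name testb.h, the helper function, and the returned values are assumptions for illustration; the original callTheOtherFunction has no body or declared return type.

```cpp
// Single-file sketch: the comments mark which (hypothetical) file each part would live in.

// --- shared header, e.g. testb.h (hypothetical name) ---
int callTheOtherFunction(void);   // declaration: external linkage by default

// --- testb.c ---
static int helper(void)           // 'static' gives internal linkage: invisible outside testb.c
{
    return 41;
}

int callTheOtherFunction(void)    // external linkage: the linker can bind calls from testa.c to it
{
    return helper() + 1;
}

// --- testa.c ---
int callFromTestA(void)           // stands in for the call made inside main()
{
    return callTheOtherFunction();
}
```

With the declaration visible in testa.c, the missing-declaration warning mentioned above should go away. In a UML diagram, "function with external linkage" is probably the most precise term for what is loosely called a global function.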
Let's say I have foo.cpp with following content
int foo() {
return 123;
}
and main.cpp in which I use foo:
int main() {
int r = foo();
return r;
}
I can compile both source files into object code and then link them using link-time optimization to get the foo() call inlined into main().
But can I achieve the same effect by listing both files on the compiler command line, like c++ foo.cpp main.cpp? Or does it just boil down to
foreach(file in files)
UsualCompilingRoutinesForSingleFile(file)
?
If yes, why isn't the compiler allowed to concatenate all the files passed in into a single one to achieve a sort of LTO?
The compiler is allowed and able to "see" all files at the "same time" to perform LTO.
But that is indeed not the same as having a single source file.
From the gcc docs (only as an example; other compilers support similar technology):
LTO mode, in which the whole program is read into the compiler at link-time and optimized in a similar way as if it were a single source-level compilation unit.
As you can see, the result will be the same as if you had presented all the files to the compiler at once, but without the trouble of putting all the included headers and sources in the "correct" order.
For more information from gcc on link-time optimization levels and methods:
https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
Concatenating the files is not the same, because of local/static objects. You can have conflicts (think unnamed namespaces). For instance, the first file uses a local function foo that looks something up in a static map, and the second file has another local function foo that looks something up in a hash map (for whatever reason, and yes, I agree that it's also a bad design).
If you compile both files together by concatenating them, you break the encapsulation that translation units provide and you get multiple definitions of the same names.
In your example, the compiler compiles both files separately and then links them together. That is not LTO, which is something else (it generates not just object files but also a kind of AST that can be merged with others and then optimized).
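A rough sketch of that conflict follows. The namespaces file1 and file2 are hypothetical stand-ins for the two translation units; in the real files each foo would sit in an unnamed namespace (or be static), and the table contents are made up.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Stand-in for file1.cpp: in the real file this would be "namespace { ... }".
namespace file1 {
    int foo(const std::string& key) {
        static const std::map<std::string, int> table = {{"a", 1}};
        auto it = table.find(key);
        return it == table.end() ? -1 : it->second;
    }
}

// Stand-in for file2.cpp: same name and signature, different implementation.
namespace file2 {
    int foo(const std::string& key) {
        static const std::unordered_map<std::string, int> table = {{"a", 100}};
        auto it = table.find(key);
        return it == table.end() ? -1 : it->second;
    }
}
```

Compiled separately, the two foo functions never meet because each is local to its translation unit. If the files were concatenated, both definitions would land in the same unnamed namespace of a single translation unit, and the compiler would reject the second one as a redefinition.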
I'm trying to wrap my head around C++ development using the SFML library. I'm following a tutorial (http://www.gamefromscratch.com/page/Game-From-Scratch-CPP-Edition-Part-7.aspx), and using Visual Studio 2010.
A problem I keep running into involves unresolved externals. I'm really struggling with this because, unlike most errors I run into, it a) doesn't seem to have anything to do with the code, and b) doesn't behave consistently. Rather than give y'all a specific example and ask for help solving that one example, I'm hoping to develop a more reliable way of attacking these problems. I'll give you an outline of a common occurrence though.
I have a solution with 8 header files and 8 cpp files that correspond to them. The solution is stable: It compiles and runs with no errors or warnings.
I'll go into a header file and add this line:
virtual void DoNothing();
I'll then go into the matching cpp file and write the method:
void DoNothing(){};
I compile and run, and get 5 unresolved external errors. They don't point to any line of code, so I don't really know how to fix them, but I obviously did something wrong. Fair enough. Trying to get back to a stable state, I delete the two lines of code I had inserted, and compile. Even though the code is identical to the last stable state, I get the same unresolved external errors.
Trying random things, I go into another cpp file and reverse the order of two included header files. The game compiles now. If I switch the order of the included header files back, it compiles.
What the hell are unresolved external errors? Why don't they seem to behave consistently with the code I've entered? How do I read them to find out what the problem is, and how do I avoid them in the first place?
Thank you.
ps: If there are more specific details I should provide, please just let me know.
"Unresolved External" errors mean that your code is referring to something (usually a function or method, but can be a variable too) that does not exist. These are link errors, and not compile errors; that's why you don't get a line number and more helpful error messages.
Let me give you a little background on how C++ code is turned into an executable (and keep in mind that I'm simplifying things a bit.)
Each C++ source file (and not header file) in your project is compiled separately. A ".cpp" file and all the headers it includes are compiled into what is called an object file or object code. (These files have a ".obj" or ".o" extension.) You can also think of library files (that is ".lib" files on Windows and ".a" files on Linux) as a collection of these object files, stored for later use.
To produce the executable program (e.g. the EXE or DLL file on Windows), all these object files are linked together, and voilà!
Now, the important thing here is that each source file is compiled in isolation, independently of the other source files. So, if the code in one file calls a function that is implemented in another file, the compiler won't see the actual body of that function. As long as the declaration of the called function is visible (i.e. the prototype, the line you write in headers), it assumes that these files are going to be linked together eventually and leaves the task of actually wiring up the call to the linker. This usually means that as long as you include the right headers, your compiler is going to be happy.
But the linker is going to be more tenacious and pedantic. At link time, you really really need to provide the body (i.e. the implementation) of all the functions that you use all over the project. It is your task to make sure that all the right object files and libraries are linked together and the implementation of each used function exists somewhere among them exactly once (no more, no less.)
This brings us to your problem. When you get an "unresolved external" linker error, this means that the body of a function you've called does not exist anywhere in object files and libraries that you are linking together.
Obviously, one of two things is happening. Either you have included the header for an external library, but have forgotten to link in the library file itself (which is not your problem here) or you've declared (i.e. written the prototype for) a function but have forgotten to implement its body.
Keep in mind that the linker is really strict here. If you declare something like this in your class:
class Foo {
void bar (int x);
};
and then in your ".cpp" file, implement this function:
void bar (int x)
{
// Do nothing
}
then you'll get an unresolved external error if you actually call Foo::bar() anywhere in your program, because the implemented bar() is not a method of Foo (you should have implemented void Foo::bar (int x) {}.) Similar things happen if you slightly misspell or get the type of the arguments wrong or whatnot.
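For comparison, here is a minimal sketch of the matching definition. To keep it checkable, bar is made public and returns an int, which differs from the void method above; the return values are arbitrary.

```cpp
class Foo {
public:
    int bar(int x);               // declared as a member of Foo
};

// Defines the member function: this is the symbol the linker looks for
// when some code calls Foo::bar.
int Foo::bar(int x)
{
    return x + 1;
}

// A free function that happens to share the name. It is a different symbol
// and does nothing to satisfy calls to Foo::bar.
int bar(int x)
{
    return x - 1;
}
```

Deleting the Foo::bar definition while keeping a call to it would compile, because the declaration is still visible, and then fail at link time with an unresolved external.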
Reading linker errors and making sense of them can be hard. Sometimes, the name that the linker is complaining about (the "symbol" it says it can't find) is mangled beyond recognition. This has to do with *Application Binary Interface*s (ABI) and several decades of history and precedent. Anyway, most of the time, if you look closely at the link error message, you can see what the function name was, check your code (or libraries), and try again.
Also, though it's rare, it sometimes happens that in order to solve some link issues, you have to resort to completely rebuilding your project.
Every time I've seen behavior like this, it has been because of a circular reference between projects. For example, project A has a reference to an object/symbol implemented in project B while at the same time project B has a reference to an object/symbol from project A. Every time you build your solution, the tools have to compile one project first, then the other. If you make a change to the second project to be compiled, the first one cannot see the change on the first round of compilation and the build fails. If you manage to manually build project B (against a now obsolete copy of library A), then the solution starts to build correctly. More complex cycles are possible (e.g. A depends on B, which depends on C, which depends on A). You don't mention multiple projects explicitly, but I bet you have them.
These circular references are common in large solutions that have been around for a long time and have grown slowly. People get into the habit of adding links from everything to everything because they need one function from here, a struct from there...
Hunt down these dependencies. You should be able to do a full clean rebuild from nothing but the source code. Your dependency tree should look like... well, a tree, not a graph with cycles.
This is probably a stupid question, but I've searched for quite a while now here and on the web and couldn't come up with a clear answer (did my due diligence googling).
So I'm new to programming... My question is, how does the main function know about function definitions (implementations) in a different file?
ex. Say I have 3 files
main.cpp
myfunction.cpp
myfunction.hpp
//main.cpp
#include "myfunction.hpp"
int main() {
int A = myfunction( 12 );
...
}
-
//myfunction.cpp
#include "myfunction.hpp"
int myfunction( int x ) {
return x * x;
}
-
//myfunction.hpp
int myfunction( int x );
-
I get how the preprocessor includes the header code, but how do the header and main function even know the function definition exists, much less utilize it?
I apologize if this isn't clear or I'm vastly mistaken about something, new here
The header file declares functions/classes - i.e. tells the compiler when it is compiling a .cpp file what functions/classes are available.
The .cpp file defines those functions - i.e. the compiler compiles the code and therefore produces the actual machine code to perform those actions that are declared in the corresponding .hpp file.
In your example, main.cpp includes a .hpp file. The preprocessor replaces the #include with the contents of the .hpp file. This file tells the compiler that the function myfunction is defined elsewhere and it takes one parameter (an int) and returns an int.
So when you compile main.cpp into an object file (.o extension), the compiler makes a note in that file that it requires the function myfunction. When you compile myfunction.cpp into an object file, that object file has a note in it saying that it contains the definition of myfunction.
Then when you come to linking the two object files together into an executable, the linker ties the ends up - i.e. main.o uses myfunction as defined in myfunction.o.
You have to understand that, from a user's point of view, compilation is a two-step operation.
1st Step : Object compilation
During this step, your *.cpp files are individually compiled into separate object files. This means that when main.cpp is compiled, the compiler knows nothing about your myfunction.cpp. The only thing it knows is that you have declared that a function with the signature int myfunction( int x ) exists in another object file.
The compiler keeps a reference to this call and records it directly in the object file. The object file will contain a note saying "I have to call myfunction with an int, and it will return an int to me". It keeps an index of all external calls so that it can be linked with other object files afterwards.
2nd Step : Linking
During this step, the linker looks at all those indexes in your object files and tries to resolve the dependencies between those files. If a definition is missing, you'll get the famous undefined symbol XXX error from it. It then translates those references into real memory addresses in a result file: either a binary or a library.
You may then ask how this is possible with a gigantic program like an office suite, which has tons of methods and objects. Well, such programs use the shared library mechanism. You know these as the '.dll' and/or '.so' files on your Windows/Unix workstation. It allows the resolution of undefined symbols to be postponed until the program is run.
It even allows undefined symbols to be resolved on demand, with the dl* functions.
1. The principle
When you write:
int A = myfunction(12);
This is translated to:
int A = #call(myfunction, 12);
where #call can be seen as a dictionary look-up. And if you think about the dictionary analogy, you can certainly know about a word (smorgasbord?) before knowing its definition. All you need is that, at runtime, the definition is in the dictionary.
2. A point on ABI
How does this #call work? Because of the ABI. The ABI is a specification that describes many things, among them how to perform a call to a given function (depending on its parameters). The call contract is simple: it says where each of the function arguments can be found (some will be in the processor's registers, others on the stack).
Therefore, #call actually does:
#push 12, reg0
#invoke myfunction
And the function definition knows that its first argument (x) is located in reg0.
3. But I thought dictionaries were for dynamic languages?
And you are right, to an extent. Dynamic languages are typically implemented with a hash table for symbol lookup that is dynamically populated.
For C++, the compiler will transform a translation unit (roughly speaking, a preprocessed source file) into an object (.o or .obj in general). Each object contains a table of the symbols it references but for which the definition is not known:
.undefined
[0]: myfunction
Then the linker will bring the objects together and reconcile the symbols. There are two kinds of symbols at this point:
those which are within the library, and can be referenced through an offset (the final address is still unknown)
those which are outside the library, and whose address is completely unknown until runtime.
Both can be treated in the same fashion.
.dynamic
[0]: myfunction at <undefined-address>
And then the code will reference the look-up entry:
#invoke .dynamic[0]
When the library is loaded (with dlopen or LoadLibrary, for example), the runtime will finally know where the symbol is mapped in memory, and overwrite the <undefined-address> with the real address (for this run).
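The indirection can be imitated in plain C++ with a table of function pointers. This is only an analogy: dynamic_table, unresolved_stub, load_library and call_through_table are hypothetical names, and a real loader patches entries in the binary's own tables rather than in a C++ array.

```cpp
#include <array>

using Fn = int (*)(int);

// Placeholder used while the real address is still unknown.
int unresolved_stub(int) { return -1; }

// Plays the role of the .dynamic table: one slot per external symbol.
std::array<Fn, 1> dynamic_table = { unresolved_stub };

// The definition that lives "in the library".
int myfunction(int x) { return x * x; }

// Plays the role of the loader: overwrite the slot with the real address.
void load_library() { dynamic_table[0] = &myfunction; }

// Plays the role of "#invoke .dynamic[0]": the call always goes through the table.
int call_through_table(int arg) { return dynamic_table[0](arg); }
```

Before load_library() runs, call_through_table(12) reaches the stub; afterwards the same call site reaches myfunction without being recompiled.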
As suggested in Matthieu M.'s comment, it is the linker's job to find the right "function" in the right place. The compilation steps are, roughly:
The compiler is invoked for each cpp file and translates it into an object file (binary code) with a symbol table that associates function names (names are mangled in C++) with their locations in the object file.
The linker is invoked only once, with every object file as a parameter. It resolves the locations of function calls from one object file to another using the symbol tables. One main() function MUST exist somewhere. Finally, a binary executable file is produced once the linker has found everything it needs.
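The linker's bookkeeping in the steps above can be modelled with a toy sketch. ObjectFile and unresolved_symbols are hypothetical names, and a real linker tracks far more than symbol names (sections, relocations, weak symbols), so treat this as a mental model only.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of an object file: the symbols it defines and the ones it only references.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Returns every referenced symbol that no object file defines,
// i.e. what would be reported as "unresolved external".
std::set<std::string> unresolved_symbols(const std::vector<ObjectFile>& objects)
{
    std::set<std::string> all_defined;
    for (const auto& obj : objects) {
        all_defined.insert(obj.defined.begin(), obj.defined.end());
    }

    std::set<std::string> missing;
    for (const auto& obj : objects) {
        for (const auto& symbol : obj.undefined) {
            if (all_defined.count(symbol) == 0) {
                missing.insert(symbol);
            }
        }
    }
    return missing;
}
```

Passing only a model of main.o (which defines main and references myfunction) yields a set containing myfunction; adding a model of myfunction.o makes the result empty, which corresponds to a successful link.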
The preprocessor inserts the content of the header files into the cpp files (the preprocessed cpp files are called translation units).
When you compile the code, each translation unit is checked separately for semantic and syntactic errors. The presence of function definitions across translation units is not considered. .obj files are generated after compilation.
In the next step, when the obj files are linked, the definitions of the functions (member functions for classes) that are used are searched for and linking happens. If a function is not found, a linker error is reported.
In your example, if the function was not defined in myfunction.cpp, compilation would still go on with no problem. An error would be reported in the linking step.
int myfunction(int); is the function prototype. You declare the function with it so that the compiler knows which function you are calling when you write myfunction(0);.
And how do the header and main function even know the function definition exists?
Well, this is the job of the linker.
When you compile a program, the preprocessor adds the source code of each header file to the file that included it. The compiler compiles EVERY .cpp file. The result is a number of .obj files.
After that comes the linker. The linker takes all the .obj files, starting from your main file. Whenever it finds a reference that has no definition (e.g. a variable, function or class), it tries to locate the respective definition in the other .obj files created at the compile stage or supplied to the linker at the beginning of the linking stage.
Now to answer your question: each .cpp file is compiled into a .obj file containing instructions in machine code. When you include a .hpp file and use some function that's defined in another .cpp file, at the linking stage the linker looks for that function's definition in the respective .obj file. That's how it finds it.
Let's say I have two .cpp files, file1.cpp and file2.cpp, which use std::vector<int>. Suppose that file1.cpp has an int main(void). I compile both into file1.o and file2.o, and link the two object files into an ELF binary that I can execute. I am compiling on a 32-bit Ubuntu Linux machine.
My question regards how the compiler and linker put together the symbols for the std::vector:
When the linker makes my final binary, is there code duplication? Does the linker have one set of "templated" code for the code in file1.o that uses std::vector and another set of std::vector code for the code that comprises file2.o?
I tried this for myself (I used g++ -g) and looked at the disassembly of my final executable. The labels generated for the vector constructor and other methods were apparently random, although the code from file1.o appeared to call the same constructor as the code from file2.o. I could not be sure, however.
If the linker does prevent the code duplication, how does it do it? Must it "know" what templates are? Does it always prevent code duplication regarding multiple uses of the same templated code across multiple object files?
It knows what the templates are through name mangling. The type of the object is encoded by the compiler in its name, and that allows the linker to filter out the duplicate implementations of the same template.
This is done during linking, and not compilation, because each .o file can be linked with anything thus cannot be stripped of something that may later be needed. Only the linker can decide which code is unused, which template is duplicate, etc. This is done by using "Weak Symbols" in the object's symbol list: Symbols that the linker can remove if they appear multiple times (as opposed to other symbols, like user-defined functions, that cannot be removed if duplicate and cause a linking error).
Your question is stated verbatim in the opening section of this documentation:
http://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html
Technically due to the "one definition rule" there is only one std::vector<int> and therefore the code should be linked together. What may happen is that some code is inlined which would speed up execution time but could produce more code.
If you had one file using std::vector<int> and another using std::vector<unsigned int> then you would have 2 classes and potentially lots of duplicate code.
Of course, the writers of vector might use some common code for certain situations (e.g. POD types) that removes the duplication.
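The "one instantiation per type" point can be observed with a small template that owns a static local. This single-file sketch uses hypothetical function names as stand-ins for code in file1.cpp and file2.cpp; across real object files the one definition rule, together with the linker merging the duplicate copies, gives the same behaviour.

```cpp
// Each distinct T gets its own instantiation, and therefore its own 'count'.
template <typename T>
int& instance_counter()
{
    static int count = 0;
    return count;
}

// Stand-ins for code in two different source files using the same instantiation.
int use_from_file1() { return ++instance_counter<int>(); }
int use_from_file2() { return ++instance_counter<int>(); }

// A different template argument means a separate instantiation with separate code and state.
int use_other_type() { return ++instance_counter<unsigned int>(); }
```

Both "files" increment the same counter because they share instance_counter<int>, while instance_counter<unsigned int> starts from zero on its own.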
As I understand function-level linking builds (explicitly or not) a graph of all possible calls and only includes the reachable functions' code into the produced binary. But how does it deal with variables declared at file level?
Say I have
MyClass GlobalVariable;
static MyClass StaticGlobalVariable;
in some file that contains only these two variables and a set of functions not actually called from any of the remaining code.
Will the code for these variables' allocation/initialization be included in the output?
From experience (rather than quoting the standard):
If the initialization has visible side effects, like calls into external libraries or file I/O, the initialization will always happen.
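A minimal sketch of such a side effect follows, assuming a constructor that bumps a counter. The counter g_init_count and the body of MyClass are invented for illustration.

```cpp
// Zero-initialized statically, before any constructor below runs.
int g_init_count = 0;

struct MyClass {
    // Visible side effect: the compiler cannot simply drop this initialization.
    MyClass() { ++g_init_count; }
};

MyClass GlobalVariable;               // external linkage
static MyClass StaticGlobalVariable;  // internal linkage
```

When this translation unit is part of the program, both constructors run during startup even though nothing else refers to the two objects. Whether the object file ends up in the program at all is a separate matter: an unreferenced object file inside a static library is typically not pulled in by the linker in the first place.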
boost::singleton_default provides an interesting solution that enforces the initialization to be done only when the object is referenced elsewhere, i.e. when all other references to the object are removed by the linker, the initialization is removed, too.
Edit: Yes. g++ optimize flags try to figure out function calls and prune away .o files resulting in linker errors. I'm not sure if this happens only with certain optimize flags, but it does happen.
A bad habit in our company is the presence of a lot of 'extern g_GlobalFunction()' declarations in different files. As the calls to them depended on conditional code, the .o files were often thrown away, resulting in link errors.
We fixed that with g_InitModule() and g_InitFileName() calls that are called hierarchically starting from main(). Mostly, these are empty functions just meant to dissuade g++ from discarding the .o file.