How are function definitions determined with header files? - c++

When using separate files in C++, I know that functions can be declared using header files like this:
// MyHeader.h
int add(int num, int num2);
// MySource.cpp
int add(int num, int num2) {
    return num + num2;
}
// Main.cpp
#include "MyHeader.h"
#include <iostream>
int main() {
    std::cout << add(4, 5) << std::endl;
    return 0;
}
My question is, in this situation, how does the compiler determine the function definition of add(int,int) when MyHeader.h and Main.cpp have no references at all to MySource.cpp?
Also, if there were multiple add functions (with the same parameters) in a program, how could I make sure the correct one is being used in a given situation?

The function declaration gives the compiler enough information to generate a call to that function.
The compiler then generates an object file that records the names (which, in the case of C++, are mangled to encode the arguments, namespace, cv-qualifiers, etc.) of external functions to which that object file refers (along with another list of names it defines).
The linker then takes all those object files and tries to match up every name that something refers to but doesn't define with some other object file that defines the same name. Then it assigns and fills in addresses, so where one object file refers to nameX, it fills in the address it's assigning to nameX from the other file.
At least in a typical case, the object files it looks at will include a number of libraries (the standard library plus any others you specify). A library is basically just a collection of object files, shoved together into a single file, with enough data to index which contents belong to which object file. In a few cases, it also includes some extra metadata to (for example) quickly find an object file that defines a specific name (obviously handy for faster linking, but not an absolute necessity).
If there are two or more definitions of functions with exactly the same mangled name, then your code has undefined behavior (you're violating the one definition rule). The linker will usually give an error message telling you that nameZ was defined in both object file A and object file B (but the C++ standard doesn't really require that).
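To see how mangling keeps different functions apart, here is a minimal sketch (the file names and the second overload are illustrative, not from the question): each overload of add produces its own mangled symbol, so every call site is matched to exactly one definition.
// Math.h -- declarations only
int add(int a, int b);
double add(double a, double b);
// Math.cpp -- definitions; each produces a distinct mangled name
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }
// Caller.cpp -- the compiler picks the overload, the linker matches its mangled name
#include "Math.h"
int main() {
    return add(1, 2) + static_cast<int>(add(1.5, 2.5));
}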

The compiler does not "determine" (you mean "know") the function definition. The linker does. You have just discovered why the build process consists of compiling and linking.
So, basically, the compiler produces two object files here. One which contains the definition of add and one which just refers to the "unknown" function add. The linker then takes the two object files and puts the reference and definition together. Of course, that's just a very simple explanation, but for a beginner, that's all you need to know.
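For example, with a typical command-line toolchain (g++ is shown here as an assumption; other compilers have equivalent steps), the two phases can be run explicitly:
g++ -c MySource.cpp -o MySource.o   # compile only: produces an object file that defines add
g++ -c Main.cpp -o Main.o           # compile only: Main.o merely refers to add
g++ Main.o MySource.o -o app        # link: the reference in Main.o is resolved to the definition in MySource.o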

The compiler doesn't compile header files; it compiles source files. It will include the code in the header when the header is #included in a source file being compiled, but on its own, the header file doesn't "do" anything.
Also, the compiler doesn't worry about whether a function is defined or not. It just compiles against function declarations. It's the linker that resolves the definitions of functions.
You don't need to include a definition of a function at all, unless it's being called by some other code you need to link.
As to your question, "If there were multiple add functions (with the same arguments) in a program, how can I make sure the correct one is being used in a certain situation?": It depends on the linker and the settings, but generally, if you have more than one definition of a function with the same signature, the linker will issue an error stating that the function is multiply defined.

Related

Why is header including sufficient for definitions?

As far as I understand, header files declare things. Including a header such as #include <iostream> pulls in the iostream header. This tells the compiler, for example, "there is something called cout".
QUESTION: How does the compiler get to the definition of cout (or of all the other functions)? In my understanding the compiler only gets to know names and types, but no definitions.
Thanks in advance.
Actually: it doesn't. It needs to know what the objects look like, what interfaces they offer (for std::cout, that's some std::ostream stream object, or an object of a class derived from it), and that such objects do exist somewhere. That's it. What the compiler then does is add placeholders for that object – just as it does for function calls.
After compilation there's then a second stage: the linker. As its name tells, it links all those compilation units together. If it now sees such a placeholder, it will replace it with the address of the object or function – which must exist, of course (for std::cout, there's an extern declaration in the header, and some other source file, pre-compiled into some library, must have defined it without extern); otherwise a linker error is thrown.
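A minimal sketch of that same pattern with made-up names – a header declares a global object with extern, exactly one source file defines it, and every other file uses it through the declaration alone:
// Console.h -- declaration only; the object itself lives elsewhere
#include <string>
struct Console {
    void print(const std::string& s);
};
extern Console console;
// Console.cpp -- the single definition the linker points everyone at
#include "Console.h"
#include <cstdio>
void Console::print(const std::string& s) { std::puts(s.c_str()); }
Console console;
// App.cpp -- uses console knowing only its declaration
#include "Console.h"
int main() { console.print("hello"); }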

Clarification about the header-guards and header-file inclusion used in C/C++

I know people recommend including header guards in header files, to prevent header files contents from being inserted by the pre-processor into the source-code files more than once.
But consider the following scenario:
Let's say I have the files main.cpp , stuff.cpp, and commonheader.h, with the .h file having its header guards.
If either .cpp file tries to include commonheader.h more than once, then the preprocessor
will stop that from happening, and after compiling to object code we get:
main.o containing the contents of commonheader.h exactly once.
stuff.o containing the contents of commonheader.h exactly once.
Note that the contents of commonheader have been repeated across the files, but not within the same .o file.
So what happens during the linking step? Since the .o files are being fused into an executable,
we will have to ensure for a second time that the contents of commonheader are not being repeated. Does the compiler take care of that? If not, wouldn't that be a problem when we are dealing with huge header files, giving rise to code repetition across files and leading to large executable sizes?
If I am making some conceptual mistake anywhere in the question, please correct me.
Typically your header file should not actually define any symbols, it should just declare them. So commonheader.h would look like this (omitting the include guards):
void commonFunc1(void);
void commonFunc2(void);
In that case, there is no problem. If you call commonFunc1 in main.cpp and stuff.cpp, both main.o and stuff.o will know they want to link against a symbol called commonFunc1 and the linker will try to find that symbol. If the linker doesn't find the symbol, you get an undefined reference error. The actual definition of commonFunc1 needs to be in some cpp file.
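For instance (the file name is just illustrative), the definitions could live in a common.cpp that gets compiled and linked alongside main.cpp and stuff.cpp:
// common.cpp -- the single set of definitions the linker resolves against
#include "commonheader.h"
#include <cstdio>
void commonFunc1(void) { std::puts("commonFunc1"); }
void commonFunc2(void) { std::puts("commonFunc2"); }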
If you really want to define functions in your header file, use static so that the linker does not see them. So your commonheader.h could look like:
static void commonFunc1()
{
    /* ... do stuff ... */
}
In this case, the linker does not know about commonFunc1 and no errors will occur. This could increase the executable size though; you'll probably end up with two copies of the code for commonFunc1.
To expand on Grayson's answer to cover variables: if you want to declare a variable in a header file, you should use the extern keyword. This is one way to handle global variables.
In the header file global.h you write this:
extern Globals globals;
then you can use globals in any file that includes global.h, while in global.cpp you write
#include "globalstype.h"
Globals globals;
Note that global.cpp doesn't need to include global.h; however, you will need to make sure global.cpp is compiled and linked into every program that uses it, otherwise the linker will complain.
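A small usage sketch under the same assumptions (the Globals type is presumed to come from globalstype.h, which global.h would include or forward-declare):
// main.cpp -- sees only the extern declaration from global.h
#include "global.h"
int main() {
    Globals* p = &globals;   // refers to the single object defined in global.cpp
    (void)p;                 // silence the unused-variable warning in this sketch
    return 0;
}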
Header files normally contain declarative code, not definitive code. That is, they declare the existence of something that must be defined exactly once elsewhere. Macros and inline functions are allowed, and are necessarily duplicated wherever they are used.
The declarations are used by the compiler to insert unresolved links (or references) into the object code. The job of the linker is to resolve these links by matching the reference with the one single definition.
If you omit the include guards, then with multiple inclusion in a single translation unit you will get a compiler error for redefinition of an existing symbol (a class or type defined in the header, for example). If however you have a header erroneously containing a definition of an object or function, and the header is included in more than one translation unit, there will be more than one object file with a definition - this instead causes a linker error for multiple definition.
So while:
extern int b ; // declaration, may occur in multiple translation units
is fine in a header file,
int b ; // definition, must occur in only one translation unit.
is not.
Note that the declarations themselves are not included in the object code; rather, the compiler uses them to create references that the linker will resolve, unless the compiler has already seen the definition and resolved the reference itself.
Yes, it can be a problem. You could end up with multiple definitions, or redundant copies.
C is quite simple in this regard. You have static, extern, and inline -- and compilers also define several ways to alter visibility. I think a lot of this has been covered by other answers.
C++ is quite different, however. There is a lot of information and there are also implicit definitions (e.g. the compiler may emit a copy constructor or RTTI).
With C++, it is much more likely that a definition appears in a header -- consider templates, methods defined inside a class declaration, and so on. C++ relies on the One Definition Rule here. You will want to read about it in more detail, but it basically states that some categories of symbols may be defined in multiple translation units; depending on the decoration and the location/scope of the declaration, in many cases the linker is allowed to assume that each body (definition) is identical, and it is free to discard any copies it encounters (leaving one definition in your binary). So this really cuts down on the size of the resulting binary, unless you explicitly request that separate copies be produced.
However, having those definitions in your headers can surely increase compilation times, memory and files required to compile each file, visible dependencies, and will increase the number of files which must be recompiled when a definition is edited.
Of course, the language still allows bad forms, and will not complain if you repeatedly include, in multiple translation units, definitions which must be copied into each one. Then you can certainly end up with a lot of bloat.
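A minimal sketch of the kind of header-resident definitions the ODR permits (assuming the header is included from several translation units): each translation unit compiles its own copy, and the linker keeps just one.
// shapes.h -- definitions the ODR allows to appear in every including translation unit
#ifndef SHAPES_H
#define SHAPES_H
template <typename T>
T area(T w, T h) { return w * h; }   // template: has to be visible to every user, so it lives here
class Counter {
public:
    void bump() { ++n_; }            // defined inside the class, so it is implicitly inline
    int value() const { return n_; }
private:
    int n_ = 0;
};
#endif // SHAPES_H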
This may be a good intro:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=386

C/C++ header and implementation files: How do they work?

This is probably a stupid question, but I've searched for quite a while now here and on the web and couldn't come up with a clear answer (did my due diligence googling).
So I'm new to programming... My question is, how does the main function know about function definitions (implementations) in a different file?
ex. Say I have 3 files
main.cpp
myfunction.cpp
myfunction.hpp
//main.cpp
#include "myfunction.hpp"
int main() {
    int A = myfunction( 12 );
    ...
}
-
//myfunction.cpp
#include "myfunction.hpp"
int myfunction( int x ) {
    return x * x;
}
-
//myfunction.hpp
int myfunction( int x );
-
I get how the preprocessor includes the header code, but how do the header and main function even know the function definition exists, much less utilize it?
I apologize if this isn't clear or if I'm vastly mistaken about something; I'm new here.
The header file declares functions/classes - i.e. tells the compiler when it is compiling a .cpp file what functions/classes are available.
The .cpp file defines those functions - i.e. the compiler compiles the code and therefore produces the actual machine code to perform those actions that are declared in the corresponding .hpp file.
In your example, main.cpp includes a .hpp file. The preprocessor replaces the #include with the contents of the .hpp file. This file tells the compiler that the function myfunction is defined elsewhere and it takes one parameter (an int) and returns an int.
So when you compile main.cpp into an object file (.o extension), the compiler makes a note in that file that it requires the function myfunction. When you compile myfunction.cpp into an object file, that object file has a note in it that it has the definition for myfunction.
Then when you come to linking the two object files together into an executable, the linker ties the ends up - i.e. main.o uses myfunction as defined in myfunction.o.
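You can see those notes directly. On a typical Linux/GCC setup (this is an assumption; tool names and exact output vary by platform), the nm utility lists an object file's symbol table, with U marking a symbol the file needs and T marking one it defines:
g++ -c main.cpp myfunction.cpp
nm main.o          # shows something like:  U _Z10myfunctioni   (needs myfunction(int))
nm myfunction.o    # shows something like:  T _Z10myfunctioni   (defines myfunction(int))
nm -C main.o       # -C demangles, so the same entry reads:  U myfunction(int)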
You have to understand that, from the user's point of view, building a program is a two-step operation.
1st Step: Object compilation
During this step, your *.cpp files are individually compiled into separate object files. That means that when main.cpp is compiled, the compiler doesn't know anything about your myfunction.cpp. The only thing it knows is that you declared that a function with the signature int myfunction( int x ) exists in another object file.
The compiler keeps a reference to this call and writes it directly into the object file. The object file effectively contains a note saying "I have to call myfunction with an int and it will return an int to me." It keeps an index of all external calls so that it can be linked with the others afterwards.
2nd Step: Linking
During this step, the linker looks at all those indexes in your object files and tries to resolve the dependencies between them. If one is missing, you'll get the famous undefined symbol XXX error. It then translates those references into real memory addresses in a result file: either a binary or a library.
You might then ask how this is possible for a gigantic program like an office suite, which has tons of methods and objects. Well, they use the shared library mechanism. You know them as the '.dll' and/or '.so' files on your Windows/Unix workstation. They allow the resolution of undefined symbols to be postponed until the program is run.
They even allow undefined symbols to be resolved on demand, with the dl* functions.
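A minimal sketch of that on-demand mechanism on a POSIX system (the library name libmymath.so and its exported add function are hypothetical; error handling kept short):
#include <dlfcn.h>   // dlopen, dlsym, dlerror, dlclose
#include <cstdio>
int main() {
    // Load the shared library at run time instead of linking against it at build time.
    void* handle = dlopen("./libmymath.so", RTLD_LAZY);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
    // Look up the symbol by name; it must have been exported with C linkage.
    using AddFn = int (*)(int, int);
    AddFn add = reinterpret_cast<AddFn>(dlsym(handle, "add"));
    if (!add) { std::fprintf(stderr, "%s\n", dlerror()); dlclose(handle); return 1; }
    std::printf("%d\n", add(2, 3));
    dlclose(handle);
    return 0;
}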
1. The principle
When you write:
int A = myfunction(12);
This is translated to:
int A = #call(myfunction, 12);
where #call can be seen as a dictionary look-up. And if you think about the dictionary analogy, you can certainly know about a word (smorgasbord?) before knowing its definition. All you need is that, at runtime, the definition be in the dictionary.
2. A point on ABI
How does this #call work? Because of the ABI. The ABI describes many things, and among them how to perform a call to a given function (depending on its parameters). The calling contract is simple: it says where each of the function's arguments can be found (some will be in the processor's registers, others on the stack).
Therefore, #call actually does:
#push 12, reg0
#invoke myfunction
And the function definition knows that its first argument (x) is located in reg0.
3. But I thought dictionaries were for dynamic languages?
And you are right, to an extent. Dynamic languages are typically implemented with a hash table for symbol lookup that is dynamically populated.
For C++, the compiler will transform a translation unit (roughly speaking, a preprocessed source file) into an object (.o or .obj in general). Each object contains a table of the symbols it references but for which the definition is not known:
.undefined
[0]: myfunction
Then the linker will bring together the objects and reconciliate the symbols. There are two kinds of symbols at this point:
those which are within the library, and can be referenced through an offset (the final address is still unknown)
those which are outside the library, and whose address is completely unknown until runtime.
Both can be treated in the same fashion.
.dynamic
[0]: myfunction at <undefined-address>
And then the code will reference the look-up entry:
#invoke .dynamic[0]
When the library is loaded (with dlopen, for example), the runtime will finally know where the symbol is mapped in memory, and overwrite the <undefined-address> with the real address (for this run).
As suggested in Matthieu M.'s comment, it is the linker's job to find the right "function" in the right place. The compilation steps are, roughly:
1. The compiler is invoked for each .cpp file and translates it into an object file (binary code) with a symbol table that associates each function name (names are mangled in C++) with its location in the object file.
2. The linker is invoked only once, with every object file as a parameter. It resolves function-call locations from one object file to another thanks to the symbol tables. One main() function MUST exist somewhere. Eventually a binary executable file is produced once the linker has found everything it needs.
The preprocessor includes the content of the header files into the .cpp files (a .cpp file together with its included headers is called a translation unit).
When you compile the code, each translation unit is checked separately for semantic and syntactic errors. The presence of function definitions in other translation units is not considered. .obj files are generated after compilation.
In the next step the .obj files are linked: the definitions of the functions (and member functions of classes) that are used are searched for, and linking happens. If a definition is not found, a linker error is thrown.
In your example, if the function were not defined in myfunction.cpp, compilation would still go on with no problem. An error would be reported at the linking step.
int myfunction(int); is the function prototype. You declare the function with it so that the compiler knows which function you are calling when you write myfunction(0);.
And how do the header and main function even know the function definition exists?
Well, this is the job of Linker.
When you compile a program, the preprocessor adds source code of each header file to the file that included it. The compiler compiles EVERY .cpp file. The result is a number of .obj files.
After that comes the linker. The linker takes all the .obj files, starting from your main file. Whenever it finds a reference that has no definition (e.g. a variable, function or class), it tries to locate the respective definition in the other .obj files created at the compile stage or supplied to the linker at the beginning of the linking stage.
Now to answer your question: each .cpp file is compiled into a .obj file containing instructions in machine code. When you include a .hpp file and use some function that's defined in another .cpp file, at the linking stage the linker looks for that function's definition in the respective .obj file. That's how it finds it.

C++ including a ".h" file, function duplication confusion

I'm currently writing a program, and couldn't figure out why I got an error (note: I already fixed it, I'm curious about WHY the error was there and what this implies about including .h files).
Basically, my program was structured as follows:
The current file I'm working with, I'll call Current.cc (which is an implementation of Current.h).
Current.cc included a header file, named CalledByCurrent.h (which has an associated implementation called CalledByCurrent.cc). CalledByCurrent.h contains a class definition.
There was a non-class function defined in CalledByCurrent.cc called thisFunction(). thisFunction() was not declared in CalledByCurrent.h, since it was not actually a member function of the class (just a little helper function). In Current.cc, I needed to use this function, so I just redefined thisFunction() at the top of Current.cc. However, when I did this, I got an error saying that the function was duplicated. Why is this, when thisFunction() wasn't even declared in CalledByCurrent.h?
Thus, I just removed the function from Current.cc, now assuming that Current.cc had access to thisFunction() from CalledByCurrent.cc. However, when I did this, I found that Current.cc did not know what function I was talking about. What the heck? I then copied the function definition for thisFunction() to the top of my CalledByCurrent.h file and this resolved the problem. Could you help me understand this behavior? Particularly, why would it think there was a duplicate, yet it didn't know how to use the original?
p.s - I apologize for how confusing this post is. Please let me know if there's anything I can clear up.
You are getting multiple definitions from the linker - it sees two functions with the same name and complains. For example:
// a.cpp
void f() {}
// b.cpp
void f() {}
then
g++ a.cpp b.cpp
gives:
C:\Users\neilb\Temp\ccZU9pkv.o:b.cpp:(.text+0x0): multiple definition of `f()'
The way around this is to either put the definition in only one .cpp file, or to declare one or both of the functions as static:
// b.cpp
static void f() {}
You can't have two global functions with the same name (even in 2 different translation units). To avoid getting the linker error define the function as static so that it is not visible outside the translation unit.
EDIT
You can use the function in the other .cpp file by using extern keyword. See this example:
//Test.cpp
void myfunc()
{
}
//Main.cpp
extern void myfunc();
int main()
{
    myfunc();
}
It will call myfunc() as defined in Test.cpp.
The header file inclusion mechanism should be tolerant to duplicate header file inclusions.
That's because an ordinary function definition has external (global) linkage by default (whether you declare the function in a header file or not), so the linker ends up with multiple implementations for the same function signature.
If those functions are truly helper functions, then declare them as:
static void thisFunction();
The other way, if you want to share the same helper function, is to simply declare it in a common header file, say:
//CalledByCurrent.h (is included in both .cc files)
void thisFunction();
And implement thisFunction() in exactly one of the .cc files. This should solve the problem properly.
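Concretely, with the file names from the question, that shared-header arrangement might look like this (a sketch; the bodies are placeholders):
// CalledByCurrent.h
#ifndef CALLEDBYCURRENT_H
#define CALLEDBYCURRENT_H
void thisFunction();   // one declaration, visible to both .cc files
#endif
// CalledByCurrent.cc
#include "CalledByCurrent.h"
void thisFunction() { /* helper work goes here */ }   // the single definition
// Current.cc
#include "CalledByCurrent.h"
void doCurrentWork() { thisFunction(); }   // hypothetical caller; the call links against the definition above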
Here are some ideas:
You didn't put a header include guard in your header file. If it's being included twice, you might get this sort of error.
The function's prototype (at the top) doesn't match its signature 100%.
You put the body of the function in the header file.
You have two functions of the same signature in two different source files, but they aren't marked static.
If you are using gcc (you didn't say what compiler you're using), you can use the -E switch to view the preprocessor output. This includes expanding all #defines and including all #includes.
Each time something is expanded, it tells you what file and line it was in. Using this you can see where thisFunction() is defined.
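For example (a sketch, assuming the file names from the question and a GCC-style driver):
g++ -E Current.cc -o Current.ii     # write the fully preprocessed translation unit to Current.ii
grep -n "thisFunction" Current.ii   # search the expanded output for every mention of the helper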
There are 2 distinct errors coming from 2 different phases of the build.
In the first case where you have a duplicate, the COMPILER is happy, but the LINKER is complaining because when it picks up all the function definitions across the different source files it notices 2 are named the same. As the other answers state, you can use the static keyword or use a common definition.
In the second case, where you see that your function is not declared in this scope, it's because the COMPILER is complaining: each file needs to know about what functions it can use.
Compiling happens before linking, so the COMPILER cannot know ahead of time whether or not the LINKER will find a matching function; that's why you use declarations to notify the COMPILER that a definition will be found by the LINKER later on.
As you can see, your 2 errors are not contradictory, they are the result of 2 separate processes in the build that have a particular order.

C++ header file question

I was trying out some c++ code while working with classes and this question occurred to me and it's bugging me a little.
I have created a header file that contains my class definition and a cpp file that contains the implementation.
If I use this class in a different cpp file, why am I including the header file instead of the cpp file that contains the class implementations?
If I include the class implementation file, then the class header file should be pulled in automatically, right (since I've already included the header file in the implementation file)? Isn't this more natural?
Sorry if this is a dumb question; I'm genuinely interested in knowing why most people include .h instead of .cpp files when the latter seems more natural (I know Python somewhat, maybe that's why it seems natural to me at least). Is it just historical, or is there a technical reason concerning program organisation, or maybe something else?
Because when you're compiling another file, C++ doesn't actually need to know about the implementation. It only needs to know the signature of each function (which parameters it takes and what it returns), the name of each class, what macros are #defined, and other "summary" information like that, so that it can check that you're using functions and classes correctly. The contents of different .cpp files don't get put together until the linker runs.
For example, say you have foo.h
int foo(int a, float b);
and foo.cpp
#include "foo.h"
int foo(int a, float b) { /* implementation */ }
and bar.cpp
#include "foo.h"
int bar(void) {
    int c = foo(1, 2.1);
    return c;
}
When you compile foo.cpp, it becomes foo.o, and when you compile bar.cpp, it becomes bar.o. Now, in the process of compiling, the compiler needs to check that the definition of function foo() in foo.cpp agrees with the usage of function foo() in bar.cpp (i.e. takes an int and a float and returns an int). The way it does that is by making you include the same header file in both .cpp files, and if both the definition and the usage agree with the declaration in the header, then they must agree with each other.
But the compiler doesn't actually include the implementation of foo() in bar.o. It just includes an assembly language instruction to call foo. So when it creates bar.o, it doesn't need to know anything about the contents of foo.cpp. However, when you get to the linking stage (which happens after compilation), the linker actually does need to know about the implementation of foo(), because it's going to include that implementation in the final program and replace the call foo instruction with a call 0x109d9829 (or whatever it decides the memory address of function foo() should be).
Note that the linker does not check that the implementation of foo() (in foo.o) agrees with the use of foo() (in bar.o) - for example, it doesn't check that foo() is getting called with an int and a float parameter! It's kind of hard to do that sort of check in assembly language (at least, harder than it is to check the C++ source code), so the linker relies on knowing that the compiler has already checked that. And that's why you need the header file, to provide that information to the compiler.
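As a sketch of what that shared header buys you: if bar.cpp declared foo itself instead of including foo.h, a mismatch would slip past the compiler. In C++ the wrong declaration typically mangles to a different symbol, so the mistake only surfaces at link time (in C, which doesn't mangle, the call could simply be silently wrong at run time):
// bar.cpp -- hypothetical bad version that skips foo.h
int foo(int a);        // wrong: the real foo takes (int, float)
int bar(void) {
    return foo(1);     // compiles fine, but the linker finds no definition of foo(int) anywhere
}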
The magic is done by the linker. Every .cpp when compiled will generate an intermediate object file with all the exported and imported symbols in a table. The linker will reconcile them. In other words, you just have to include the header, and every time you will reference the included class, the compiler will put the signature of the referenced class in the symbol table.
If you include the .cpp file, you will have the same code compiled twice, and you will get linking errors, as the same symbol will be found twice by the linker and will therefore be reported as multiply defined.
One technical reason is compilation speed. Let's suppose your class uses 10 other classes (e.g. as types for member variables). Including the long .cpp files for all 10 classes would make your class compile much slower (i.e. maybe 2 seconds instead of 1 second).
Another reason is hiding the implementation. Let's suppose you are writing a class to be used by 10 other teams in your company. All they have to know and learn about your class is in the .h file (public interface). You can freely do whatever you want in the .cpp file (implementation), you may change it as often you want, they won't care. But if you change the .h file, they may have to adjust their code using your class.
For each method body, it's your choice whether to put it in the .h file or in the .cpp file. If it's in the .h file, the compiler can inline it when it is called, which may make the code a bit faster. But compilation will be slower, the temporary .o (.obj) files may become larger (because each of them will contain the compiled method body), and the program binary (.exe) may become larger, because the function body takes up space each time it is inlined.