C/C++ header and implementation files: How do they work? - c++

This is probably a stupid question, but I've searched for quite a while now here and on the web and couldn't come up with a clear answer (did my due diligence googling).
So I'm new to programming... My question is, how does the main function know about function definitions (implementations) in a different file?
ex. Say I have 3 files
main.cpp
myfunction.cpp
myfunction.hpp
//main.cpp
#include "myfunction.hpp"
int main() {
int A = myfunction( 12 );
...
}
-
//myfunction.cpp
#include "myfunction.hpp"
int myfunction( int x ) {
return x * x;
}
-
//myfunction.hpp
int myfunction( int x );
-
I get how the preprocessor includes the header code, but how do the header and main function even know the function definition exists, much less utilize it?
I apologize if this isn't clear or I'm vastly mistaken about something, new here

The header file declares functions/classes - i.e. tells the compiler when it is compiling a .cpp file what functions/classes are available.
The .cpp file defines those functions - i.e. the compiler compiles the code and therefore produces the actual machine code to perform those actions that are declared in the corresponding .hpp file.
In your example, main.cpp includes a .hpp file. The preprocessor replaces the #include with the contents of the .hpp file. This file tells the compiler that the function myfunction is defined elsewhere and it takes one parameter (an int) and returns an int.
So when you compile main.cpp into an object file (.o extension), the compiler makes a note in that file that it requires the function myfunction. When you compile myfunction.cpp into an object file, that object file has a note in it that it contains the definition of myfunction.
Then when you come to linking the two object files together into an executable, the linker ties the ends up - i.e. main.o uses myfunction as defined in myfunction.o.
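As a rough single-file sketch of the same idea: this is approximately what the compiler sees for main.cpp once the header has been pasted in, followed by the definition that would normally live in myfunction.cpp. The helper callFromMain and the comments marking file boundaries are illustrative assumptions, not part of the question's code.

```cpp
// --- main.cpp after the preprocessor has pasted in myfunction.hpp ---
int myfunction(int x);            // declaration only: defined elsewhere, takes an int, returns an int

// Hypothetical stand-in for the body of main(), so the sketch returns a value.
int callFromMain() {
    return myfunction(12);        // compiles using the declaration alone
}

// --- what myfunction.cpp contributes (normally a separate object file) ---
int myfunction(int x) {
    return x * x;
}
```

In a real build the two halves end up in main.o and myfunction.o, and the linker connects the call to the definition.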

You have to understand that, from the user's point of view, compilation is a two-step operation.
1st Step : Object compilation
During this step, your *.cpp files are individually compiled into separate object files. This means that when main.cpp is compiled, the compiler knows nothing about your myfunction.cpp. The only thing it knows is that you have declared that a function with the signature int myfunction( int x ) exists in another object file.
The compiler keeps a reference to this call and records it in the object file. The object file effectively says "I have to call myfunction with an int and it will return an int." It keeps an index of all external calls so that it can be linked with other object files afterwards.
2nd Step : Linking
During this step, the linker looks at the indexes of all your object files and tries to resolve the dependencies between them. If a definition is missing, you'll get the famous undefined symbol XXX error. The linker then translates those references into real memory addresses in an output file: either an executable or a library.
You may then ask how this is possible for a gigantic program like an office suite, which has tons of methods and objects. Well, such programs use the shared library mechanism. You know these as the '.dll' and/or '.so' files on your Windows/Unix workstation. This mechanism allows the resolution of undefined symbols to be postponed until the program is run.
It even allows undefined symbols to be resolved on demand, with the dl* functions.

1. The principle
When you write:
int A = myfunction(12);
This is translated to:
int A = #call(myfunction, 12);
where #call can be seen as a dictionary look-up. And if you think about the dictionary analogy, you can certainly know about a word (smogashboard ?) before knowing its definition. All you need is that, at runtime, the definition be in the dictionary.
2. A point on ABI
How does this #call work? Because of the ABI. The ABI is a specification that describes many things, among them how to perform a call to a given function (depending on its parameters). The call contract is simple: it says where each of the function arguments can be found (some will be in the processor's registers, others on the stack).
Therefore, #call actually does:
#push 12, reg0
#invoke myfunction
And the function definition knows that its first argument (x) is located in reg0.
3. But I thought dictionaries were for dynamic languages?
And you are right, to an extent. Dynamic languages are typically implemented with a hash table for symbol lookup that is dynamically populated.
For C++, the compiler will transform a translation unit (roughly speaking, a preprocessed source file) into an object (.o or .obj in general). Each object contains a table of the symbols it references but for which the definition is not known:
.undefined
[0]: myfunction
Then the linker will bring together the objects and reconcile the symbols. There are two kinds of symbols at this point:
those which are within the library, and can be referenced through an offset (the final address is still unknown)
those which are outside the library, and whose address is completely unknown until runtime.
Both can be treated in the same fashion.
.dynamic
[0]: myfunction at <undefined-address>
And then the code will reference the look-up entry:
#invoke .dynamic[0]
When the library is loaded (DLL_Open for example), the runtime will finally know where the symbol is mapped in memory, and overwrite the <undefined-address> with the real address (for this run).
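To make the look-up table idea a bit more concrete, here is a minimal sketch that imitates it inside one program using a table of function pointers. The names dynamicTable, resolveSymbols and callThroughTable are made up for illustration; a real loader does this work outside your code, and the details depend on the platform.

```cpp
// Pointer type for "takes an int, returns an int" functions.
using IntFn = int (*)(int);

// Stand-in for the ".dynamic" table: one slot per external symbol.
// Slot 0 plays the role of "myfunction at <undefined-address>".
IntFn dynamicTable[1] = { nullptr };

// The definition that lives "in the library".
int myfunction(int x) {
    return x * x;
}

// Hypothetical load step: the address is now known, so the slot is filled in.
void resolveSymbols() {
    dynamicTable[0] = &myfunction;
}

// The calling code never names the target directly; it invokes slot 0.
int callThroughTable(int arg) {
    return dynamicTable[0](arg);
}
```

The caller only works after resolveSymbols() has run, which mirrors the point above: the code references the table entry, and the entry is only given a real address when the library is loaded.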

As suggested in Matthieu M.'s comment, it is the linker's job to find the right "function" at the right place. The compilation steps are, roughly:
The compiler is invoked for each cpp file and translates it to an
object file (binary code) with a symbol table which associates
function names (names are mangled in C++) with their location in the
object file.
The linker is invoked only once, with every object file as a
parameter. It resolves function call locations from one object
file to another thanks to the symbol tables. One main() function MUST
exist somewhere. Finally, a binary executable file is produced
once the linker has found everything it needs.

The preprocessor inserts the contents of the header files into the cpp files (each preprocessed cpp file is called a translation unit).
When you compile the code, each translation unit is checked separately for syntactic and semantic errors. Whether functions are defined in other translation units is not considered. .obj files are generated after compilation.
In the next step, the obj files are linked. The definitions of the functions (and member functions of classes) that are used are searched for, and linking happens. If a function is not found, a linker error is reported.
In your example, if the function was not defined in myfunction.cpp, compilation would still go on with no problem. An error would be reported in the linking step.
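A small sketch of that behaviour, assuming a typical toolchain. The name neverDefined is hypothetical, and useOnlyDefined is just a stand-in for main().

```cpp
int neverDefined(int x);      // declared, but defined in no file at all

int myfunction(int x) {       // declared and defined
    return x * x;
}

int useOnlyDefined() {
    // Uncommenting the next line would still compile, because the declaration
    // is all the compiler needs. The failure would only show up at link time,
    // as an undefined reference to neverDefined(int).
    // return neverDefined(3);
    return myfunction(3);
}
```

As long as neverDefined is never actually used, the missing definition goes unnoticed, since the linker only looks for definitions of symbols that something references.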

int myfunction(int); is the function prototype. You declare the function with it so that the compiler knows which function you are calling when you write myfunction(0);.
And how do the header and main function even know the function definition exists?
Well, this is the job of the linker.

When you compile a program, the preprocessor adds the source code of each header file to the file that included it. The compiler compiles EVERY .cpp file. The result is a number of .obj files.
After that comes the linker. The linker takes all the .obj files, starting from your main file. Whenever it finds a reference that has no definition (e.g. a variable, function or class), it tries to locate the respective definition in the other .obj files created at the compile stage or supplied to the linker at the beginning of the linking stage.
Now to answer your question: each .cpp file is compiled into a .obj file containing instructions in machine code. When you include a .hpp file and use some function that's defined in another .cpp file, at the linking stage the linker looks for that function definition in the respective .obj file. That's how it finds it.

Related

How do you call this relation between 2 .c files?

I'm a bit confused. I haven't done any C for years and I'm just starting with it again. One thing I'm not sure about is the relation between two files that call each other's functions, for example:
testa.c:
int main (void)
{
callTheOtherFunction();
return 0;
}
and the other file
testb.c
callTheOtherFunction(){
//do some stuff
}
now my "makefile" looks like
gcc -o test ./testa.c ./testb.c
What does it mean? Is callTheOtherFunction now part of testa.c, like both files have been merged? Or has it something to do with inheritance ? Is callTheOtherFunction now a global function, or how would you call it?
I need to draw an UML diagram out of it, that's why I need the expression for that case.
The source files are never "merged". What happens is that during the compilation phase two object files are produced, one for each source file, and later, during the linking phase, the two object files are linked (together with some implicit system libraries), producing an executable.
The two files will be compiled to object code by your compiler, then the linker will generate a single executable from the object files. It is, as you say, like the two files have been merged.
callTheOtherFunction will be accessible from anywhere (I suppose you would call that a global function) as you did not mark its definition static.
As a side note, you should probably get a compiler warning from that compilation as you do not have a declaration of callTheOtherFunction in testa.c.
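For illustration, here is roughly how the two files fit together, written as C++ in a single listing (the question is in C, where the rules for missing declarations are looser). The counter callCount and the helper runTesta are hypothetical additions so the sketch has something to observe.

```cpp
// --- what testa.c should have seen first: a declaration ---
void callTheOtherFunction();

// --- testb.c ---
// 'static' gives internal linkage: this variable is visible only in its own source file.
static int callCount = 0;

// No 'static' here, so the function has external linkage and other
// source files can call it once they have seen a declaration.
void callTheOtherFunction() {
    ++callCount;                  // "do some stuff"
}

// --- testa.c: hypothetical stand-in for main() ---
int runTesta() {
    callTheOtherFunction();
    return callCount;
}
```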

How are function definitions determined with header files?

When using separate files in C++, I know that functions can be declared using header files like this:
// MyHeader.h
int add(int num, int num2);
// MySource.cpp
int add(int num, int num2) {
return num + num2;
}
// Main.cpp
#include "MyHeader.h"
#include <iostream>
int main() {
std::cout << add(4, 5) << std::endl;
return 0;
}
My question is, in this situation, how does the compiler determine the function definition of add(int,int) when MyHeader.h and Main.cpp have no references at all to MySource.cpp?
Also, if there were multiple add functions (with the same arguments) in a program, how can I make sure the correct one is being used in a certain situation?
The function declaration gives the compiler enough information to generate a call to that function.
The compiler then generates an object file that specifies the names (which, in the case of C++ are mangled to specify the arguments, namespace, cv-qualifiers, etc.) of external functions to which that object file refers (along with another list of names it defines).
The linker then takes all those object files, and tries to match up every name that something refers to but doesn't define with some other object file that defines the same name. Then it assigns and fills in addresses, so where one object file refers to nameX, it fills in the address it's assigning to nameX from the other file.
At least in a typical case, the object files it looks at will include a number of libraries (standard library + any others you specify). A library is basically just a collection of object files, shoved together into a single file, with enough data to index which data belongs to which object file. In a few cases, it also includes some extra meta-data to (for example) quickly find an object file that defines a specific name (obviously handy for the sake of faster linking, but not really an absolute necessity).
If there are two or more functions with exactly the same mangled name, then your code has undefined behavior (you're violating the one definition rule). The linker will usually give an error message telling you that nameZ was defined in both object file A and object file B (but the C++ standard doesn't really require that).
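A short sketch of why mangled names matter here. The mangled spellings in the comments are what GCC and Clang typically produce under the Itanium C++ ABI; other compilers use different schemes, and c_add is a hypothetical name.

```cpp
// Two overloads: same source-level name, different parameter types.
// Typically emitted as the symbols _Z3addii and _Z3adddd, so the
// linker sees two distinct names and can tell them apart.
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }

// extern "C" turns mangling off: the symbol is plain "c_add",
// which is why such functions cannot be overloaded.
extern "C" int c_add(int a, int b) { return a + b; }
```

On a Unix-like system the symbols an object file defines and references can usually be listed with a tool such as nm, which is a handy way to see these names for yourself.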
The compiler does not "determine" (you mean "know") the function definition. The linker does. You have just discovered why the build process consists of compiling and linking.
So, basically, the compiler produces two object files here. One which contains the definition of add and one which just refers to the "unknown" function add. The linker then takes the two object files and puts the reference and definition together. Of course, that's just a very simple explanation, but for a beginner, that's all you need to know.
The compiler doesn't compile header files; it compiles source files. It will include the code in the header when the header is #included in a source file being compiled, but on its own, the header file doesn't "do" anything.
Also, the compiler doesn't worry about whether a function is defined or not. It just compiles against function declarations. It's the linker that resolves the definitions of functions.
You don't need to include a definition of a function at all, unless it's being called by some other code you need to link.
As to your question, "If there were multiple add functions (with the same arguments) in a program, how can I make sure the correct one is being used in a certain situation?": It depends on the linker and the settings, but generally, if you have more than one definition of a function with the same signature, the linker will issue an error stating that the function is multiply defined.

Clarification about the header-guards and header-file inclusion used in C/C++

I know people recommend including header guards in header files, to prevent header files contents from being inserted by the pre-processor into the source-code files more than once.
But consider the following scenario:
Let's say I have the files main.cpp , stuff.cpp, and commonheader.h, with the .h file having its header guards.
If either .cpp files tries to include commonheader.h more than once, then the preprocessor
will stop that from happening, and after compiling to object code we get,
main.o containing the contents of commonheader.h exactly once.
stuff.o containing the contents of commonheader.h exactly once.
Note that the contents of commonheader, have been repeated across the files, but not within the same .o file.
So what happens during the linking step? Since the .o files are being fused into an executable,
we will have to ensure for a second time that the contents of commonheader are not being repeated. Does the compiler take care of that? If not, wouldn't that be a problem when we are dealing with huge header files, giving rise to code repetition across files and leading to large executable sizes?
If I am making some conceptual mistake anywhere in the question, please correct me.
Typically your header file should not actually define any symbols, it should just declare them. So commonheader.h would look like this (omitting the include guards):
void commonFunc1(void);
void commonFunc2(void);
In that case, there is no problem. If you call commonFunc1 in main.cpp and stuff.cpp, both main.o and stuff.o will know they want to link against a symbol called commonFunc1 and the linker will try to find that symbol. If the linker doesn't find the symbol, you get an undefined reference error. The actual definition of commonFunc1 needs to be in some cpp file.
If you really want to define functions in your header file, use static so that the linker does not see them. So your commonheader.h could look like:
static void commonFunc1()
{
/* ... do stuff ... */
}
In this case, the linker does not know about commonFunc1 and no errors will occur. This could increase the executable size though; you'll probably end up with two copies of the code for commonFunc1.
To expand Grayson's answer to cover variables: if you want to declare a variable in a header file, you should use the extern keyword. This is one way to handle global variables.
In the header file global.h you write this:
extern Globals globals;
then you can use globals in any file that includes global.h, while in global.cpp you write
#include "globalstype.h"
Globals globals;
Note that global.cpp doesn't need to include global.h; however, you will need to make sure global.cpp is compiled and linked into every program that uses globals, otherwise the linker will complain.
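A compact sketch of that arrangement, with the three files shown one after another in a single listing. The member verbosity and the function currentVerbosity are assumptions made up for the example; the original answer does not say what Globals contains.

```cpp
// --- globalstype.h (the member is a hypothetical example) ---
struct Globals {
    int verbosity;
};

// --- global.h: declaration only, no storage is reserved here ---
extern Globals globals;

// --- any file that includes global.h can now use the variable ---
int currentVerbosity() {
    return globals.verbosity;
}

// --- global.cpp: the one definition, which actually reserves the storage ---
Globals globals{3};
```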
Header files normally contain declarative code not definitive code. That is they declare the existence of something that must exist exactly once. Macros and inline functions are allowed and necessarily duplicate wherever they are used.
The declarations are used by the compiler to insert unresolved links (or references) into the object code. The job of the linker is to resolve these links by matching the reference with the one single definition.
If you omit the include guards, with multiple inclusion in a single translation unit you will get a compiler error for multiple declaration of an existing symbol. If however you have a header erroneously containing a definition, and the header is included in more than one translation unit, there will be more than one object file with a definition - this instead causes a linker error for multiple definition.
So while:
extern int b ; // declaration, may occur in multiple translation units
is fine in a header file,
int b ; // definition, must occur in only one object file
is not.
Note that the declarations are not included in the object code; rather, the compiler uses them to create references that the linker will resolve, unless the compiler has already seen the definition and resolved the reference itself.
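To see what include guards do within a single translation unit, here is a sketch of what the preprocessor effectively processes when a guarded header is included twice. The macro name COMMONHEADER_H, the struct Point and the function sumWithB are hypothetical.

```cpp
// First #include "commonheader.h": COMMONHEADER_H is not defined yet,
// so the contents are kept and the macro becomes defined.
#ifndef COMMONHEADER_H
#define COMMONHEADER_H
struct Point { int x; int y; };   // defining this twice in one file would be a compile error
extern int b;                     // declaration, harmless to repeat
#endif

// Second #include "commonheader.h": the macro is already defined,
// so the preprocessor skips everything up to the matching #endif.
#ifndef COMMONHEADER_H
#define COMMONHEADER_H
struct Point { int x; int y; };
extern int b;
#endif

int b = 5;                        // the single definition, normally in exactly one .cpp file

int sumWithB(Point p) {
    return p.x + p.y + b;
}
```

The guard only protects against repetition inside one translation unit. Another .cpp file starts with the macro undefined and gets its own copy of the declarations, which is fine because declarations produce no code.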
Yes, it can be a problem. You could end up with multiple definitions, or redundant copies.
C is quite simple in this regard. You have static, extern, and inline -- and compilers also define several ways to alter visibility. I think a lot of this has been covered by other answers.
C++ is quite different, however. There is a lot of information and there are also implicit definitions (e.g. the compiler may emit a copy constructor or RTTI).
With C++, a definition is much more likely to appear in a header: consider templates, methods defined in a class declaration, and so on. C++ relies on the One Definition Rule. You will want to read about it in more detail, but it basically states that some categories of symbols may be defined in multiple translation units; depending on the decoration and the location/scope of the declaration, the linker is in many cases allowed to assume that each body (definition) is identical and is free to discard any copies it encounters (leaving one definition in your binary). So this really cuts down on the size of the resulting binary, unless you specify that a copy shall be produced.
However, having those definitions in your headers can surely increase compilation times, memory and files required to compile each file, visible dependencies, and will increase the number of files which must be recompiled when a definition is edited.
Of course, the language still allows bad forms, and will not complain if you repeatedly include, in multiple translation units, definitions which must be copied for each translation unit. Then you can certainly end up with a lot of bloat.
This may be a good intro:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=386
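As a small illustration of the kinds of definitions that may legitimately sit in a header, here is a sketch; all of the names (twice, square, Accumulator) are invented for the example.

```cpp
// --- contents that could live in a header included by many .cpp files ---

// 'inline' permits the same definition to appear in every translation unit;
// the linker is allowed to keep just one copy.
inline int twice(int x) {
    return 2 * x;
}

// Function templates get similar treatment for each instantiation.
template <class T>
T square(T x) {
    return x * x;
}

struct Accumulator {
    int total = 0;
    // Member functions defined inside the class body are implicitly inline.
    void add(int x) { total += x; }
};
```

By contrast, a plain non-inline function defined in the same header would be emitted once per including .cpp file and would normally trigger a multiple definition error at link time.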

C++: Compiler and Linker functionality

I want to understand exactly which part of a program compiler looks at and which the linker looks at. So I wrote the following code:
#include <iostream>
using namespace std;
#include <string>
class Test {
private:
int i;
public:
Test(int val) {i=val ;}
void DefinedCorrectFunction(int val);
void DefinedIncorrectFunction(int val);
void NonDefinedFunction(int val);
template <class paramType>
void FunctionTemplate (paramType val) { i = val }
};
void Test::DefinedCorrectFunction(int val)
{
i = val;
}
void Test::DefinedIncorrectFunction(int val)
{
i = val
}
void main()
{
Test testObject(1);
//testObject.NonDefinedFunction(2);
//testObject.FunctionTemplate<int>(2);
}
I have four functions:
DefinedCorrectFunction - This is a normal function declared and defined correctly.
DefinedIncorrectFunction - This function is declared correctly but the implementation is wrong (missing ;)
NonDefinedFunction - Only declaration. No definition.
FunctionTemplate - A function template.
Now if I compile this code I get a compiler error for the missing ';' in DefinedIncorrectFunction.
Suppose I fix this and then uncomment testObject.NonDefinedFunction(2). Now I get a linker error.
Now I uncomment testObject.FunctionTemplate(2). Now I get a compiler error for the missing ';'.
For function templates I understand that they are not touched by the compiler unless they are invoked in the code. So the compiler does not complain about the missing ';' until I call testObject.FunctionTemplate(2).
For testObject.NonDefinedFunction(2), the compiler did not complain but the linker did. As I understand it, all the compiler cared about was that a function NonDefinedFunction is declared. It didn't care about the implementation. Then the linker complained because it could not find the implementation. So far so good.
Where I get confused is when compiler complained about DefinedIncorrectFunction. It didn't look for implementation of NonDefinedFunction but it went through the DefinedIncorrectFunction.
So I'm a little unclear as to what exactly the compiler does and what the linker does. My understanding is that the linker links components with their calls. So when NonDefinedFunction is called, it looked for the compiled implementation of NonDefinedFunction and complained. But the compiler didn't care about the implementation of NonDefinedFunction, while it did for DefinedIncorrectFunction.
I'd really appreciate if someone can explain this or provide some reference.
Thank you.
The function of the compiler is to compile the code that you have written and convert it into object files. So if you have missed a ; or used an undefined variable, the compiler will complain because these are errors it can detect while translating that one source file.
If the compilation proceeds without any hitch, the object files are produced. The object files have a complex structure but basically contain five things
Headers - The information about the file
Object Code - Code in machine language (This code cannot run by itself in most cases)
Relocation Information - What portions of code will need to have addresses changed when the actual execution occurs
Symbol Table - Symbols referenced by the code. They may be defined in this code, imported from other modules or defined by linker
Debugging Info - Used by debuggers
The compiler compiles the code and fills the symbol table with every symbol it encounters. Symbols refer to both variables and functions. The answer to this question explains the symbol table:
This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand.
The point to note
If you call a function from your code, the compiler doesn't put the
final address of the routine in the object file. Instead, it puts a
placeholder value into the code and adds a note that tells the linker
to look up the reference in the various symbol tables from all the
object files it's processing and stick the final location there.
The generated object files are processed by the linker that will fill out the blanks in symbol tables, link one module to the other and finally give the executable code which can be loaded by the loader.
So in your specific case -
DefinedIncorrectFunction() - The compiler gets the definition of the function and begins compiling it to produce the object code and insert the appropriate entry into the symbol table. Compilation fails due to the syntax error, so the compiler aborts with an error.
NonDefinedFunction() - The compiler gets the declaration but no definition, so it adds an entry to the symbol table and flags it for the linker to fill in the appropriate value (since the linker will process a bunch of object files, it is possible that this definition is present in some other object file). In your case you do not specify any other file, so the linker aborts with an undefined reference to NonDefinedFunction error because it can't find a definition for that symbol table entry.
To understand it further, let's say your code is structured as follows:
File- try.h
#include<string>
#include<iostream>
class Test {
private:
int i;
public:
Test(int val) {i=val ;}
void DefinedCorrectFunction(int val);
void DefinedIncorrectFunction(int val);
void NonDefinedFunction(int val);
template <class paramType>
void FunctionTemplate (paramType val) { i = val; }
};
File try.cpp
#include "try.h"
void Test::DefinedCorrectFunction(int val)
{
i = val;
}
void Test::DefinedIncorrectFunction(int val)
{
i = val;
}
int main()
{
Test testObject(1);
testObject.NonDefinedFunction(2);
//testObject.FunctionTemplate<int>(2);
return 0;
}
Let us first only compile and assemble the code but not link it
$g++ -c try.cpp -o try.o
$
This step proceeds without any problem. So you have the object code in try.o. Let's try and link it up
$g++ try.o
try.o: In function `main':
try.cpp:(.text+0x52): undefined reference to `Test::NonDefinedFunction(int)'
collect2: ld returned 1 exit status
You forgot to define Test::NonDefinedFunction. Let's define it in a separate file.
File- try1.cpp
#include "try.h"
void Test::NonDefinedFunction(int val)
{
i = val;
}
Let us compile it into object code
$ g++ -c try1.cpp -o try1.o
$
Again it is successful. Let us try to link only this file
$ g++ try1.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
No main, so it won't link!!
Now you have two separate object codes that have all the components you need. Just pass BOTH of them to linker and let it do the rest
$ g++ try.o try1.o
$
No error!! This is because the linker finds the definitions of all the functions (even though they are scattered across different object files) and fills the blanks in the object code with the appropriate values.
I believe this is your question:
Where I get confused is when compiler complained about DefinedIncorrectFunction. It didn't look for implementation of NonDefinedFunction but it went through the DefinedIncorrectFunction.
The compiler tried to parse DefinedIncorrectFunction (because you provided a definition in this source file) and there was a syntax error (missing semicolon). On the other hand, the compiler never saw a definition for NonDefinedFunction because there simply was no code in this module. You might have provided a definition of NonDefinedFunction in another source file, but the compiler doesn't know that. The compiler only looks at one source file (and its included header files) at a time.
Say you want to eat some soup, so you go to a restaurant.
You search the menu for soup. If you don't find it in the menu, you leave the restaurant. (kind of like a compiler complaining it couldn't find the function) If you find it, what do you do?
You call the waiter to go get you some soup. However, just because it's in the menu, doesn't mean that they also have it in the kitchen. Could be an outdated menu, it could be that someone forgot to tell the chef that he's supposed to make soup. So again, you leave. (like an error from the linker that it couldn't find the symbol)
The compiler checks that the source code conforms to the language and adheres to its semantics. The output from the compiler is object code.
The linker links the different object modules together to form an executable. The definitions of functions are located in this phase, and the appropriate code to call them is added in this phase.
The compiler compiles code in the form of translation units. It will compile all the code that is included in a source .cpp file.
DefinedIncorrectFunction() is defined in your source file, so the compiler checks it for language validity.
NonDefinedFunction() does not have any definition in the source file, so the compiler does not need to compile it. If the definition is present in some other source file, the function will be compiled as part of that translation unit and the linker will then link to it. If the linker does not find the definition at the linking stage, it will raise a linking error.
What the compiler does, and what the linker does, depends on the
implementation: a legal implementation could just store the tokenized
source in the “compiler”, and do everything in the linker.
Modern implementations do put off more and more to the linker, for
better optimization. And many early implementations of templates didn't
even look at the template code until link time, other than matching braces
enough to know where the template ended. From a user point of view,
you're more interested in whether the error “requires a
diagnostic” (which can be emitted by the compiler or the linker)
or is undefined behavior.
In the case of DefinedIncorrectFunction, you have provided source text
which the implementation is required to parse. That text contains an
error for which a diagnostic is required. In the case of
NonDefinedFunction: if the function is used, failure to provide a
definition (or providing more than one definition) in the complete
program is a violation of the one definition rule, which is undefined
behavior. No diagnostic is required (but I can't imagine an
implementation that didn't provide one for a missing definition of a
function that was used).
In practice, errors which can be easily detected simply by examining the
text input of a single translation unit are defined by the standard to
“require a diagnostic”, and will be detected by the
compiler. Errors which cannot be detected by the examination of a
single translation unit (e.g. a missing definition, which might be
present in a different translation unit) are formally undefined
behavior—in many cases, the errors can be detected by the linker,
and in such cases, implementations will in fact emit an error.
This is somewhat modified in cases like inline functions, where you're
allowed to repeat the definition in each translation unit, and extremely
modified by templates, since many errors cannot be detected until
instantiation. In the case of templates, the standard leaves
implementations a great deal of freedom: at the least, the compiler must
parse the template enough to determine where the template ends. The
standard added things like typename, however, to allow much more
parsing before instantiation. In dependent contexts, however, some
errors cannot possibly be detected before instantiation, which may take
place at compilation time or at link time—early implementations
favored link time instantiation; compile time instantiation dominates
today, and is used by VC++ and g++.
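A minimal sketch of an error that cannot be detected before instantiation, since it depends on the template argument. The names lengthOf and lengthOfHello are hypothetical.

```cpp
#include <string>

template <class T>
int lengthOf(const T& value) {
    // 'value.size()' depends on T, so whether it is valid can only be
    // checked once a concrete T is known, i.e. at instantiation.
    return static_cast<int>(value.size());
}

int lengthOfHello() {
    // Instantiates lengthOf<std::string>, which is fine.
    return lengthOf(std::string("hello"));
}

// A call such as lengthOf(42) would be rejected where it is used, because
// int has no member size(). Without such a call nothing is ever diagnosed.
```

This is different from the missing ';' in the question: that is a syntax error that does not depend on the template argument, so an implementation is free to report it when it first parses the template.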
The missing semi-colon is a syntax error and therefore the code should not compile. This might happen even in a template implementation. Essentially, there is a parsing stage and whilst it is obvious to a human how to "fix and recover" a compiler doesn't have to do that. It can't just "imagine the semi-colon is there because that's what you meant" and continue.
A linker looks for function definitions to call where they are required. It isn't required here so there is no complaint. There is no error in this file as such, as even if it were required, it might not be implemented in this particular compilation unit. The linker is responsible for collecting together different compilation units, i.e. "linking" them.
Ah, but you could have NonDefinedFunction(int) in another compilation unit.
The compiler produces some output for the linker that basically says the following (among other things):
Which symbols (functions/variables/etc) are defined.
Which symbols are referenced but undefined. In this case the linker needs to resolve the references by searching through the other modules being linked. If it can't, you get a linker error.
The linker is there to link in code defined (possibly) in external modules - libraries or object files you will use together with this particular source file to generate the complete executable. So, if you have a declaration but no definition, your code will compile because the compiler knows the linker might find the missing code somewhere else and make it work. Therefore, in this case you will get an error from the linker, not the compiler.
If, on the other hand, there's a syntax error in your code, the compiler can't compile it at all and you will get an error at this stage. Macros and templates may behave a bit differently, not causing errors if they are not used (templates are much like macros with a somewhat nicer interface), but it also depends on the error's gravity. If you mess up so badly that the compiler can't figure out where the templated/macro code ends and regular code starts, it won't be able to compile.
With regular code, the compiler must compile even dead code (code not referenced in your source file) because someone might want to use that code from another source file, by linking your .o file to his code. Therefore non-templated/macro code must be syntactically correct even if it is not directly used in the same source file.
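The declaration-without-definition case can be sketched roughly as follows. NonDefinedFunction is the name used in this thread; DefinedFunction is a hypothetical helper added only so the file has something to compile and call.

```cpp
#include <cassert>

// Declared but never defined anywhere. The compiler is satisfied by
// the declaration alone.
int NonDefinedFunction(int);

// Defined here, so the object file exports this symbol.
int DefinedFunction(int x) {
    return x * x;
}

// Nothing in this file calls NonDefinedFunction, so the object file
// holds no unresolved reference to it and the linker has nothing to
// complain about. Adding a call such as
//     int y = NonDefinedFunction(3);
// would still compile, but linking would then fail with an
// undefined-symbol error unless another object file defines it.
```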

C++ header file question

I was trying out some c++ code while working with classes and this question occurred to me and it's bugging me a little.
I have created a header file that contains my class definition and a cpp file that contains the implementation.
If I use this class in a different cpp file, why am I including the header file instead of the cpp file that contains the class implementations?
If I include the class implementation file, then the class header file should be imported automatically, right (since I've already included the header file in the implementation file)? Isn't this more natural?
Sorry if this is a dumb question, I'm genuinely interested in knowing why most people include .h instead of .cpp files when the latter seems more natural (I know Python somewhat, maybe that's why it seems natural to me at least). Is it just historical or is there a technical reason concerning program organisation or maybe something else?
Because when you're compiling another file, C++ doesn't actually need to know about the implementation. It only needs to know the signature of each function (which parameters it takes and what it returns), the name of each class, what macros are #defined, and other "summary" information like that, so that it can check that you're using functions and classes correctly. The contents of different .cpp files don't get put together until the linker runs.
For example, say you have foo.h
int foo(int a, float b);
and foo.cpp
#include "foo.h"
int foo(int a, float b) { /* implementation */ }
and bar.cpp
#include "foo.h"
int bar(void) {
    int c = foo(1, 2.1);
    return c;  /* bar is declared to return int, so return the result */
}
When you compile foo.cpp, it becomes foo.o, and when you compile bar.cpp, it becomes bar.o. Now, in the process of compiling, the compiler needs to check that the definition of function foo() in foo.cpp agrees with the usage of function foo() in bar.cpp (i.e. takes an int and a float and returns an int). The way it does that is by making you include the same header file in both .cpp files, and if both the definition and the usage agree with the declaration in the header, then they must agree with each other.
But the compiler doesn't actually include the implementation of foo() in bar.o. It just includes an assembly language instruction to call foo. So when it creates bar.o, it doesn't need to know anything about the contents of foo.cpp. However, when you get to the linking stage (which happens after compilation), the linker actually does need to know about the implementation of foo(), because it's going to include that implementation in the final program and replace the call foo instruction with a call 0x109d9829 (or whatever it decides the memory address of function foo() should be).
Note that the linker does not check that the implementation of foo() (in foo.o) agrees with the use of foo() (in bar.o) - for example, it doesn't check that foo() is getting called with an int and a float parameter! It's kind of hard to do that sort of check in assembly language (at least, harder than it is to check the C++ source code), so the linker relies on knowing that the compiler has already checked that. And that's why you need the header file, to provide that information to the compiler.
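The three-file layout above can be imitated in a single file to show the order in which the compiler sees things. This is only a sketch: the body of foo is an assumption, since the original leaves it as a placeholder, and bar is changed to return the result so there is something to observe.

```cpp
#include <cassert>

// What foo.h contributes: a declaration only, no body.
int foo(int a, float b);

// What bar.cpp contains. At this point the compiler has seen only the
// declaration, which is enough to type-check the call and emit a call
// to the symbol foo.
int bar() {
    return foo(1, 2.1f);
}

// What foo.cpp contains. The body is a stand-in for the real
// implementation: it adds the truncated float to the int.
int foo(int a, float b) {
    return a + static_cast<int>(b);
}
```

In a real project the last part lives in a separate object file, and it is the linker, not the compiler, that connects the call in bar to this definition.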
The magic is done by the linker. Every .cpp file, when compiled, generates an intermediate object file containing a table of the symbols it exports and imports. The linker reconciles them. In other words, you only have to include the header; every time you reference something declared there, such as a member function of the included class, the compiler records that symbol as an import in the object file's symbol table.
If you include the .cpp file, the same code is compiled into two object files and you will get linking errors: the linker finds the same symbol defined twice and cannot tell which one to use.
One technical reason is compilation speed. Let's suppose your class uses 10 other classes (e.g. as types for member variables). Including the long .cpp files for all 10 classes would make your class compile much slower (e.g. maybe 2 seconds instead of 1 second).
Another reason is hiding the implementation. Let's suppose you are writing a class to be used by 10 other teams in your company. All they have to know and learn about your class is in the .h file (public interface). You can freely do whatever you want in the .cpp file (implementation), and you may change it as often as you want; they won't care. But if you change the .h file, they may have to adjust their code using your class.
For each method body, it's your choice whether to put it in the .h file or in the .cpp file. If it's in the .h file, the compiler can inline it when called, which may make the code a bit faster. But compilation will be slower, the temporary .o (.obj) files may become larger (because each of them will contain the compiled method body), and the program binary (.exe) may become larger, because the function body takes up space as many times as it is inlined.
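As a rough illustration of that choice, here is a hypothetical Counter class (not from the question) with one method body written inside the class definition, as it would appear in the .h file, and one defined separately, as it would appear in the .cpp file.

```cpp
#include <cassert>

class Counter {
public:
    // Body inside the class definition: implicitly inline, visible to
    // every file that includes the header.
    int value() const { return count_; }

    // Declaration only; callers see just the signature.
    void increment();

private:
    int count_ = 0;
};

// In a real project this definition would sit in Counter.cpp and be
// compiled exactly once.
void Counter::increment() {
    ++count_;
}
```

Whether a given call actually gets inlined is up to the compiler; putting the body in the header only makes that possible.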