I was trying out some c++ code while working with classes and this question occurred to me and it's bugging me a little.
I have created a header file that contains my class definition and a cpp file that contains the implementation.
If I use this class in a different cpp file, why am I including the header file instead of the cpp file that contains the class implementations?
If I include the class implementation file, then the class header file should be imported automatically right (since i've already included the header file in the implementation file)? Isn't this more natural?
Sorry if this is a dumb question, i'm genuinely interested in knowing why most people include .h instead of .cpp files when the latter seems more natural (I know python somewhat, maybe that's why it seems natural to me atleast). Is it just historical or is there a technical reason concerning program organisation or maybe something else?
Because when you're compiling another file, C++ doesn't actually need to know about the implementation. It only needs to know the signature of each function (which parameters it takes and what it returns), the name and members of each class, what macros are #defined, and other "summary" information like that, so that it can check that you're using functions and classes correctly. The contents of different .cpp files don't get put together until the linker runs.
For example, say you have foo.h
int foo(int a, float b);
and foo.cpp
#include "foo.h"
int foo(int a, float b) { /* implementation */ }
and bar.cpp
#include "foo.h"
int bar(void) {
    int c = foo(1, 2.1);
    return c; /* bar is declared to return int, so return the result */
}
When you compile foo.cpp, it becomes foo.o, and when you compile bar.cpp, it becomes bar.o. Now, in the process of compiling, the compiler needs to check that the definition of function foo() in foo.cpp agrees with the usage of function foo() in bar.cpp (i.e. takes an int and a float and returns an int). The way it does that is by making you include the same header file in both .cpp files, and if both the definition and the usage agree with the declaration in the header, then they must agree with each other.
But the compiler doesn't actually include the implementation of foo() in bar.o. It just includes an assembly language instruction to call foo. So when it creates bar.o, it doesn't need to know anything about the contents of foo.cpp. However, when you get to the linking stage (which happens after compilation), the linker actually does need to know about the implementation of foo(), because it's going to include that implementation in the final program and replace the call foo instruction with a call 0x109d9829 (or whatever it decides the memory address of function foo() should be).
Note that the linker does not check that the implementation of foo() (in foo.o) agrees with the use of foo() (in bar.o) - for example, it doesn't check that foo() is getting called with an int and a float parameter! It's kind of hard to do that sort of check in assembly language (at least, harder than it is to check the C++ source code), so the linker relies on knowing that the compiler has already checked that. And that's why you need the header file, to provide that information to the compiler.
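As a rough sketch of the foo.h / foo.cpp / bar.cpp arrangement described above, the three files can be collapsed into one listing to show that the call site only ever sees the declaration. The body of foo() is an assumption made for illustration (the original elides it), and in a real project each commented section would live in its own file.

```cpp
// --- foo.h: declaration only, shared by both .cpp files ---
int foo(int a, float b);

// --- bar.cpp: compiles knowing only the declaration above ---
int bar() {
    // The compiler checks the argument types against the declaration;
    // it has not seen the body of foo() at this point.
    return foo(1, 2.5f);
}

// --- foo.cpp: the definition, checked against the same declaration ---
int foo(int a, float b) {
    // Hypothetical body: add the int to the truncated float.
    return a + static_cast<int>(b);
}
```

With separate files, bar.o would carry an unresolved reference to foo, and the linker would fill it in from foo.o.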
The magic is done by the linker. Every .cpp file, when compiled, generates an intermediate object file with a table of all the symbols it exports and imports. The linker reconciles them. In other words, you just have to include the header, and every time you reference something it declares, the compiler records the referenced symbol in the object file's symbol table.
If you include the .cpp file, the same code gets compiled into two object files and you will get linking errors, because the linker finds the same symbol defined twice and cannot tell which definition to use.
One technical reason is compilation speed. Let's suppose your class uses 10 other classes (e.g. as types for member variables). Including the long .cpp files for all 10 classes would make your class compile much slower (e.g. maybe 2 seconds instead of 1 second).
Another reason is hiding the implementation. Let's suppose you are writing a class to be used by 10 other teams in your company. All they have to know and learn about your class is in the .h file (public interface). You can freely do whatever you want in the .cpp file (implementation); you may change it as often as you want, and they won't care. But if you change the .h file, they may have to adjust their code using your class.
For each method body, it's your choice whether to put it in the .h file or in the .cpp file. If it's in the .h file, the compiler can inline it when called, which may make the code a bit faster. But compilation will be slower, the temporary .o (.obj) files may become larger (because each of them may contain the compiled method body), and the program binary (.exe) may become larger, because the function body takes up space as many times as it is inlined.
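To make the choice concrete, here is a minimal sketch with one method body in the header and one in the source file. The class name Widget and its members are hypothetical, and the two commented sections stand in for widget.h and widget.cpp.

```cpp
// --- widget.h (hypothetical) ---
class Widget {
public:
    explicit Widget(int size) : size_(size) {}

    // Body in the class definition: implicitly inline, visible to every
    // file that includes the header, so the compiler may inline calls.
    int size() const { return size_; }

    // Declaration only: callers see the signature, not the body.
    int area() const;

private:
    int size_;
};

// --- widget.cpp (hypothetical) ---
// Changing this body does not require recompiling files that include widget.h.
int Widget::area() const { return size_ * size_; }
```

Small accessors like size() are the usual candidates for the header; anything larger or likely to change tends to go in the source file.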
Related
I have what seems a relatively simple question, but one that keeps defying my efforts to understand it.
I apologise if it is a simple question, but like many simple questions, I can't seem to find a solid explanation anywhere.
With the below code:
/*foo.c*/
#include "bar.h"
int main() {
return(my_function(1,2));
}
/*bar.h*/
int my_function(int,int);
/*bar.c*/
#include "bar.h" /*is this necessary!?*/
int my_function(int x, int y) {
return(x+y);
}
Simply, is the second inclusion necessary? I don't understand why I keep seeing headers included in both source files. Surely if the function is declared in "foo.c" by including "bar.h," it does not need to be declared a second time in another linked source file (especially the one which actually defines it)??? A friend tried to explain to me that it didn't really matter for functions, but it did for structs, something which still eludes me! Help!
Is it simply for clarity, so that programmers can see which functions are being used externally?
I just don't get it!
Thanks!
In this particular case, it's unnecessary for the reason you described. It might be useful in situations where you have a more complex set of functions that might all depend on each other. If you include the header at the top of the .cpp file, you have effectively forward-declared every single function and so you don't have to worry about making sure your function definitions are in a certain order.
I also find that it clearly shows that these function definitions correspond to those declarations. This makes it easier for the reader to find how translation units depend on each other. Of course, the names of the files might be sufficient, but some more complex projects do not have one-to-one relationship between .cpp files and .h files. Sometimes headers are broken up into multiple parts, or many implementation files will have their external functions declared in a single header (common for large modules).
Really, all inclusions are unnecessary. You can always, after all, just duplicate the declarations (or definitions, in the case of classes) across all of the files that require them. We use the preprocessor to simplify this task and reduce the amount of redundant code. It's easier to stick to a pattern of always including the corresponding header because it will always work, rather than have to check each file every time you edit them and determine if the inclusion is necessary or not.
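The point about definition order can be sketched with two functions that call each other. The names is_even and is_odd and the file name parity.h are made up for illustration; the header's declarations are what allow the definitions to appear in either order.

```cpp
// --- parity.h (hypothetical): declares every function up front ---
bool is_even(unsigned n);
bool is_odd(unsigned n);

// --- parity.cpp: includes parity.h, so order of definitions is irrelevant ---
bool is_even(unsigned n) {
    // is_odd is defined further down, but its declaration is already visible.
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}
```

Without the declarations, is_even would fail to compile because is_odd would not be known yet at the point of the call.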
The way the C language (and C++) is designed is that the compiler processes each .c file in isolation.
You typically launch your compiler (cl.exe or gcc, for example) for one of your c files, and this produces one object file (.o or .obj).
Once all your object files have been generated, you run the linker, passing it all the object files, and it will tie them together into an executable.
That's why every .c file needs to include the headers it depends on. When the compiler is processing it, it knows nothing about which other .c files you may have. All it knows is the contents of the .c file you point it to, as well as the headers it includes.
In your simplified example, inclusion of "bar.h" in "bar.c" is not necessary. But in the real world, in most cases it would be. If you have a class declaration in "bar.h", and "bar.c" has functions of this class, the inclusion is needed. If you have any other declaration which is used in "bar.c" (be it a constant, an enum, etc.), again the include is needed. Because in the real world it is nearly always needed, the easy rule is: always include the header file in the corresponding source file.
If the header only declares global functions, and the source file only implements them (without calling any of them) then it's not strictly necessary. But that's not usually the case; in a large program, you rarely want global functions.
If the header defines a class, then you'll need to include it in the source file in order to define member functions:
void Thing::function(int x) {
//^^^^^^^ needs class definition
}
If the header declares functions in a namespace, then it's a good idea to put the definitions outside the namespace:
void ns::function(int x) {
//^^^^ needs previous declaration
}
This will give a nice compile-time error if the parameter types don't match a previous declaration - for which you'd need to include the header. Defining the function inside its namespace
namespace ns {
void function(int x) {
// ...
}
}
will silently declare a new overload if you get the parameter types wrong.
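A small sketch of the difference, assuming a hypothetical function ns::twice. The second definition deliberately uses the wrong parameter type inside the namespace block to show that the compiler accepts it as a separate overload; the + 1000 is only there so the two functions can be told apart.

```cpp
// --- header (hypothetical) ---
namespace ns {
int twice(int x);
}

// --- source ---
// Qualified definition: must match a previous declaration. Writing
// `int ns::twice(long x)` here would be rejected at compile time.
int ns::twice(int x) { return 2 * x; }

namespace ns {
// Defined inside the namespace with a mismatched parameter type:
// no error, this is simply a new overload alongside twice(int).
int twice(long x) { return static_cast<int>(2 * x) + 1000; }
}
```

Calling ns::twice(1) picks the int version, while ns::twice(1L) picks the accidental long overload.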
The simple rule is this (considering foo is a member function of some class):
So, if some header file declares a function, say:
//foo.h
void foo (int x);
The compiler needs to see this declaration wherever you define the function (to make sure your definition is in line with the declaration) and wherever you call it (to make sure you have called the function with the correct number and types of arguments).
That means you have to include foo.h everywhere you call that function and where you provide its definition.
Also, if foo is a global function (not inside any namespace), then there is no need to include foo.h in the implementation file.
I already know that when I put the definition of a member function into a header and mark the function as inline, the code in the function gets inlined into any place where the function is called from a .cpp file, so when it comes to a compiled binary, I know where the function's code is located: within the compiled code of any .cpp file that depends on it. But what happens if I don't mark a function in a header with inline and the function's body is large enough to make the compiler choose not to inline it? In the context of a static/dynamic library that the function's class belongs to, where is the function's code compiled to? Or is it not compiled at all, with the final destination for the function's code being the compiled .cpp of a client of the library? If it's the latter case, does the function's code still get inlined even if I didn't mark it with inline (because its code was too "heavy")? And finally, does the MSVC compiler's behavior in this case differ from GCC's?
And sure, I realize that putting member functions I want to be inlined into .h file (or .inl file) and "heavy" function into .cpp file would make things crystal clear, but I would really like to avoid breaking a class' implementation across files, hence is the interest.
When you mark a function inline you're not forcing the compiler to inline it at every place it's called, you're just telling it that the definition is inline and it should expect duplicate copies in different compilation units. The actual code will be compiled at least once for every compilation unit where you include the header and call the function.
If you don't declare it inline, the linker should complain about multiple definitions of the function, even if those definitions are identical.
It's compiled directly into each translation unit that includes your header. If there is more than one such file, you violate the one definition rule and make your program ill-formed.
If you really want to put all your code in one file, put it in the header and mark the function inline. It's only a suggestion so if the function is too big, the compiler won't inline it anyway, it will be compiled exactly like a non-inline function. But note that this is not canonical C++ because it can drastically increase compilation times. The normal pattern is in fact to separate the interface (headers) from the implementation (source file(s)). If the compiler decides to not inline the function, it will be written into the compiled object file for each translation unit that includes the header, and the linker will be required to pick an instance from one of the object files, throwing away the rest (since the code for each version is identical).
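A minimal sketch of what such a header might look like. The file name mathutil.h, the guard macro and the function square are hypothetical. The inline keyword is what permits the identical definition to appear in every translation unit that includes the header.

```cpp
// --- mathutil.h (hypothetical) ---
#ifndef MATHUTIL_H
#define MATHUTIL_H

// `inline` allows this definition to be repeated in several translation
// units; whether calls are actually inlined is still up to the compiler.
// Without `inline`, including this header from two .cpp files would
// normally produce a multiple-definition error at link time.
inline int square(int x) {
    return x * x;
}

#endif // MATHUTIL_H
```

If the compiler emits an out-of-line copy in several object files, the linker keeps one and discards the rest.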
As you know, "inline" is merely a "request" to the compiler - nothing more.
Moreover, there's nothing preventing you from declaring a standalone "static" function in a header. At which point the SAME binary code gets DUPLICATED in every object file whose source file #include's the header.
Guess what - the same thing can happen with inline functions :)
Personally, I like to see nothing but class and struct definitions, typedefs, constants, function prototypes ... and externs ... in a header file.
This is probably a stupid question, but I've searched for quite a while now here and on the web and couldn't come up with a clear answer (did my due diligence googling).
So I'm new to programming... My question is, how does the main function know about function definitions (implementations) in a different file?
ex. Say I have 3 files
main.cpp
myfunction.cpp
myfunction.hpp
//main.cpp
#include "myfunction.hpp"
int main() {
int A = myfunction( 12 );
...
}
-
//myfunction.cpp
#include "myfunction.hpp"
int myfunction( int x ) {
return x * x;
}
-
//myfunction.hpp
int myfunction( int x );
-
I get how the preprocessor includes the header code, but how do the header and main function even know the function definition exists, much less utilize it?
I apologize if this isn't clear or I'm vastly mistaken about something, new here
The header file declares functions/classes - i.e. tells the compiler when it is compiling a .cpp file what functions/classes are available.
The .cpp file defines those functions - i.e. the compiler compiles the code and therefore produces the actual machine code to perform those actions that are declared in the corresponding .hpp file.
In your example, main.cpp includes a .hpp file. The preprocessor replaces the #include with the contents of the .hpp file. This file tells the compiler that the function myfunction is defined elsewhere and it takes one parameter (an int) and returns an int.
So when you compile main.cpp into object file (.o extension) it makes a note in that file that it requires the function myfunction. When you compile myfunction.cpp into an object file, the object file has a note in it that it has the definition for myfunction.
Then when you come to linking the two object files together into an executable, the linker ties the ends up - i.e. main.o uses myfunction as defined in myfunction.o.
You have to understand that, from the user's point of view, compilation is a two-step operation.
1st Step : Object compilation
During this step, your *.cpp files are individually compiled into separate object files. This means that when main.cpp is compiled, the compiler doesn't know anything about your myfunction.cpp. The only thing it knows is that you have declared that a function with the signature int myfunction( int x ) exists in another object file.
The compiler keeps a reference to this call and records it directly in the object file. The object file will contain a note along the lines of "I have to call myfunction with an int, and it will return an int". It keeps an index of all external calls so that it can be linked with other object files afterwards.
2nd Step : Linking
During this step, the linker looks at the indexes of all your object files and tries to resolve the dependencies between them. If a definition is missing, you'll get the famous undefined symbol XXX error. It then translates those references into real memory addresses in an output file: either an executable or a library.
You may then ask how this is possible for a gigantic program like an office suite, which has tons of methods and objects. Well, such programs use the shared library mechanism. You know these as the '.dll' and '.so' files on your Windows or Unix workstation. It allows the resolution of undefined symbols to be postponed until the program is run.
It even allows undefined symbols to be resolved on demand, with the dl* functions.
1. The principle
When you write:
int A = myfunction(12);
This is translated to:
int A = #call(myfunction, 12);
where #call can be seen as a dictionary look-up. And if you think about the dictionary analogy, you can certainly know about a word (smogashboard ?) before knowing its definition. All you need is that, at runtime, the definition be in the dictionary.
2. A point on ABI
How does this #call work ? Because of the ABI. The ABI is a way that describes many things, and among those how to perform a call to a given function (depending on its parameters). The call contract is simple: it simply says where each of the function arguments can be found (some will be in the processor's registers, some others on the stack).
Therefore, #call actually does:
#push 12, reg0
#invoke myfunction
And the function definition knows that its first argument (x) is located in reg0.
3. But I thought dictionaries were for dynamic languages ?
And you are right, to an extent. Dynamic languages are typically implemented with a hash table for symbol lookup that is dynamically populated.
For C++, the compiler will transform a translation unit (roughly speaking, a preprocessed source file) into an object (.o or .obj in general). Each object contains a table of the symbols it references but for which the definition is not known:
.undefined
[0]: myfunction
Then the linker will bring together the objects and reconcile the symbols. There are two kinds of symbols at this point:
those which are within the library, and can be referenced through an offset (the final address is still unknown)
those which are outside the library, and whose address is completely unknown until runtime.
Both can be treated in the same fashion.
.dynamic
[0]: myfunction at <undefined-address>
And then the code will reference the look-up entry:
#invoke .dynamic[0]
When the library is loaded (with dlopen or LoadLibrary, for example), the runtime will finally know where the symbol is mapped in memory, and overwrite the <undefined-address> with the real address (for this run).
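The dictionary analogy can be turned into a toy model. This is only an illustration of the look-up idea, not how a real linker or loader is implemented; the helpers symbols() and call() are invented for the sketch.

```cpp
#include <map>
#include <stdexcept>
#include <string>

using Fn = int (*)(int);

// The definition that lives "somewhere else".
int myfunction(int x) { return x * x; }

// Toy symbol table: name -> address, filled in at "link/load" time.
std::map<std::string, Fn>& symbols() {
    static std::map<std::string, Fn> table;
    return table;
}

// Toy version of #call: look the name up, then invoke the address found.
int call(const std::string& name, int arg) {
    auto it = symbols().find(name);
    if (it == symbols().end()) {
        throw std::runtime_error("undefined symbol " + name);
    }
    return it->second(arg);
}
```

Registering the address with symbols()["myfunction"] = &myfunction; plays the role of the linker or loader filling in the table, after which call("myfunction", 12) reaches the definition. A real toolchain resolves most of these look-ups once, ahead of time, rather than on every call.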
As suggested in Matthieu M.'s comment, it is the linker job to find the right "function" at the right place. Compilation steps are, roughly:
The compiler is invoked for each cpp file and translates it to an object file (binary code) with a symbol table that associates function names (names are mangled in C++) with their locations in the object file.
The linker is invoked only once, with every object file as a parameter. It resolves function call locations from one object file to another using the symbol tables. One main() function MUST exist somewhere. Finally, a binary executable file is produced once the linker has found everything it needs.
The preprocessor inserts the content of the header files into the cpp files (a preprocessed cpp file is called a translation unit).
When you compile the code, each translation unit is checked separately for semantic and syntactic errors. Whether function definitions are present in other translation units is not considered. .obj files are generated after compilation.
In the next step, the .obj files are linked: the definitions of the functions (including member functions of classes) that are used are searched for and the references are resolved. If a function is not found, a linker error is reported.
In your example, If the function was not defined in myfunction.cpp, compilation would still go on with no problem. An error would be reported in the linking step.
int myfunction(int); is the function prototype. You declare function with it so that compiler knows that you are calling this function when you write myfunction(0);.
And how do the header and main function even know the function definition exists?
Well, this is the job of Linker.
When you compile a program, the preprocessor adds the source code of each header file to the file that included it. The compiler compiles EVERY .cpp file. The result is a number of .obj files.
After that comes the linker. The linker takes all the .obj files, starting from your main file. Whenever it finds a reference that has no definition (e.g. a variable or function), it tries to locate the respective definition in the other .obj files created at the compile stage or supplied to the linker at the beginning of the linking stage.
Now to answer your question: each .cpp file is compiled into a .obj file containing machine-code instructions. When you include a .hpp file and use some function that's defined in another .cpp file, at the linking stage the linker looks for that function definition in the respective .obj file. That's how it finds it.
I know what it means when a static function is declared in a source file. I am reading some code and found that a static function in a header file can be invoked from other files.
Is the function defined in the header file? So that the actual code is given directly in the function, like this:
static int addTwo(int x)
{
return x + 2;
}
Then that's just a way of providing a useful function to many different C files. Each C file that includes the header will get its own definition that it can call. This of course wastes memory, and is (in my opinion) a quite ugly thing to be doing, since having executable code in a header is generally not a good idea.
Remember that #include:ing a header basically just pastes the contents of the header (and any other headers included by it) into the C file as seen by the compiler. The compiler never knows that the one particular function definition came from a header file.
UPDATE: In many cases, it's actually a good idea to do something like the above, and I realize my answer sounds very black-and-white about this which is kind of oversimplifying things a bit. For instance, code that models (or just uses) intrinsic functions can be expressed like the above, and with an explicit inline keyword even:
static inline int addTwo(int *x)
{
    /* return the intrinsic's result, since addTwo is declared to return int */
    return __add_two_superquickly(x);
}
Here, the __add_two_superquickly() function is a fictional intrinsic, and since we want the entire function to basically compile down to a single instruction, we really want it to be inlined. Still, the above is cleaner than using a macro.
The advantage over just using the intrinsic directly is of course that wrapping it in another layer of abstraction makes it possible to build the code on compilers lacking that particular intrinsic, by providing an alternate implementation and picking the right one depending on which compiler is being used.
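As a sketch of this wrapper pattern with a real intrinsic swapped in for the fictional one: GCC and Clang provide __builtin_popcount, and the wrapper below falls back to a portable loop elsewhere. The wrapper name count_bits is made up for illustration.

```cpp
// Header-style wrapper: each includer gets its own (ideally inlined) copy.
static inline int count_bits(unsigned int x) {
#if defined(__GNUC__) || defined(__clang__)
    // GCC/Clang builtin; typically compiles to a single instruction
    // when the target supports one.
    return __builtin_popcount(x);
#else
    // Portable fallback: clear the lowest set bit until none remain.
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
#endif
}
```

Callers use count_bits() everywhere and never mention the compiler-specific name, which is the extra layer of abstraction described above.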
It will effectively create a separate static function with the same name inside every cpp file it is included into. The same applies to global variables.
As others are saying, it has exactly the same meaning as a static function in the .c file itself. This is because there is no semantic difference between .c and .h files; there is only the compilation unit made up of the file actually passed to the compiler (usually named .c) with the contents of any and all files named in #include lines (usually named .h) inserted into the stream as they are seen by the preprocessor.
The convention that the C source is in a file named .c and public declarations are in files named .h is only a convention. But it is generally a good one. Under that convention, the only things that should appear in .h files are declarations so that you generally avoid having the same symbol defined more than once in a single program.
In this particular case, the static keyword makes the symbol be private to the module, so there isn't a multiple-definition conflict waiting to cause trouble. So in that one sense, it is safe to do. But in the absence of a guarantee that the function would be inlined, you take the risk that the function would be instantiated in every module that happened to #include that header file which at best is a waste of memory in the code segment.
I am not certain of what use cases would justify doing this at all in a generally available public header.
If the .h file is generated code and only included in a single .c file, then I would personally name the file something other than .h to emphasize that it isn't actually a public header at all. For example, a utility that converts a binary file into an initialized variable definition might write a file that is intended to be used via #include and could very well contain a static declaration of the variable, and possibly even static definitions of accessor or other related utility functions.
If you define the function in a header file (not simply declare it), a copy of the function will be generated in each translation unit (basically in each cpp file which includes this header).
This may increase the size of your executable, but this may be negligible if the function is small. The advantage is that most compilers may inline the function, which may improve performance.
But there may be a big difference in doing this which wasn't mentioned in any answer. If your function uses a static local variable such as:
static int counter()
{
static int ctr = 0;
return ctr++;
}
Rather than:
//header
int counter();
//source
int counter()
{
static int ctr = 0;
return ctr++;
}
Then each source file including this header will have its own counter. If the function is declared inside the header, and defined in a source file, then the counter will be shared across your whole program.
So saying that the only difference will be performance and code size is wrong.
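The per-file counter can be imitated in a single listing by letting two namespaces stand in for two source files that each include the header. The names tu_a, tu_b, next_from_a and next_from_b are assumptions for this sketch; in a real build the two copies would come from the same header being pasted into a.cpp and b.cpp.

```cpp
// Stand-in for a.cpp after the header's static function is pasted in.
namespace tu_a {
static int counter() {
    static int ctr = 0;  // this copy's own counter
    return ctr++;
}
int next_from_a() { return counter(); }
}

// Stand-in for b.cpp: an identical but entirely separate copy.
namespace tu_b {
static int counter() {
    static int ctr = 0;  // unrelated to tu_a's ctr
    return ctr++;
}
int next_from_b() { return counter(); }
}
```

Calling next_from_a() twice and then next_from_b() once yields 0, 1 and then 0 again, because each copy of counter() has its own ctr. With the declaration in the header and a single definition in one source file, the third call would have returned 2.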
There is no semantic difference between defining it in a source file or in a header file; in plain C both mean the same thing: with the static keyword you are limiting the function's scope to the translation unit.
However, there is a problem with writing this in a header file: every time you include the header in a source file you get another copy of the function with the same implementation, much as you would with a normal function defined in a header file. By putting the definition in the header you are not achieving what a static function is meant for.
Therefore, I suggest you keep your implementation only in your source file and not in the header.
It is useful in some "header-only" libraries with small inline functions. In such a case you always want a copy of the function, so this is not a bad pattern. It also gives you an easy way to put separate interface and implementation parts in a single header file:
// header.h
// interface part (for user?!)
static inline float av(float a, float b);
// implementation part (for developer)
static inline float av(float a, float b)
{
return (a+b)/2.f;
}
Apple's vector math library in the GLKit framework uses this construction (e.g. GLKMatrix4.h).