Do c++ compilers skip parts of files when compiling? [duplicate] - c++

This question already has answers here:
Why have header files and .cpp files? [closed]
(9 answers)
Closed 7 years ago.
In C++, I've heard that you need to separate the implementation from the interface in order to reduce compile times. In this question, the answerer basically says that if you change the header then the source is recompiled, so if you separate the interface from the implementation then there is less chance you will need to change the header. But here are the two scenarios I see:
Have all code in header file
Whenever you change any code, the entire file is recompiled.
Have code in separate header files and source files
Since the header file is usually copy and pasted (via include) into the source file, if you make a change to either the header or source files the whole file will still have to be recompiled because it is as if they are one file.
Whether you change either the header or the source file, the whole file (both the header and the source) are recompiled.
So then what is the advantage of separating interface from implementation? You might say it is so that you can have multiple implementations for a single interface by means of inheritance, but can't you still do that when you have all the code in the header file? You could also say that it is so that the interface does not see any implementation details, but what is the use of that?
The only reason I can see why separating interface from implementation is that if the C++ compiler skips over the part of the source file that is the header file that is copy and pasted into the source code if it is not changed and just compiles the rest of the source file. So is this the case? Do C++ compilers skip compilation of certain parts of files that have not changed? I know that that is probably not what happens, but I cannot think of any other explanation.
Edit
I've already seen this question. What I am asking is why are the compilation times faster with separate implementation and interface files. I understand that there are advantages with separate implementation and interface files such as what Lightness Races in Orbit stated, but I am asking is why are compile times better if they are at all.

You missed one crucial fact: you should be minimising changes to header files.
The headers contain your interface, which should remain unchanged for as long as possible to keep your code stable. Most of your changes going forwards ought to be in the implementation files, which may be recompiled in isolation.
The real benefit of separating interface from implementation is so that users of your code/library can be given a distributable that contains your header files, allowing them to interface with your code. They do not need to handle, manage, build etc the far more bulky implementation code. If you are running Linux, look at your /usr/include and /usr/lib folders and you'll see what I mean — all the implementation can be shipped as a compiled binary, which is far less unwieldy (and harder to modify in-place).
Frankly this seems like common sense to me. When you buy a product, you receive with it an instruction manual, not the steps to manufacture it.
It's also next to impossible to mock out the implementation for unit tests if you lump everything together into a single translation unit. The C compilation model was simply not designed for that.

Related

Error LNK2005 and error LNK1169 multiple definition in Visual Studio 16.7.0 2019 Release x64 Win10 Pro 2004 [duplicate]

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.
Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.
My reasoning for #include'ing the cpp files was:
- Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
- In monkey-see-monkey do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file.
So what exactly did I do wrong, and why is it bad?
To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.
Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.
"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.
If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.
"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.
Unless you change something.
Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.
"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.
"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"
Hmm. You might be on to something there. Let me know how that works out for you.
This is probably a more detailed answer than you wanted, but I think a decent explanation is justified.
In C and C++, one source file is defined as one translation unit. By convention, header files hold function declarations, type definitions and class definitions. The actual function implementations reside in translation units, i.e .cpp files.
The idea behind this is that functions and class/struct member functions are compiled and assembled once, then other functions can call that code from one place without making duplicates. Your functions are declared as "extern" implicitly.
/* Function declaration, usually found in headers. */
/* Implicitly 'extern', i.e the symbol is visible everywhere, not just locally.*/
int add(int, int);
/* function body, or function definition. */
int add(int a, int b)
{
return a + b;
}
If you want a function to be local for a translation unit, you define it as 'static'. What does this mean? It means that if you include source files with extern functions, you will get redefinition errors, because the compiler comes across the same implementation more than once. So, you want all your translation units to see the function declaration but not the function body.
So how does it all get mashed together at the end? That is the linker's job. A linker reads all the object files which is generated by the assembler stage and resolves symbols. As I said earlier, a symbol is just a name. For example, the name of a variable or a function. When translation units which call functions or declare types do not know the implementation for those functions or types, those symbols are said to be unresolved. The linker resolves the unresolved symbol by connecting the translation unit which holds the undefined symbol together with the one which contains the implementation. Phew. This is true for all externally visible symbols, whether they are implemented in your code, or provided by an additional library. A library is really just an archive with reusable code.
There are two notable exceptions. First, if you have a small function, you can make it inline. This means that the generated machine code does not generate an extern function call, but is literally concatenated in-place. Since they usually are small, the size overhead does not matter. You can imagine them to be static in the way they work. So it is safe to implement inline functions in headers. Function implementations inside a class or struct definition are also often inlined automatically by the compiler.
The other exception is templates. Since the compiler needs to see the whole template type definition when instantiating them, it is not possible to decouple the implementation from the definition as with standalone functions or normal classes. Well, perhaps this is possible now, but getting widespread compiler support for the "export" keyword took a long, long time. So without support for 'export', translation units get their own local copies of instantiated templated types and functions, similar to how inline functions work. With support for 'export', this is not the case.
For the two exceptions, some people find it "nicer" to put the implementations of inline functions, templated functions and templated types in .cpp files, and then #include the .cpp file. Whether this is a header or a source file doesn't really matter; the preprocessor does not care and is just a convention.
A quick summary of the whole process from C++ code (several files) and to a final executable:
The preprocessor is run, which parses all the directives which starts with a '#'. The #include directive concatenates the included file with inferior, for example. It also does macro-replacement and token-pasting.
The actual compiler runs on the intermediate text file after the preprocessor stage, and emits assembler code.
The assembler runs on the assembly file and emits machine code, this is usually called an object file and follows the binary executable format of the operative system in question. For example, Windows uses the PE (portable executable format), while Linux uses the Unix System V ELF format, with GNU extensions. At this stage, symbols are still marked as undefined.
Finally, the linker is run. All the previous stages were run on each translation unit in order. However, the linker stage works on all the generated object files which were generated by the assembler. The linker resolves symbols and does a lot of magic like creating sections and segments, which is dependent on the target platform and binary format. Programmers aren't required to know this in general, but it surely helps in some cases.
Again, this was definetely more than you asked for, but I hope the nitty-gritty details helps you to see the bigger picture.
The typical solution is to use .h files for declarations only and .cpp files for implementation. If you need to reuse the implementation you include the corresponding .h file into the .cpp file where the necessary class/function/whatever is used and link against an already compiled .cpp file (either an .obj file - usually used within one project - or .lib file - usually used for reusing from multiple projects). This way you don't need to recompile everything if only the implementation changes.
Think of cpp files as a black box and the .h files as the guides on how to use those black boxes.
The cpp files can be compiled ahead of time. This doesn't work in you #include them, as it needs to actual "include" the code into your program each time it compiles it. If you just include the header, it can just use the header file to determine how to use the precompiled cpp file.
Although this won't make much of a difference for your first project, if you start writing large cpp programs, people are going to hate you because compile times are going to explode.
Also have a read of this: Header File Include Patterns
Header files usually contain declarations of functions / classes, while .cpp files contain the actual implementations. At compile time, each .cpp file gets compiled into an object file (usually extension .o), and the linker combines the various object files into the final executable. The linking process is generally much faster than the compilation.
Benefits of this separation: If you are recompiling one of the .cpp files in your project, you don't have to recompile all the others. You just create the new object file for that particular .cpp file. The compiler doesn't have to look at the other .cpp files. However, if you want to call functions in your current .cpp file that were implemented in the other .cpp files, you have to tell the compiler what arguments they take; that is the purpose of including the header files.
Disadvantages: When compiling a given .cpp file, the compiler cannot 'see' what is inside the other .cpp files. So it doesn't know how the functions there are implemented, and as a result cannot optimize as aggressively. But I think you don't need to concern yourself with that just yet (:
The basic idea that headers are only included and cpp files are only compiled. This will become more useful once you have many cpp files, and recompiling the whole application when you modify only one of them will be too slow. Or when the functions in the files will start depending on each other. So, you should separate class declarations into your header files, leave implementation in cpp files and write a Makefile (or something else, depending on what tools are you using) to compile the cpp files and link the resulting object files into a program.
If you #include a cpp file in several other files in your program, the compiler will try to compile the cpp file multiple times, and will generate an error as there will be multiple implementations of the same methods.
Compilation will take longer (which becomes a problem on large projects), if you make edits in #included cpp files, which then force recompilation of any files #including them.
Just put your declarations into header files and include those (as they don't actually generate code per se), and the linker will hook up the declarations with the corresponding cpp code (which then only gets compiled once).
re-usability, architecture and data encapsulation
here's an example:
say you create a cpp file which contains a simple form of string routines all in a class mystring, you place the class decl for this in a mystring.h compiling mystring.cpp to a .obj file
now in your main program (e.g. main.cpp) you include header and link with the mystring.obj.
to use mystring in your program you don't care about the details how mystring is implemented since the header says what it can do
now if a buddy wants to use your mystring class you give him mystring.h and the mystring.obj, he also doesn't necessarily need to know how it works as long as it works.
later if you have more such .obj files you can combine them into a .lib file and link to that instead.
you can also decide to change the mystring.cpp file and implement it more effectively, this will not affect your main.cpp or your buddies program.
While it is certainly possible to do as you did, the standard practice is to put shared declarations into header files (.h), and definitions of functions and variables - implementation - into source files (.cpp).
As a convention, this helps make it clear where everything is, and makes a clear distinction between interface and implementation of your modules. It also means that you never have to check to see if a .cpp file is included in another, before adding something to it that could break if it was defined in several different units.
If it works for you then there is nothing wrong with it -- except that it will ruffle the feathers of people who think that there is only one way to do things.
Many of the answers given here address optimizations for large-scale software projects. These are good things to know about, but there is no point in optimizing a small project as if it were a large project -- that is what is known as "premature optimization". Depending on your development environment, there may be significant extra complexity involved in setting up a build configuration to support multiple source files per program.
If, over time, your project evolves and you find that the build process is taking too long, then you can refactor your code to use multiple source files for faster incremental builds.
Several of the answers discuss separating interface from implementation. However, this is not an inherent feature of include files, and it is quite common to #include "header" files that directly incorporate their implementation (even the C++ Standard Library does this to a significant degree).
The only thing truly "unconventional" about what you have done was naming your included files ".cpp" instead of ".h" or ".hpp".
When you compile and link a program the compiler first compiles the individual cpp files and then they link (connect) them. The headers will never get compiled, unless included in a cpp file first.
Typically headers are declarations and cpp are implementation files. In the headers you define an interface for a class or function but you leave out how you actually implement the details. This way you don't have to recompile every cpp file if you make a change in one.
I will suggest you to go through Large Scale C++ Software Design by John Lakos. In the college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and the implementations.
Header files usually have interfaces which are supposed not to be changed so frequently.
Similarly a look into patterns like Virtual Constructor idiom will help you grasp the concept further.
I am still learning like you :)
It's like writing a book, you want to print out finished chapters only once
Say you are writing a book. If you put the chapters in separate files then you only need to print out a chapter if you have changed it. Working on one chapter doesn't change any of the others.
But including the cpp files is, from the compiler's point of view, like editing all of the chapters of the book in one file. Then if you change it you have to print all the pages of the entire book in order to get your revised chapter printed. There is no "print selected pages" option in object code generation.
Back to software: I have Linux and Ruby src lying around. A rough measure of lines of code...
Linux Ruby
100,000 100,000 core functionality (just kernel/*, ruby top level dir)
10,000,000 200,000 everything
Any one of those four categories has a lot of code, hence the need for modularity. This kind of code base is surprisingly typical of real-world systems.
There are times when non conventional programming techniques are actually quite useful and solve otherwise difficult (if not impossible problems).
If C source is generated by third party applications such as lexx and yacc they can obviously be compiled and linked separately and this is the usual approach.
However there are times when these sources can cause linkage problems with other unrelated sources. You have some options if this occurs. Rewrite the conflicting components to accommodate the lexx and yacc sources. Modify the lexx & yacc componets to accommodate your sources. '#Include' the lexx and yacc sources where they are required.
Re-writing the the components is fine if the changes are small and the components are understood to begin with (i.e: you not porting someone else's code).
Modifying the lexx and yacc source is fine as long as the build process doesn't keep regenerating the source from the lexx and yacc scripts.
You can always revert to one of the other two methods if you feel it is required.
Adding a single #include and modifying the makefile to remove the build of the lexx/yacc components to overcome all your problems is attractive fast and provides you the opportunity to prove the code works at all without spending time rewriting code and questing whether the code would have ever worked in the first place when it isn't working now.
When two C files are included together they are basically one file and there are no external references required to be resolved at link time!

Multiple definition of `Node::Node(int, std::string, std::string, std::string, float)' [duplicate]

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.
Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.
My reasoning for #include'ing the cpp files was:
- Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
- In monkey-see-monkey do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file.
So what exactly did I do wrong, and why is it bad?
To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.
Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.
"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.
If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.
"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.
Unless you change something.
Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.
"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.
"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"
Hmm. You might be on to something there. Let me know how that works out for you.
This is probably a more detailed answer than you wanted, but I think a decent explanation is justified.
In C and C++, one source file is defined as one translation unit. By convention, header files hold function declarations, type definitions and class definitions. The actual function implementations reside in translation units, i.e .cpp files.
The idea behind this is that functions and class/struct member functions are compiled and assembled once, then other functions can call that code from one place without making duplicates. Your functions are declared as "extern" implicitly.
/* Function declaration, usually found in headers. */
/* Implicitly 'extern', i.e the symbol is visible everywhere, not just locally.*/
int add(int, int);
/* function body, or function definition. */
int add(int a, int b)
{
return a + b;
}
If you want a function to be local for a translation unit, you define it as 'static'. What does this mean? It means that if you include source files with extern functions, you will get redefinition errors, because the compiler comes across the same implementation more than once. So, you want all your translation units to see the function declaration but not the function body.
So how does it all get mashed together at the end? That is the linker's job. A linker reads all the object files which is generated by the assembler stage and resolves symbols. As I said earlier, a symbol is just a name. For example, the name of a variable or a function. When translation units which call functions or declare types do not know the implementation for those functions or types, those symbols are said to be unresolved. The linker resolves the unresolved symbol by connecting the translation unit which holds the undefined symbol together with the one which contains the implementation. Phew. This is true for all externally visible symbols, whether they are implemented in your code, or provided by an additional library. A library is really just an archive with reusable code.
There are two notable exceptions. First, if you have a small function, you can make it inline. This means that the generated machine code does not generate an extern function call, but is literally concatenated in-place. Since they usually are small, the size overhead does not matter. You can imagine them to be static in the way they work. So it is safe to implement inline functions in headers. Function implementations inside a class or struct definition are also often inlined automatically by the compiler.
The other exception is templates. Since the compiler needs to see the whole template type definition when instantiating them, it is not possible to decouple the implementation from the definition as with standalone functions or normal classes. Well, perhaps this is possible now, but getting widespread compiler support for the "export" keyword took a long, long time. So without support for 'export', translation units get their own local copies of instantiated templated types and functions, similar to how inline functions work. With support for 'export', this is not the case.
For the two exceptions, some people find it "nicer" to put the implementations of inline functions, templated functions and templated types in .cpp files, and then #include the .cpp file. Whether this is a header or a source file doesn't really matter; the preprocessor does not care and is just a convention.
A quick summary of the whole process from C++ code (several files) and to a final executable:
The preprocessor is run, which parses all the directives which starts with a '#'. The #include directive concatenates the included file with inferior, for example. It also does macro-replacement and token-pasting.
The actual compiler runs on the intermediate text file after the preprocessor stage, and emits assembler code.
The assembler runs on the assembly file and emits machine code, this is usually called an object file and follows the binary executable format of the operative system in question. For example, Windows uses the PE (portable executable format), while Linux uses the Unix System V ELF format, with GNU extensions. At this stage, symbols are still marked as undefined.
Finally, the linker is run. All the previous stages were run on each translation unit in order. However, the linker stage works on all the generated object files which were generated by the assembler. The linker resolves symbols and does a lot of magic like creating sections and segments, which is dependent on the target platform and binary format. Programmers aren't required to know this in general, but it surely helps in some cases.
Again, this was definetely more than you asked for, but I hope the nitty-gritty details helps you to see the bigger picture.
The typical solution is to use .h files for declarations only and .cpp files for implementation. If you need to reuse the implementation you include the corresponding .h file into the .cpp file where the necessary class/function/whatever is used and link against an already compiled .cpp file (either an .obj file - usually used within one project - or .lib file - usually used for reusing from multiple projects). This way you don't need to recompile everything if only the implementation changes.
Think of cpp files as a black box and the .h files as the guides on how to use those black boxes.
The cpp files can be compiled ahead of time. This doesn't work in you #include them, as it needs to actual "include" the code into your program each time it compiles it. If you just include the header, it can just use the header file to determine how to use the precompiled cpp file.
Although this won't make much of a difference for your first project, if you start writing large cpp programs, people are going to hate you because compile times are going to explode.
Also have a read of this: Header File Include Patterns
Header files usually contain declarations of functions / classes, while .cpp files contain the actual implementations. At compile time, each .cpp file gets compiled into an object file (usually extension .o), and the linker combines the various object files into the final executable. The linking process is generally much faster than the compilation.
Benefits of this separation: If you are recompiling one of the .cpp files in your project, you don't have to recompile all the others. You just create the new object file for that particular .cpp file. The compiler doesn't have to look at the other .cpp files. However, if you want to call functions in your current .cpp file that were implemented in the other .cpp files, you have to tell the compiler what arguments they take; that is the purpose of including the header files.
Disadvantages: When compiling a given .cpp file, the compiler cannot 'see' what is inside the other .cpp files. So it doesn't know how the functions there are implemented, and as a result cannot optimize as aggressively. But I think you don't need to concern yourself with that just yet (:
The basic idea that headers are only included and cpp files are only compiled. This will become more useful once you have many cpp files, and recompiling the whole application when you modify only one of them will be too slow. Or when the functions in the files will start depending on each other. So, you should separate class declarations into your header files, leave implementation in cpp files and write a Makefile (or something else, depending on what tools are you using) to compile the cpp files and link the resulting object files into a program.
If you #include a cpp file in several other files in your program, the compiler will try to compile the cpp file multiple times, and will generate an error as there will be multiple implementations of the same methods.
Compilation will take longer (which becomes a problem on large projects), if you make edits in #included cpp files, which then force recompilation of any files #including them.
Just put your declarations into header files and include those (as they don't actually generate code per se), and the linker will hook up the declarations with the corresponding cpp code (which then only gets compiled once).
re-usability, architecture and data encapsulation
here's an example:
say you create a cpp file which contains a simple form of string routines all in a class mystring, you place the class decl for this in a mystring.h compiling mystring.cpp to a .obj file
now in your main program (e.g. main.cpp) you include header and link with the mystring.obj.
to use mystring in your program you don't care about the details how mystring is implemented since the header says what it can do
now if a buddy wants to use your mystring class you give him mystring.h and the mystring.obj, he also doesn't necessarily need to know how it works as long as it works.
later if you have more such .obj files you can combine them into a .lib file and link to that instead.
you can also decide to change the mystring.cpp file and implement it more effectively, this will not affect your main.cpp or your buddies program.
While it is certainly possible to do as you did, the standard practice is to put shared declarations into header files (.h), and definitions of functions and variables - implementation - into source files (.cpp).
As a convention, this helps make it clear where everything is, and makes a clear distinction between interface and implementation of your modules. It also means that you never have to check to see if a .cpp file is included in another, before adding something to it that could break if it was defined in several different units.
If it works for you then there is nothing wrong with it -- except that it will ruffle the feathers of people who think that there is only one way to do things.
Many of the answers given here address optimizations for large-scale software projects. These are good things to know about, but there is no point in optimizing a small project as if it were a large project -- that is what is known as "premature optimization". Depending on your development environment, there may be significant extra complexity involved in setting up a build configuration to support multiple source files per program.
If, over time, your project evolves and you find that the build process is taking too long, then you can refactor your code to use multiple source files for faster incremental builds.
Several of the answers discuss separating interface from implementation. However, this is not an inherent feature of include files, and it is quite common to #include "header" files that directly incorporate their implementation (even the C++ Standard Library does this to a significant degree).
The only thing truly "unconventional" about what you have done was naming your included files ".cpp" instead of ".h" or ".hpp".
When you compile and link a program the compiler first compiles the individual cpp files and then they link (connect) them. The headers will never get compiled, unless included in a cpp file first.
Typically headers are declarations and cpp are implementation files. In the headers you define an interface for a class or function but you leave out how you actually implement the details. This way you don't have to recompile every cpp file if you make a change in one.
I will suggest you to go through Large Scale C++ Software Design by John Lakos. In the college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and the implementations.
Header files usually have interfaces which are supposed not to be changed so frequently.
Similarly a look into patterns like Virtual Constructor idiom will help you grasp the concept further.
I am still learning like you :)
It's like writing a book, you want to print out finished chapters only once
Say you are writing a book. If you put the chapters in separate files then you only need to print out a chapter if you have changed it. Working on one chapter doesn't change any of the others.
But including the cpp files is, from the compiler's point of view, like editing all of the chapters of the book in one file. Then if you change it you have to print all the pages of the entire book in order to get your revised chapter printed. There is no "print selected pages" option in object code generation.
Back to software: I have Linux and Ruby src lying around. A rough measure of lines of code...
Linux Ruby
100,000 100,000 core functionality (just kernel/*, ruby top level dir)
10,000,000 200,000 everything
Any one of those four categories has a lot of code, hence the need for modularity. This kind of code base is surprisingly typical of real-world systems.
There are times when non conventional programming techniques are actually quite useful and solve otherwise difficult (if not impossible problems).
If C source is generated by third party applications such as lexx and yacc they can obviously be compiled and linked separately and this is the usual approach.
However there are times when these sources can cause linkage problems with other unrelated sources. You have some options if this occurs. Rewrite the conflicting components to accommodate the lexx and yacc sources. Modify the lexx & yacc componets to accommodate your sources. '#Include' the lexx and yacc sources where they are required.
Re-writing the the components is fine if the changes are small and the components are understood to begin with (i.e: you not porting someone else's code).
Modifying the lexx and yacc source is fine as long as the build process doesn't keep regenerating the source from the lexx and yacc scripts.
You can always revert to one of the other two methods if you feel it is required.
Adding a single #include and modifying the makefile to remove the build of the lexx/yacc components to overcome all your problems is attractive fast and provides you the opportunity to prove the code works at all without spending time rewriting code and questing whether the code would have ever worked in the first place when it isn't working now.
When two C files are included together they are basically one file and there are no external references required to be resolved at link time!

How does including the header automatically define the class in to the driver (main)? [duplicate]

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.
Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.
My reasoning for #include'ing the cpp files was:
- Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
- In monkey-see-monkey do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file.
So what exactly did I do wrong, and why is it bad?
To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.
Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.
"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.
If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.
"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.
Unless you change something.
Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.
"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.
"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"
Hmm. You might be on to something there. Let me know how that works out for you.
This is probably a more detailed answer than you wanted, but I think a decent explanation is justified.
In C and C++, one source file is defined as one translation unit. By convention, header files hold function declarations, type definitions and class definitions. The actual function implementations reside in translation units, i.e .cpp files.
The idea behind this is that functions and class/struct member functions are compiled and assembled once, then other functions can call that code from one place without making duplicates. Your functions are declared as "extern" implicitly.
/* Function declaration, usually found in headers. */
/* Implicitly 'extern', i.e the symbol is visible everywhere, not just locally.*/
int add(int, int);
/* function body, or function definition. */
int add(int a, int b)
{
return a + b;
}
If you want a function to be local for a translation unit, you define it as 'static'. What does this mean? It means that if you include source files with extern functions, you will get redefinition errors, because the compiler comes across the same implementation more than once. So, you want all your translation units to see the function declaration but not the function body.
So how does it all get mashed together at the end? That is the linker's job. A linker reads all the object files which is generated by the assembler stage and resolves symbols. As I said earlier, a symbol is just a name. For example, the name of a variable or a function. When translation units which call functions or declare types do not know the implementation for those functions or types, those symbols are said to be unresolved. The linker resolves the unresolved symbol by connecting the translation unit which holds the undefined symbol together with the one which contains the implementation. Phew. This is true for all externally visible symbols, whether they are implemented in your code, or provided by an additional library. A library is really just an archive with reusable code.
There are two notable exceptions. First, if you have a small function, you can make it inline. This means that the generated machine code does not generate an extern function call, but is literally concatenated in-place. Since they usually are small, the size overhead does not matter. You can imagine them to be static in the way they work. So it is safe to implement inline functions in headers. Function implementations inside a class or struct definition are also often inlined automatically by the compiler.
The other exception is templates. Since the compiler needs to see the whole template type definition when instantiating them, it is not possible to decouple the implementation from the definition as with standalone functions or normal classes. Well, perhaps this is possible now, but getting widespread compiler support for the "export" keyword took a long, long time. So without support for 'export', translation units get their own local copies of instantiated templated types and functions, similar to how inline functions work. With support for 'export', this is not the case.
For the two exceptions, some people find it "nicer" to put the implementations of inline functions, templated functions and templated types in .cpp files, and then #include the .cpp file. Whether this is a header or a source file doesn't really matter; the preprocessor does not care and is just a convention.
A quick summary of the whole process from C++ code (several files) and to a final executable:
The preprocessor is run, which parses all the directives which starts with a '#'. The #include directive concatenates the included file with inferior, for example. It also does macro-replacement and token-pasting.
The actual compiler runs on the intermediate text file after the preprocessor stage, and emits assembler code.
The assembler runs on the assembly file and emits machine code, this is usually called an object file and follows the binary executable format of the operative system in question. For example, Windows uses the PE (portable executable format), while Linux uses the Unix System V ELF format, with GNU extensions. At this stage, symbols are still marked as undefined.
Finally, the linker is run. All the previous stages were run on each translation unit in order. However, the linker stage works on all the generated object files which were generated by the assembler. The linker resolves symbols and does a lot of magic like creating sections and segments, which is dependent on the target platform and binary format. Programmers aren't required to know this in general, but it surely helps in some cases.
Again, this was definetely more than you asked for, but I hope the nitty-gritty details helps you to see the bigger picture.
The typical solution is to use .h files for declarations only and .cpp files for implementation. If you need to reuse the implementation you include the corresponding .h file into the .cpp file where the necessary class/function/whatever is used and link against an already compiled .cpp file (either an .obj file - usually used within one project - or .lib file - usually used for reusing from multiple projects). This way you don't need to recompile everything if only the implementation changes.
Think of cpp files as a black box and the .h files as the guides on how to use those black boxes.
The cpp files can be compiled ahead of time. This doesn't work in you #include them, as it needs to actual "include" the code into your program each time it compiles it. If you just include the header, it can just use the header file to determine how to use the precompiled cpp file.
Although this won't make much of a difference for your first project, if you start writing large cpp programs, people are going to hate you because compile times are going to explode.
Also have a read of this: Header File Include Patterns
Header files usually contain declarations of functions / classes, while .cpp files contain the actual implementations. At compile time, each .cpp file gets compiled into an object file (usually extension .o), and the linker combines the various object files into the final executable. The linking process is generally much faster than the compilation.
Benefits of this separation: If you are recompiling one of the .cpp files in your project, you don't have to recompile all the others. You just create the new object file for that particular .cpp file. The compiler doesn't have to look at the other .cpp files. However, if you want to call functions in your current .cpp file that were implemented in the other .cpp files, you have to tell the compiler what arguments they take; that is the purpose of including the header files.
Disadvantages: When compiling a given .cpp file, the compiler cannot 'see' what is inside the other .cpp files. So it doesn't know how the functions there are implemented, and as a result cannot optimize as aggressively. But I think you don't need to concern yourself with that just yet (:
The basic idea that headers are only included and cpp files are only compiled. This will become more useful once you have many cpp files, and recompiling the whole application when you modify only one of them will be too slow. Or when the functions in the files will start depending on each other. So, you should separate class declarations into your header files, leave implementation in cpp files and write a Makefile (or something else, depending on what tools are you using) to compile the cpp files and link the resulting object files into a program.
If you #include a cpp file in several other files in your program, the compiler will try to compile the cpp file multiple times, and will generate an error as there will be multiple implementations of the same methods.
Compilation will take longer (which becomes a problem on large projects), if you make edits in #included cpp files, which then force recompilation of any files #including them.
Just put your declarations into header files and include those (as they don't actually generate code per se), and the linker will hook up the declarations with the corresponding cpp code (which then only gets compiled once).
re-usability, architecture and data encapsulation
here's an example:
say you create a cpp file which contains a simple form of string routines all in a class mystring, you place the class decl for this in a mystring.h compiling mystring.cpp to a .obj file
now in your main program (e.g. main.cpp) you include header and link with the mystring.obj.
to use mystring in your program you don't care about the details how mystring is implemented since the header says what it can do
now if a buddy wants to use your mystring class you give him mystring.h and the mystring.obj, he also doesn't necessarily need to know how it works as long as it works.
later if you have more such .obj files you can combine them into a .lib file and link to that instead.
you can also decide to change the mystring.cpp file and implement it more effectively, this will not affect your main.cpp or your buddies program.
While it is certainly possible to do as you did, the standard practice is to put shared declarations into header files (.h), and definitions of functions and variables - implementation - into source files (.cpp).
As a convention, this helps make it clear where everything is, and makes a clear distinction between interface and implementation of your modules. It also means that you never have to check to see if a .cpp file is included in another, before adding something to it that could break if it was defined in several different units.
If it works for you then there is nothing wrong with it -- except that it will ruffle the feathers of people who think that there is only one way to do things.
Many of the answers given here address optimizations for large-scale software projects. These are good things to know about, but there is no point in optimizing a small project as if it were a large project -- that is what is known as "premature optimization". Depending on your development environment, there may be significant extra complexity involved in setting up a build configuration to support multiple source files per program.
If, over time, your project evolves and you find that the build process is taking too long, then you can refactor your code to use multiple source files for faster incremental builds.
Several of the answers discuss separating interface from implementation. However, this is not an inherent feature of include files, and it is quite common to #include "header" files that directly incorporate their implementation (even the C++ Standard Library does this to a significant degree).
The only thing truly "unconventional" about what you have done was naming your included files ".cpp" instead of ".h" or ".hpp".
When you compile and link a program the compiler first compiles the individual cpp files and then they link (connect) them. The headers will never get compiled, unless included in a cpp file first.
Typically headers are declarations and cpp are implementation files. In the headers you define an interface for a class or function but you leave out how you actually implement the details. This way you don't have to recompile every cpp file if you make a change in one.
I will suggest you to go through Large Scale C++ Software Design by John Lakos. In the college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and the implementations.
Header files usually have interfaces which are supposed not to be changed so frequently.
Similarly a look into patterns like Virtual Constructor idiom will help you grasp the concept further.
I am still learning like you :)
It's like writing a book, you want to print out finished chapters only once
Say you are writing a book. If you put the chapters in separate files then you only need to print out a chapter if you have changed it. Working on one chapter doesn't change any of the others.
But including the cpp files is, from the compiler's point of view, like editing all of the chapters of the book in one file. Then if you change it you have to print all the pages of the entire book in order to get your revised chapter printed. There is no "print selected pages" option in object code generation.
Back to software: I have Linux and Ruby src lying around. A rough measure of lines of code...
Linux Ruby
100,000 100,000 core functionality (just kernel/*, ruby top level dir)
10,000,000 200,000 everything
Any one of those four categories has a lot of code, hence the need for modularity. This kind of code base is surprisingly typical of real-world systems.
There are times when non conventional programming techniques are actually quite useful and solve otherwise difficult (if not impossible problems).
If C source is generated by third party applications such as lexx and yacc they can obviously be compiled and linked separately and this is the usual approach.
However there are times when these sources can cause linkage problems with other unrelated sources. You have some options if this occurs. Rewrite the conflicting components to accommodate the lexx and yacc sources. Modify the lexx & yacc componets to accommodate your sources. '#Include' the lexx and yacc sources where they are required.
Re-writing the the components is fine if the changes are small and the components are understood to begin with (i.e: you not porting someone else's code).
Modifying the lexx and yacc source is fine as long as the build process doesn't keep regenerating the source from the lexx and yacc scripts.
You can always revert to one of the other two methods if you feel it is required.
Adding a single #include and modifying the makefile to remove the build of the lexx/yacc components to overcome all your problems is attractive fast and provides you the opportunity to prove the code works at all without spending time rewriting code and questing whether the code would have ever worked in the first place when it isn't working now.
When two C files are included together they are basically one file and there are no external references required to be resolved at link time!

Coding C++ (mostly) in header files vs .cpp files

For years I've been coding C++ in the standard way, with class declarations in header files .hpp and the function definitions in source .cpp files. Recently I moved to a new company where the code (seemingly influenced by boost coding styles) is entirely coded in .hpp files with one short .cpp file to include the header files and create the object/program binary.
It got me thinking - what are the strengths/weaknesses of writing your code in header files as opposed to writing a .hpp & .cpp file for each object? This assumes our project doesn't create common libraries that are then linked into the program binaries, but instead each program binary is built from the sum of the header files (and one source .cpp file). Is this a new trend in C++?
E.g. Template objects need to be header only, but it could seem a good idea to put non-template classes into header files and then simply include these common project classes in your binary(s). Assuming you're creating a new codebase from scratch, would it mean less linking, which might mean less linking errors and possibly faster builds. Would pre-compiled headers facilities also mean using header files speeds up build time? Or are build times longer because we now need to compile all code when creating a binary rather than linking common shared library objects?
Also note we're not writing an API here (in which case something like the pimpl idiom would give us more flexibility by hiding implementation), we're writing programs to run on a customer site.
Thanks in advance,
Off the top of my head:
Strengths:
implementation visible (more of a weakness, but depends on the case)
no need to export to a library
better chance for the compiler to optimize some of the code
Weaknesses:
implementation visible
slower build time
bloated header files
a change in implementation would require a full rebuild, having the implementation in an implementation file does not (only compile that specific file or library)
In case of circular dependencies, you use forward-declarations and only include the full type in the implementation file. If all you have is a header, this is no longer possible.
I'm sure there are others, I'll edit if I can think of any more.
Header only libraries tend to make the build system easier, and generally you need to care less about dependencies.
On the other hand moving code to implementation files make it easier to control module boundaries and create exchangeable binary modules, which can improve incremental builds. The cost is more "housekeeping".
My hunch is to prefer implementation files, and there are some data points that back me up, like this blog post from the Author of the proposed Boost Networking Library.
Common libraries are one thing, general reusability of code is another. If you want to use some of the code you've written in another project, you may have to copy'n'paste huge chunks of code and then maintain the separate codebases. The compile time will get longer, because the program will be a one big compile unit as opposed to many cpp/h files, where only these, who include modified header file will be recompiled. For instance, full build of application I'm currently working on takes 7 minutes. A recompilation takes ~ fifteen seconds, if the changes are not severe. Finally, the code may tend to be less readable. Header file gives you quick glance of what the class is created for and how to use it. If classes are written inplace, you'll have to unnecessarily dig through the source code.

Using C++ headers (.h) vs headers plus implementation (.h + .cpp), what are the disadvantages?

As a novice C++ programmer I have always put my classes interface in .h files and implementation in .cpp files. However I have recently tried C# for a while and I really like its clean syntax and way to organize files, in particular there is no dinstinction between headers and implementation, you usually implement a class for each .cs file and you don't need headers.
I know that in C++ this is also possible (you can code "inline" functions in .h files), but up to now I have always seen a clear distinction between .h and .cpp files in C++ projects. What are the advantages and disadvantages of this approach?
Thank you
There's a few ways that separating the two help in C++. Firstly, if you'd like to update a library without changing an interface then having the code in the C++ file means that you simply can update the library rather than the library plus the headers. Secondly it hides the implementation. That is, it forces people to look at your class only in terms of the interface, the thing that should concern them if the code is well written. Finally, there's a sort of asthetic cleanness with interface + documentation that comes with this separation. It's something you have to get used to but after a while it'll feel natural (opinion.)
Don't forget build times.
Putting implementation code in header files makes them more likely to be changed. And changing header files will cause rebuilds of all the CPP files that include them, which in turn increases build times. This can be significant in larger projects.
I am also a fan of keeping the implementation hidden from users of my libraries. Unfortunately this doesn't work for template classes.
My rule of thumb: keep declarations in .H files, keep definitions in .CPP files.
it's cooler to have the symbols defined at one place for the case you wanted to compound C++ with already compiled binaries (typicly when using a library). imagine you need to define external symbols for global stuff in your binaries. if you had .cpp and .h code in the same file you would have to define the symbols for your binaries for every such file. in two files way you could have just the one .h with definitions for binaries and a lot of .cpp files that use it.
The main difference is that something implemented inside a .h file will be placed in every compilation unit that includes that header, this will create redundancy during the compile phase in the final binary executable.. while splitting with .h and .cpp will compile it in a single object file that is later linked against the other objects files by having just one compiled binary code that implements that header file.
In addition if you declare things just inside a .h you are not able to share variables and structures between more other .cpp files..
It's interesting to note that C# seems to be going in the C/C++ direction to some extent recently, with the introduction of partial classes.
The particular advantage of this in the IDE is that the Visual Studio designer will modify the part of the class that deals with visual controls, or data members, and their layout without any worries about mucking up the methods (application logic) that reside in a separate file.
I would echo #wheaties and add a few further items
Compilation is easier (may be it's just me), I've never been able to get compilation to work just right if you modify the header only (as in all the implementation files that have included it). I believe in Makefiles you have to add the dependencies manually which is a real pain in very large scale projects (again could just be me). So if you have your code in implementation files, then changes simply mean recompiling that particular file - very useful when you want to do quick changes, build and test.
Let me re-iterate the hiding aspect, most often you don't want people to know the implementation details due to the sensitive nature of the code, and thus only expose the headers plus the pre-built libraries, and the separation is key here.
Forward declarations, neat trick where you don't need to include the implementation details of a class in the header file if it's not being "used" in any of the code in the header, but then in the implementation file you can include the real header and "it all works nicely" (helps if you have cyclic dependencies - why you have them is different issue!)
On a recent large project the authors of systems I wanted to use had placed a lot of the code in .h files. When including their .h files into my own source it added further dependencies to my file. After including the dependencies for their project I ended up with typedef collisions. If they had separated the code and only placed declarations in the .h file it would have been much simpler. I suggest using posix types and only putting declarations into .h files.
I see that a lot of responses advocate separation, primarily for build-time and implementation hiding benefits. Both are definitely pluses, though I'll argue the counter example: Boost.
Most Boost libraries use a .hpp file with no external linking. The reason is that this is often required in the case of templates, when the compiler must know the argument types from the calling routine. So you might not have a choice if you want to stick with the "modern" C++ approach of shunning classes for templates.
As for the comparison part of .cs versus .cpp/.h I think you need to keep in mind the background the lead architect of C#: Anders Hejlsberg. In Delphi you also don't have the distinction of header and module (ignoring include files for this discussion). You simply have two sections in a unit file initialization and implementation.
The other points were already mentioned.