Why put function declarations and definitions in separate files? - c++

In codeacademy's lessons on functions, they teach you to use three files if you are going to call functions in your program:
the int main() file, which i've found through trial and error is an indispensable part of the program part of a c++ program (i guess...), with a .cpp file extension
a header file for DECLARING functions, with a .hpp file extension.
a separate file with function DEFINITIONS, with a .cpp file extension
Would it work to both declare and define functions within the header file by itself and simply include them above int main()? To me having seperate files for declarations and definitions just seems like it would confuse matters in a larger project.

In a large project you often only need type and function declarations, not the definitions. For example in other header files. If all definitions were in the headers, then the combined result of including multiple other headers and their transitive includes would lead to huge compilation units. This significantly hurts compile times since the amount of code the compiler needs to process would explode to orders of magnitudes more than needed. It would also hurt link times since the linker would have more work to do in discarding duplicates included in many more object files.
You also easily run into ODR (One Definition Rule) issues unless everything is marked inline.

In large projects, function declarations may be needed by many files, but the function definition should only be compiled once. It is combined with all the places that need it at link time.

A small C++ program can be (and often is made of) a single translation unit of e.g. a few thousand lines of C++ code. In that case, you could have a single myprog.cc C++ source files (with several #include-s inside).
But when you work on a larger program, in teams, it is convenient to have several C++ source files.
Some C++ files are generated by another program (this is called metaprogramming or source to source compilation) and could have a million lines of C++ lines. ANTLR or GNU Bison or TypeScript2Cxx are capable of generating C++ code.
But if you work in a team of people like Alice and Bob, it is convenient to decide that Alice is responsible of alice.cc and Bob is writing bob.cc, and both cooperate on a common header file header.hh which is #include-d in both alice.cc and bob.cc. That header.hh would practically define the API of the software project.
Read more about version control systems (I prefer git) and build automation tools (such as ninja or make).
Look for inspiration inside the C++ code of existing open source projects on gitlab or github or elsewhere (in particular, inside the source code of Clang and of GCC, both being major C++ compilers).
FWIW, in GCC 10.1 (of may, 2020) the gcc/go/gofrontend/expressions.cc file is handwritten and has 19711 lines of C++ code, so nearly twenty thousands lines. They are compiled daily. I do know the people working on that, they are brilliant and nice professionals. The biggest file of FTLK 1.4 is its src/Fl_Text_Display.cxx with 4175 C++ lines.
By personal experience, you might have a single C++ function of several dozen thousands lines of C++ (this makes practical sense only when that C++ code is generated), but then the compilation time by an optimizing compiler is dissuasive. You could adapt my manydl.c program to generate C++ files (it currently generates "random" C files with functions of "tunable" size) of arbitrary size. But C++ code generated by Fluid or Qt Designer might be quite large, and C++ code generated for GUI is often made of long but conceptually simple functions.
Nothing in the C++11 standard (see n3337) requires several translation units. You might have (see sqlite for an example) a single C++ file foo.cc of a million lines. And you could generate some of the C++ source code. The Qt project, the GCC compiler. Jacques Pitrat's book on Artificial Beings: the conscience of a conscious machine ISBN 978-1848211018 explain in many pages why such an approach is worthwhile.

There's kind of two answers to this question
why would YOU the programmer want to split things out into multiple .h/.hpp and .cpp files?
I believe the answer here is it can help organization when your .cpp files become very large with a lot of code that may not be relevant to someone who needs to provide the functionality provided by the file. Here's an example:
Let's say you have some c++ code that that display images on a screen. You as the person who wants to use that code are likely interested in the functions/classes exposed by that code which let you control that functionality. Maybe the code exposes the following helpful functions:
WriteImageToScreen(int position_x, int position_y)
ClearScreen()
It can be much easier to look through a header file which only tells you what you are allowed to use rather than how all of that is implemented. It's very possible that implementing those two functions so that you can call them requires 1000s of lines of code and a bunch of variables and statements you don't care about. Not having to read that helps you focus on the important part of the code. The pieces you want to interact with.
I've presented this example as if you were calling someone else's code but the same applies for your own code. As your projects get bigger, it can be convenient to have summaries of what each functionality a file exposes.
Now that all being said, not everyone agrees this is the correct way to do things or that it is helpful.
why is it necessary for the compiler for things to be split out into multiple .h/.hpp and .cpp files?
Just in case you aren't familiar with the term, a compiler is the program which turns your source code text into a program which your computer can execute.
So why does the compiler need separate .hpp/.cpp files? Others have pretty much already hit the nail on the head with this one, but c++ compilers get confused if something is defined multiple times. If you put everything in a header file, then when you include that header in multiple files it will be defined multiple times. So in essence this circles back to the organizational question.
I have seen programmers who just have a single main file and then all code is included directly to that main file during compilation
#include "SomeFile.cpp"
#include "AnotherFile.cpp"
// ...
#include "SoManyFiles.cpp"
int main()
{
DoStuff();
}
I believe this is called a monolith build and it's not recommended.

If you have a toy project, you can.
If you have a 1,000,000 lines of code, your build times will be horrid.
C++20 introduces modules, which should make the whole issue go away.
Other languages have tools that can extract an interface from a "module".
Hopefully, when C++20 arrives, tools will become available.
The only good reason to split an interface from an implementation is if there
are multiple implementations for one interface. E.g. VHDL, and will be available in C++20 modules. The pragmatic reasons are compilation speed and legibility.

Related

how to correctly separate main.c into various .c/.h sub sources [duplicate]

I've never really understood why C++ needs a separate header file with the same functions as in the .cpp file. It makes creating classes and refactoring them very difficult, and it adds unnecessary files to the project. And then there is the problem with having to include header files, but having to explicitly check if it has already been included.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Follow up question:
How does the compiler find the .cpp file with the code in it, when all I include is the .h file? Does it assume that the .cpp file has the same name as the .h file, or does it actually look through all the files in the directory tree?
Some people consider header files an advantage:
It is claimed that it enables/enforces/allows separation of interface and implementation -- but usually, this is not the case. Header files are full of implementation details (for example member variables of a class have to be specified in the header, even though they're not part of the public interface), and functions can, and often are, defined inline in the class declaration in the header, again destroying this separation.
It is sometimes said to improve compile-time because each translation unit can be processed independently. And yet C++ is probably the slowest language in existence when it comes to compile-times. A part of the reason is the many many repeated inclusions of the same header. A large number of headers are included by multiple translation units, requiring them to be parsed multiple times.
Ultimately, the header system is an artifact from the 70's when C was designed. Back then, computers had very little memory, and keeping the entire module in memory just wasn't an option. A compiler had to start reading the file at the top, and then proceed linearly through the source code. The header mechanism enables this. The compiler doesn't have to consider other translation units, it just has to read the code from top to bottom.
And C++ retained this system for backwards compatibility.
Today, it makes no sense. It is inefficient, error-prone and overcomplicated. There are far better ways to separate interface and implementation, if that was the goal.
However, one of the proposals for C++0x was to add a proper module system, allowing code to be compiled similar to .NET or Java, into larger modules, all in one go and without headers. This proposal didn't make the cut in C++0x, but I believe it's still in the "we'd love to do this later" category. Perhaps in a TR2 or similar.
You seem to be asking about separating definitions from declarations, although there are other uses for header files.
The answer is that C++ doesn't "need" this. If you mark everything inline (which is automatic anyway for member functions defined in a class definition), then there is no need for the separation. You can just define everything in the header files.
The reasons you might want to separate are:
To improve build times.
To link against code without having the source for the definitions.
To avoid marking everything "inline".
If your more general question is, "why isn't C++ identical to Java?", then I have to ask, "why are you writing C++ instead of Java?" ;-p
More seriously, though, the reason is that the C++ compiler can't just reach into another translation unit and figure out how to use its symbols, in the way that javac can and does. The header file is needed to declare to the compiler what it can expect to be available at link time.
So #include is a straight textual substitution. If you define everything in header files, the preprocessor ends up creating an enormous copy and paste of every source file in your project, and feeding that into the compiler. The fact that the C++ standard was ratified in 1998 has nothing to do with this, it's the fact that the compilation environment for C++ is based so closely on that of C.
Converting my comments to answer your follow-up question:
How does the compiler find the .cpp file with the code in it
It doesn't, at least not at the time it compiles the code that used the header file. The functions you're linking against don't even need to have been written yet, never mind the compiler knowing what .cpp file they'll be in. Everything the calling code needs to know at compile time is expressed in the function declaration. At link time you will provide a list of .o files, or static or dynamic libraries, and the header in effect is a promise that the definitions of the functions will be in there somewhere.
C++ does it that way because C did it that way, so the real question is why did C do it that way? Wikipedia speaks a little to this.
Newer compiled languages (such as
Java, C#) do not use forward
declarations; identifiers are
recognized automatically from source
files and read directly from dynamic
library symbols. This means header
files are not needed.
To my (limited - I'm not a C developer normally) understanding, this is rooted in C. Remember that C does not know what classes or namespaces are, it's just one long program. Also, functions have to be declared before you use them.
For example, the following should give a compiler error:
void SomeFunction() {
SomeOtherFunction();
}
void SomeOtherFunction() {
printf("What?");
}
The error should be that "SomeOtherFunction is not declared" because you call it before it's declaration. One way of fixing this is by moving SomeOtherFunction above SomeFunction. Another approach is to declare the functions signature first:
void SomeOtherFunction();
void SomeFunction() {
SomeOtherFunction();
}
void SomeOtherFunction() {
printf("What?");
}
This lets the compiler know: Look somewhere in the code, there is a function called SomeOtherFunction that returns void and does not take any parameters. So if you encouter code that tries to call SomeOtherFunction, do not panic and instead go looking for it.
Now, imagine you have SomeFunction and SomeOtherFunction in two different .c files. You then have to #include "SomeOther.c" in Some.c. Now, add some "private" functions to SomeOther.c. As C does not know private functions, that function would be available in Some.c as well.
This is where .h Files come in: They specify all the functions (and variables) that you want to 'Export' from a .c file that can be accessed in other .c files. That way, you gain something like a Public/Private scope. Also, you can give this .h file to other people without having to share your source code - .h files work against compiled .lib files as well.
So the main reason is really for convenience, for source code protection and to have a bit of decoupling between the parts of your application.
That was C though. C++ introduced Classes and private/public modifiers, so while you could still ask if they are needed, C++ AFAIK still requires declaration of functions before using them. Also, many C++ Developers are or were C devleopers as well and took over their concepts and habits to C++ - why change what isn't broken?
First advantage: If you don't have header files, you would have to include source files in other source files. This would cause the including files to be compiled again when the included file changes.
Second advantage: It allows sharing the interfaces without sharing the code between different units (different developers, teams, companies etc..)
The need for header files results from the limitations that the compiler has for knowing about the type information for functions and or variables in other modules. The compiled program or library does not include the type information required by the compiler to bind to any objects defined in other compilation units.
In order to compensate for this limitation, C and C++ allow for declarations and these declarations can be included into modules that use them with the help of the preprocessor's #include directive.
Languages like Java or C# on the other hand include the information necessary for binding in the compiler's output (class-file or assembly). Hence, there is no longer a need for maintaining standalone declarations to be included by clients of a module.
The reason for the binding information not being included in the compiler output is simple: it is not needed at runtime (any type checking occurs at compile time). It would just waste space. Remember that C/C++ come from a time where the size of an executable or library did matter quite a bit.
Well, C++ was ratified in 1998, but it had been in use for a lot longer than that, and the ratification was primarily setting down current usage rather than imposing structure. And since C++ was based on C, and C has header files, C++ has them too.
The main reason for header files is to enable separate compilation of files, and minimize dependencies.
Say I have foo.cpp, and I want to use code from the bar.h/bar.cpp files.
I can #include "bar.h" in foo.cpp, and then program and compile foo.cpp even if bar.cpp doesn't exist. The header file acts as a promise to the compiler that the classes/functions in bar.h will exist at run-time, and it has everything it needs to know already.
Of course, if the functions in bar.h don't have bodies when I try to link my program, then it won't link and I'll get an error.
A side-effect is that you can give users a header file without revealing your source code.
Another is that if you change the implementation of your code in the *.cpp file, but do not change the header at all, you only need to compile the *.cpp file instead of everything that uses it. Of course, if you put a lot of implementation into the header file, then this becomes less useful.
C++ was designed to add modern programming language features to the C infrastructure, without unnecessarily changing anything about C that wasn't specifically about the language itself.
Yes, at this point (10 years after the first C++ standard and 20 years after it began seriously growing in usage) it is easy to ask why doesn't it have a proper module system. Obviously any new language being designed today would not work like C++. But that isn't the point of C++.
The point of C++ is to be evolutionary, a smooth continuation of existing practise, only adding new capabilities without (too often) breaking things that work adequately for its user community.
This means that it makes some things harder (especially for people starting a new project), and some things easier (especially for those maintaining existing code) than other languages would do.
So rather than expecting C++ to turn into C# (which would be pointless as we already have C#), why not just pick the right tool for the job? Myself, I endeavour to write significant chunks of new functionality in a modern language (I happen to use C#), and I have a large amount of existing C++ that I am keeping in C++ because there would be no real value in re-writing it all. They integrate very nicely anyway, so it's largely painless.
It doesn't need a separate header file with the same functions as in main. It only needs it if you develop an application using multiple code files and if you use a function that was not previously declared.
It's really a scope problem.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Actually header files become very useful when examining programs for the first time, checking out header files(using only a text editor) gives you an overview of the architecture of the program, unlike other languages where you have to use sophisticated tools to view classes and their member functions.
If you want the compiler to find out symbols defined in other files automatically, you need to force programmer to put those files in predefined locations (like Java packages structure determines folders structure of the project). I prefer header files. Also you would need either sources of libraries you use or some uniform way to put information needed by compiler in binaries.
I think the real (historical) reason behind header files was making like easier for compiler developers... but then, header files do give advantages.
Check this previous post for more discussions...
Well, you can perfectly develop C++ without header files. In fact some libraries that intensively use templates does not use the header/code files paradigm (see boost). But In C/C++ you can not use something that is not declared. One practical way to
deal with that is to use header files. Plus, you gain the advantage of sharing interface whithout sharing code/implementation. And I think it was not envisionned by the C creators : When you use shared header files you have to use the famous :
#ifndef MY_HEADER_SWEET_GUARDIAN
#define MY_HEADER_SWEET_GUARDIAN
// [...]
// my header
// [...]
#endif // MY_HEADER_SWEET_GUARDIAN
that is not really a language feature but a practical way to deal with multiple inclusion.
So, I think that when C was created, the problems with forward declaration was underestimated and now when using a high level language like C++ we have to deal with this sort of things.
Another burden for us poor C++ users ...

Is it always a good idea to create .h and .cpp files together? [duplicate]

I've never really understood why C++ needs a separate header file with the same functions as in the .cpp file. It makes creating classes and refactoring them very difficult, and it adds unnecessary files to the project. And then there is the problem with having to include header files, but having to explicitly check if it has already been included.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Follow up question:
How does the compiler find the .cpp file with the code in it, when all I include is the .h file? Does it assume that the .cpp file has the same name as the .h file, or does it actually look through all the files in the directory tree?
Some people consider header files an advantage:
It is claimed that it enables/enforces/allows separation of interface and implementation -- but usually, this is not the case. Header files are full of implementation details (for example member variables of a class have to be specified in the header, even though they're not part of the public interface), and functions can, and often are, defined inline in the class declaration in the header, again destroying this separation.
It is sometimes said to improve compile-time because each translation unit can be processed independently. And yet C++ is probably the slowest language in existence when it comes to compile-times. A part of the reason is the many many repeated inclusions of the same header. A large number of headers are included by multiple translation units, requiring them to be parsed multiple times.
Ultimately, the header system is an artifact from the 70's when C was designed. Back then, computers had very little memory, and keeping the entire module in memory just wasn't an option. A compiler had to start reading the file at the top, and then proceed linearly through the source code. The header mechanism enables this. The compiler doesn't have to consider other translation units, it just has to read the code from top to bottom.
And C++ retained this system for backwards compatibility.
Today, it makes no sense. It is inefficient, error-prone and overcomplicated. There are far better ways to separate interface and implementation, if that was the goal.
However, one of the proposals for C++0x was to add a proper module system, allowing code to be compiled similar to .NET or Java, into larger modules, all in one go and without headers. This proposal didn't make the cut in C++0x, but I believe it's still in the "we'd love to do this later" category. Perhaps in a TR2 or similar.
You seem to be asking about separating definitions from declarations, although there are other uses for header files.
The answer is that C++ doesn't "need" this. If you mark everything inline (which is automatic anyway for member functions defined in a class definition), then there is no need for the separation. You can just define everything in the header files.
The reasons you might want to separate are:
To improve build times.
To link against code without having the source for the definitions.
To avoid marking everything "inline".
If your more general question is, "why isn't C++ identical to Java?", then I have to ask, "why are you writing C++ instead of Java?" ;-p
More seriously, though, the reason is that the C++ compiler can't just reach into another translation unit and figure out how to use its symbols, in the way that javac can and does. The header file is needed to declare to the compiler what it can expect to be available at link time.
So #include is a straight textual substitution. If you define everything in header files, the preprocessor ends up creating an enormous copy and paste of every source file in your project, and feeding that into the compiler. The fact that the C++ standard was ratified in 1998 has nothing to do with this, it's the fact that the compilation environment for C++ is based so closely on that of C.
Converting my comments to answer your follow-up question:
How does the compiler find the .cpp file with the code in it
It doesn't, at least not at the time it compiles the code that used the header file. The functions you're linking against don't even need to have been written yet, never mind the compiler knowing what .cpp file they'll be in. Everything the calling code needs to know at compile time is expressed in the function declaration. At link time you will provide a list of .o files, or static or dynamic libraries, and the header in effect is a promise that the definitions of the functions will be in there somewhere.
C++ does it that way because C did it that way, so the real question is why did C do it that way? Wikipedia speaks a little to this.
Newer compiled languages (such as
Java, C#) do not use forward
declarations; identifiers are
recognized automatically from source
files and read directly from dynamic
library symbols. This means header
files are not needed.
To my (limited - I'm not a C developer normally) understanding, this is rooted in C. Remember that C does not know what classes or namespaces are, it's just one long program. Also, functions have to be declared before you use them.
For example, the following should give a compiler error:
void SomeFunction() {
SomeOtherFunction();
}
void SomeOtherFunction() {
printf("What?");
}
The error should be that "SomeOtherFunction is not declared" because you call it before it's declaration. One way of fixing this is by moving SomeOtherFunction above SomeFunction. Another approach is to declare the functions signature first:
void SomeOtherFunction();
void SomeFunction() {
SomeOtherFunction();
}
void SomeOtherFunction() {
printf("What?");
}
This lets the compiler know: Look somewhere in the code, there is a function called SomeOtherFunction that returns void and does not take any parameters. So if you encouter code that tries to call SomeOtherFunction, do not panic and instead go looking for it.
Now, imagine you have SomeFunction and SomeOtherFunction in two different .c files. You then have to #include "SomeOther.c" in Some.c. Now, add some "private" functions to SomeOther.c. As C does not know private functions, that function would be available in Some.c as well.
This is where .h Files come in: They specify all the functions (and variables) that you want to 'Export' from a .c file that can be accessed in other .c files. That way, you gain something like a Public/Private scope. Also, you can give this .h file to other people without having to share your source code - .h files work against compiled .lib files as well.
So the main reason is really for convenience, for source code protection and to have a bit of decoupling between the parts of your application.
That was C though. C++ introduced Classes and private/public modifiers, so while you could still ask if they are needed, C++ AFAIK still requires declaration of functions before using them. Also, many C++ Developers are or were C devleopers as well and took over their concepts and habits to C++ - why change what isn't broken?
First advantage: If you don't have header files, you would have to include source files in other source files. This would cause the including files to be compiled again when the included file changes.
Second advantage: It allows sharing the interfaces without sharing the code between different units (different developers, teams, companies etc..)
The need for header files results from the limitations that the compiler has for knowing about the type information for functions and or variables in other modules. The compiled program or library does not include the type information required by the compiler to bind to any objects defined in other compilation units.
In order to compensate for this limitation, C and C++ allow for declarations and these declarations can be included into modules that use them with the help of the preprocessor's #include directive.
Languages like Java or C# on the other hand include the information necessary for binding in the compiler's output (class-file or assembly). Hence, there is no longer a need for maintaining standalone declarations to be included by clients of a module.
The reason for the binding information not being included in the compiler output is simple: it is not needed at runtime (any type checking occurs at compile time). It would just waste space. Remember that C/C++ come from a time where the size of an executable or library did matter quite a bit.
Well, C++ was ratified in 1998, but it had been in use for a lot longer than that, and the ratification was primarily setting down current usage rather than imposing structure. And since C++ was based on C, and C has header files, C++ has them too.
The main reason for header files is to enable separate compilation of files, and minimize dependencies.
Say I have foo.cpp, and I want to use code from the bar.h/bar.cpp files.
I can #include "bar.h" in foo.cpp, and then program and compile foo.cpp even if bar.cpp doesn't exist. The header file acts as a promise to the compiler that the classes/functions in bar.h will exist at run-time, and it has everything it needs to know already.
Of course, if the functions in bar.h don't have bodies when I try to link my program, then it won't link and I'll get an error.
A side-effect is that you can give users a header file without revealing your source code.
Another is that if you change the implementation of your code in the *.cpp file, but do not change the header at all, you only need to compile the *.cpp file instead of everything that uses it. Of course, if you put a lot of implementation into the header file, then this becomes less useful.
C++ was designed to add modern programming language features to the C infrastructure, without unnecessarily changing anything about C that wasn't specifically about the language itself.
Yes, at this point (10 years after the first C++ standard and 20 years after it began seriously growing in usage) it is easy to ask why doesn't it have a proper module system. Obviously any new language being designed today would not work like C++. But that isn't the point of C++.
The point of C++ is to be evolutionary, a smooth continuation of existing practise, only adding new capabilities without (too often) breaking things that work adequately for its user community.
This means that it makes some things harder (especially for people starting a new project), and some things easier (especially for those maintaining existing code) than other languages would do.
So rather than expecting C++ to turn into C# (which would be pointless as we already have C#), why not just pick the right tool for the job? Myself, I endeavour to write significant chunks of new functionality in a modern language (I happen to use C#), and I have a large amount of existing C++ that I am keeping in C++ because there would be no real value in re-writing it all. They integrate very nicely anyway, so it's largely painless.
It doesn't need a separate header file with the same functions as in main. It only needs it if you develop an application using multiple code files and if you use a function that was not previously declared.
It's really a scope problem.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Actually header files become very useful when examining programs for the first time, checking out header files(using only a text editor) gives you an overview of the architecture of the program, unlike other languages where you have to use sophisticated tools to view classes and their member functions.
If you want the compiler to find out symbols defined in other files automatically, you need to force programmer to put those files in predefined locations (like Java packages structure determines folders structure of the project). I prefer header files. Also you would need either sources of libraries you use or some uniform way to put information needed by compiler in binaries.
I think the real (historical) reason behind header files was making like easier for compiler developers... but then, header files do give advantages.
Check this previous post for more discussions...
Well, you can perfectly develop C++ without header files. In fact some libraries that intensively use templates does not use the header/code files paradigm (see boost). But In C/C++ you can not use something that is not declared. One practical way to
deal with that is to use header files. Plus, you gain the advantage of sharing interface whithout sharing code/implementation. And I think it was not envisionned by the C creators : When you use shared header files you have to use the famous :
#ifndef MY_HEADER_SWEET_GUARDIAN
#define MY_HEADER_SWEET_GUARDIAN
// [...]
// my header
// [...]
#endif // MY_HEADER_SWEET_GUARDIAN
that is not really a language feature but a practical way to deal with multiple inclusion.
So, I think that when C was created, the problems with forward declaration was underestimated and now when using a high level language like C++ we have to deal with this sort of things.
Another burden for us poor C++ users ...

Error LNK2005 and error LNK1169 multiple definition in Visual Studio 16.7.0 2019 Release x64 Win10 Pro 2004 [duplicate]

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.
Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.
My reasoning for #include'ing the cpp files was:
- Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
- In monkey-see-monkey do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file.
So what exactly did I do wrong, and why is it bad?
To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.
Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.
"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.
If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.
"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.
Unless you change something.
Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.
"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.
"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"
Hmm. You might be on to something there. Let me know how that works out for you.
This is probably a more detailed answer than you wanted, but I think a decent explanation is justified.
In C and C++, one source file is defined as one translation unit. By convention, header files hold function declarations, type definitions and class definitions. The actual function implementations reside in translation units, i.e .cpp files.
The idea behind this is that functions and class/struct member functions are compiled and assembled once, then other functions can call that code from one place without making duplicates. Your functions are declared as "extern" implicitly.
/* Function declaration, usually found in headers. */
/* Implicitly 'extern', i.e the symbol is visible everywhere, not just locally.*/
int add(int, int);
/* function body, or function definition. */
int add(int a, int b)
{
return a + b;
}
If you want a function to be local for a translation unit, you define it as 'static'. What does this mean? It means that if you include source files with extern functions, you will get redefinition errors, because the compiler comes across the same implementation more than once. So, you want all your translation units to see the function declaration but not the function body.
So how does it all get mashed together at the end? That is the linker's job. A linker reads all the object files which is generated by the assembler stage and resolves symbols. As I said earlier, a symbol is just a name. For example, the name of a variable or a function. When translation units which call functions or declare types do not know the implementation for those functions or types, those symbols are said to be unresolved. The linker resolves the unresolved symbol by connecting the translation unit which holds the undefined symbol together with the one which contains the implementation. Phew. This is true for all externally visible symbols, whether they are implemented in your code, or provided by an additional library. A library is really just an archive with reusable code.
There are two notable exceptions. First, if you have a small function, you can make it inline. This means that the generated machine code does not generate an extern function call, but is literally concatenated in-place. Since they usually are small, the size overhead does not matter. You can imagine them to be static in the way they work. So it is safe to implement inline functions in headers. Function implementations inside a class or struct definition are also often inlined automatically by the compiler.
The other exception is templates. Since the compiler needs to see the whole template type definition when instantiating them, it is not possible to decouple the implementation from the definition as with standalone functions or normal classes. Well, perhaps this is possible now, but getting widespread compiler support for the "export" keyword took a long, long time. So without support for 'export', translation units get their own local copies of instantiated templated types and functions, similar to how inline functions work. With support for 'export', this is not the case.
For the two exceptions, some people find it "nicer" to put the implementations of inline functions, templated functions and templated types in .cpp files, and then #include the .cpp file. Whether this is a header or a source file doesn't really matter; the preprocessor does not care and is just a convention.
A quick summary of the whole process from C++ code (several files) and to a final executable:
The preprocessor is run, which parses all the directives which starts with a '#'. The #include directive concatenates the included file with inferior, for example. It also does macro-replacement and token-pasting.
The actual compiler runs on the intermediate text file after the preprocessor stage, and emits assembler code.
The assembler runs on the assembly file and emits machine code, this is usually called an object file and follows the binary executable format of the operative system in question. For example, Windows uses the PE (portable executable format), while Linux uses the Unix System V ELF format, with GNU extensions. At this stage, symbols are still marked as undefined.
Finally, the linker is run. All the previous stages were run on each translation unit in order. However, the linker stage works on all the generated object files which were generated by the assembler. The linker resolves symbols and does a lot of magic like creating sections and segments, which is dependent on the target platform and binary format. Programmers aren't required to know this in general, but it surely helps in some cases.
Again, this was definetely more than you asked for, but I hope the nitty-gritty details helps you to see the bigger picture.
The typical solution is to use .h files for declarations only and .cpp files for implementation. If you need to reuse the implementation you include the corresponding .h file into the .cpp file where the necessary class/function/whatever is used and link against an already compiled .cpp file (either an .obj file - usually used within one project - or .lib file - usually used for reusing from multiple projects). This way you don't need to recompile everything if only the implementation changes.
Think of cpp files as a black box and the .h files as the guides on how to use those black boxes.
The cpp files can be compiled ahead of time. This doesn't work in you #include them, as it needs to actual "include" the code into your program each time it compiles it. If you just include the header, it can just use the header file to determine how to use the precompiled cpp file.
Although this won't make much of a difference for your first project, if you start writing large cpp programs, people are going to hate you because compile times are going to explode.
Also have a read of this: Header File Include Patterns
Header files usually contain declarations of functions / classes, while .cpp files contain the actual implementations. At compile time, each .cpp file gets compiled into an object file (usually extension .o), and the linker combines the various object files into the final executable. The linking process is generally much faster than the compilation.
Benefits of this separation: If you are recompiling one of the .cpp files in your project, you don't have to recompile all the others. You just create the new object file for that particular .cpp file. The compiler doesn't have to look at the other .cpp files. However, if you want to call functions in your current .cpp file that were implemented in the other .cpp files, you have to tell the compiler what arguments they take; that is the purpose of including the header files.
Disadvantages: When compiling a given .cpp file, the compiler cannot 'see' what is inside the other .cpp files. So it doesn't know how the functions there are implemented, and as a result cannot optimize as aggressively. But I think you don't need to concern yourself with that just yet (:
The basic idea that headers are only included and cpp files are only compiled. This will become more useful once you have many cpp files, and recompiling the whole application when you modify only one of them will be too slow. Or when the functions in the files will start depending on each other. So, you should separate class declarations into your header files, leave implementation in cpp files and write a Makefile (or something else, depending on what tools are you using) to compile the cpp files and link the resulting object files into a program.
If you #include a cpp file in several other files in your program, the compiler will try to compile the cpp file multiple times, and will generate an error as there will be multiple implementations of the same methods.
Compilation will take longer (which becomes a problem on large projects), if you make edits in #included cpp files, which then force recompilation of any files #including them.
Just put your declarations into header files and include those (as they don't actually generate code per se), and the linker will hook up the declarations with the corresponding cpp code (which then only gets compiled once).
re-usability, architecture and data encapsulation
here's an example:
say you create a cpp file which contains a simple form of string routines all in a class mystring, you place the class decl for this in a mystring.h compiling mystring.cpp to a .obj file
now in your main program (e.g. main.cpp) you include header and link with the mystring.obj.
to use mystring in your program you don't care about the details how mystring is implemented since the header says what it can do
now if a buddy wants to use your mystring class you give him mystring.h and the mystring.obj, he also doesn't necessarily need to know how it works as long as it works.
later if you have more such .obj files you can combine them into a .lib file and link to that instead.
you can also decide to change the mystring.cpp file and implement it more effectively, this will not affect your main.cpp or your buddies program.
While it is certainly possible to do as you did, the standard practice is to put shared declarations into header files (.h), and definitions of functions and variables - implementation - into source files (.cpp).
As a convention, this helps make it clear where everything is, and makes a clear distinction between interface and implementation of your modules. It also means that you never have to check to see if a .cpp file is included in another, before adding something to it that could break if it was defined in several different units.
If it works for you then there is nothing wrong with it -- except that it will ruffle the feathers of people who think that there is only one way to do things.
Many of the answers given here address optimizations for large-scale software projects. These are good things to know about, but there is no point in optimizing a small project as if it were a large project -- that is what is known as "premature optimization". Depending on your development environment, there may be significant extra complexity involved in setting up a build configuration to support multiple source files per program.
If, over time, your project evolves and you find that the build process is taking too long, then you can refactor your code to use multiple source files for faster incremental builds.
Several of the answers discuss separating interface from implementation. However, this is not an inherent feature of include files, and it is quite common to #include "header" files that directly incorporate their implementation (even the C++ Standard Library does this to a significant degree).
The only thing truly "unconventional" about what you have done was naming your included files ".cpp" instead of ".h" or ".hpp".
When you compile and link a program the compiler first compiles the individual cpp files and then they link (connect) them. The headers will never get compiled, unless included in a cpp file first.
Typically headers are declarations and cpp are implementation files. In the headers you define an interface for a class or function but you leave out how you actually implement the details. This way you don't have to recompile every cpp file if you make a change in one.
I will suggest you to go through Large Scale C++ Software Design by John Lakos. In the college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and the implementations.
Header files usually have interfaces which are supposed not to be changed so frequently.
Similarly a look into patterns like Virtual Constructor idiom will help you grasp the concept further.
I am still learning like you :)
It's like writing a book, you want to print out finished chapters only once
Say you are writing a book. If you put the chapters in separate files then you only need to print out a chapter if you have changed it. Working on one chapter doesn't change any of the others.
But including the cpp files is, from the compiler's point of view, like editing all of the chapters of the book in one file. Then if you change it you have to print all the pages of the entire book in order to get your revised chapter printed. There is no "print selected pages" option in object code generation.
Back to software: I have Linux and Ruby src lying around. A rough measure of lines of code...
Linux Ruby
100,000 100,000 core functionality (just kernel/*, ruby top level dir)
10,000,000 200,000 everything
Any one of those four categories has a lot of code, hence the need for modularity. This kind of code base is surprisingly typical of real-world systems.
There are times when non conventional programming techniques are actually quite useful and solve otherwise difficult (if not impossible problems).
If C source is generated by third party applications such as lexx and yacc they can obviously be compiled and linked separately and this is the usual approach.
However there are times when these sources can cause linkage problems with other unrelated sources. You have some options if this occurs. Rewrite the conflicting components to accommodate the lexx and yacc sources. Modify the lexx & yacc componets to accommodate your sources. '#Include' the lexx and yacc sources where they are required.
Re-writing the the components is fine if the changes are small and the components are understood to begin with (i.e: you not porting someone else's code).
Modifying the lexx and yacc source is fine as long as the build process doesn't keep regenerating the source from the lexx and yacc scripts.
You can always revert to one of the other two methods if you feel it is required.
Adding a single #include and modifying the makefile to remove the build of the lexx/yacc components to overcome all your problems is attractive fast and provides you the opportunity to prove the code works at all without spending time rewriting code and questing whether the code would have ever worked in the first place when it isn't working now.
When two C files are included together they are basically one file and there are no external references required to be resolved at link time!

Multiple definition of `Node::Node(int, std::string, std::string, std::string, float)' [duplicate]

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.
Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.
My reasoning for #include'ing the cpp files was:
- Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
- In monkey-see-monkey do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file.
So what exactly did I do wrong, and why is it bad?
To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.
Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.
"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.
If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.
"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.
Unless you change something.
Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.
"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.
"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"
Hmm. You might be on to something there. Let me know how that works out for you.
This is probably a more detailed answer than you wanted, but I think a decent explanation is justified.
In C and C++, one source file is defined as one translation unit. By convention, header files hold function declarations, type definitions and class definitions. The actual function implementations reside in translation units, i.e .cpp files.
The idea behind this is that functions and class/struct member functions are compiled and assembled once, then other functions can call that code from one place without making duplicates. Your functions are declared as "extern" implicitly.
/* Function declaration, usually found in headers. */
/* Implicitly 'extern', i.e the symbol is visible everywhere, not just locally.*/
int add(int, int);
/* function body, or function definition. */
int add(int a, int b)
{
return a + b;
}
If you want a function to be local for a translation unit, you define it as 'static'. What does this mean? It means that if you include source files with extern functions, you will get redefinition errors, because the compiler comes across the same implementation more than once. So, you want all your translation units to see the function declaration but not the function body.
So how does it all get mashed together at the end? That is the linker's job. A linker reads all the object files which is generated by the assembler stage and resolves symbols. As I said earlier, a symbol is just a name. For example, the name of a variable or a function. When translation units which call functions or declare types do not know the implementation for those functions or types, those symbols are said to be unresolved. The linker resolves the unresolved symbol by connecting the translation unit which holds the undefined symbol together with the one which contains the implementation. Phew. This is true for all externally visible symbols, whether they are implemented in your code, or provided by an additional library. A library is really just an archive with reusable code.
There are two notable exceptions. First, if you have a small function, you can make it inline. This means that the generated machine code does not generate an extern function call, but is literally concatenated in-place. Since they usually are small, the size overhead does not matter. You can imagine them to be static in the way they work. So it is safe to implement inline functions in headers. Function implementations inside a class or struct definition are also often inlined automatically by the compiler.
The other exception is templates. Since the compiler needs to see the whole template type definition when instantiating them, it is not possible to decouple the implementation from the definition as with standalone functions or normal classes. Well, perhaps this is possible now, but getting widespread compiler support for the "export" keyword took a long, long time. So without support for 'export', translation units get their own local copies of instantiated templated types and functions, similar to how inline functions work. With support for 'export', this is not the case.
For the two exceptions, some people find it "nicer" to put the implementations of inline functions, templated functions and templated types in .cpp files, and then #include the .cpp file. Whether this is a header or a source file doesn't really matter; the preprocessor does not care and is just a convention.
A quick summary of the whole process from C++ code (several files) and to a final executable:
The preprocessor is run, which parses all the directives which starts with a '#'. The #include directive concatenates the included file with inferior, for example. It also does macro-replacement and token-pasting.
The actual compiler runs on the intermediate text file after the preprocessor stage, and emits assembler code.
The assembler runs on the assembly file and emits machine code, this is usually called an object file and follows the binary executable format of the operative system in question. For example, Windows uses the PE (portable executable format), while Linux uses the Unix System V ELF format, with GNU extensions. At this stage, symbols are still marked as undefined.
Finally, the linker is run. All the previous stages were run on each translation unit in order. However, the linker stage works on all the generated object files which were generated by the assembler. The linker resolves symbols and does a lot of magic like creating sections and segments, which is dependent on the target platform and binary format. Programmers aren't required to know this in general, but it surely helps in some cases.
Again, this was definetely more than you asked for, but I hope the nitty-gritty details helps you to see the bigger picture.
The typical solution is to use .h files for declarations only and .cpp files for implementation. If you need to reuse the implementation you include the corresponding .h file into the .cpp file where the necessary class/function/whatever is used and link against an already compiled .cpp file (either an .obj file - usually used within one project - or .lib file - usually used for reusing from multiple projects). This way you don't need to recompile everything if only the implementation changes.
Think of cpp files as a black box and the .h files as the guides on how to use those black boxes.
The cpp files can be compiled ahead of time. This doesn't work in you #include them, as it needs to actual "include" the code into your program each time it compiles it. If you just include the header, it can just use the header file to determine how to use the precompiled cpp file.
Although this won't make much of a difference for your first project, if you start writing large cpp programs, people are going to hate you because compile times are going to explode.
Also have a read of this: Header File Include Patterns
Header files usually contain declarations of functions / classes, while .cpp files contain the actual implementations. At compile time, each .cpp file gets compiled into an object file (usually extension .o), and the linker combines the various object files into the final executable. The linking process is generally much faster than the compilation.
Benefits of this separation: If you are recompiling one of the .cpp files in your project, you don't have to recompile all the others. You just create the new object file for that particular .cpp file. The compiler doesn't have to look at the other .cpp files. However, if you want to call functions in your current .cpp file that were implemented in the other .cpp files, you have to tell the compiler what arguments they take; that is the purpose of including the header files.
Disadvantages: When compiling a given .cpp file, the compiler cannot 'see' what is inside the other .cpp files. So it doesn't know how the functions there are implemented, and as a result cannot optimize as aggressively. But I think you don't need to concern yourself with that just yet (:
The basic idea that headers are only included and cpp files are only compiled. This will become more useful once you have many cpp files, and recompiling the whole application when you modify only one of them will be too slow. Or when the functions in the files will start depending on each other. So, you should separate class declarations into your header files, leave implementation in cpp files and write a Makefile (or something else, depending on what tools are you using) to compile the cpp files and link the resulting object files into a program.
If you #include a cpp file in several other files in your program, the compiler will try to compile the cpp file multiple times, and will generate an error as there will be multiple implementations of the same methods.
Compilation will take longer (which becomes a problem on large projects), if you make edits in #included cpp files, which then force recompilation of any files #including them.
Just put your declarations into header files and include those (as they don't actually generate code per se), and the linker will hook up the declarations with the corresponding cpp code (which then only gets compiled once).
re-usability, architecture and data encapsulation
here's an example:
say you create a cpp file which contains a simple form of string routines all in a class mystring, you place the class decl for this in a mystring.h compiling mystring.cpp to a .obj file
now in your main program (e.g. main.cpp) you include header and link with the mystring.obj.
to use mystring in your program you don't care about the details how mystring is implemented since the header says what it can do
now if a buddy wants to use your mystring class you give him mystring.h and the mystring.obj, he also doesn't necessarily need to know how it works as long as it works.
later if you have more such .obj files you can combine them into a .lib file and link to that instead.
you can also decide to change the mystring.cpp file and implement it more effectively, this will not affect your main.cpp or your buddies program.
While it is certainly possible to do as you did, the standard practice is to put shared declarations into header files (.h), and definitions of functions and variables - implementation - into source files (.cpp).
As a convention, this helps make it clear where everything is, and makes a clear distinction between interface and implementation of your modules. It also means that you never have to check to see if a .cpp file is included in another, before adding something to it that could break if it was defined in several different units.
If it works for you then there is nothing wrong with it -- except that it will ruffle the feathers of people who think that there is only one way to do things.
Many of the answers given here address optimizations for large-scale software projects. These are good things to know about, but there is no point in optimizing a small project as if it were a large project -- that is what is known as "premature optimization". Depending on your development environment, there may be significant extra complexity involved in setting up a build configuration to support multiple source files per program.
If, over time, your project evolves and you find that the build process is taking too long, then you can refactor your code to use multiple source files for faster incremental builds.
Several of the answers discuss separating interface from implementation. However, this is not an inherent feature of include files, and it is quite common to #include "header" files that directly incorporate their implementation (even the C++ Standard Library does this to a significant degree).
The only thing truly "unconventional" about what you have done was naming your included files ".cpp" instead of ".h" or ".hpp".
When you compile and link a program the compiler first compiles the individual cpp files and then they link (connect) them. The headers will never get compiled, unless included in a cpp file first.
Typically headers are declarations and cpp are implementation files. In the headers you define an interface for a class or function but you leave out how you actually implement the details. This way you don't have to recompile every cpp file if you make a change in one.
I will suggest you to go through Large Scale C++ Software Design by John Lakos. In the college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and the implementations.
Header files usually have interfaces which are supposed not to be changed so frequently.
Similarly a look into patterns like Virtual Constructor idiom will help you grasp the concept further.
I am still learning like you :)
It's like writing a book, you want to print out finished chapters only once
Say you are writing a book. If you put the chapters in separate files then you only need to print out a chapter if you have changed it. Working on one chapter doesn't change any of the others.
But including the cpp files is, from the compiler's point of view, like editing all of the chapters of the book in one file. Then if you change it you have to print all the pages of the entire book in order to get your revised chapter printed. There is no "print selected pages" option in object code generation.
Back to software: I have Linux and Ruby src lying around. A rough measure of lines of code...
Linux Ruby
100,000 100,000 core functionality (just kernel/*, ruby top level dir)
10,000,000 200,000 everything
Any one of those four categories has a lot of code, hence the need for modularity. This kind of code base is surprisingly typical of real-world systems.
There are times when non conventional programming techniques are actually quite useful and solve otherwise difficult (if not impossible problems).
If C source is generated by third party applications such as lexx and yacc they can obviously be compiled and linked separately and this is the usual approach.
However there are times when these sources can cause linkage problems with other unrelated sources. You have some options if this occurs. Rewrite the conflicting components to accommodate the lexx and yacc sources. Modify the lexx & yacc componets to accommodate your sources. '#Include' the lexx and yacc sources where they are required.
Re-writing the the components is fine if the changes are small and the components are understood to begin with (i.e: you not porting someone else's code).
Modifying the lexx and yacc source is fine as long as the build process doesn't keep regenerating the source from the lexx and yacc scripts.
You can always revert to one of the other two methods if you feel it is required.
Adding a single #include and modifying the makefile to remove the build of the lexx/yacc components to overcome all your problems is attractive fast and provides you the opportunity to prove the code works at all without spending time rewriting code and questing whether the code would have ever worked in the first place when it isn't working now.
When two C files are included together they are basically one file and there are no external references required to be resolved at link time!

How does including the header automatically define the class in to the driver (main)? [duplicate]

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.
Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.
My reasoning for #include'ing the cpp files was:
- Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
- In monkey-see-monkey do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file.
So what exactly did I do wrong, and why is it bad?
To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.
Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.
"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.
If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.
"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.
Unless you change something.
Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.
"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.
"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"
Hmm. You might be on to something there. Let me know how that works out for you.
This is probably a more detailed answer than you wanted, but I think a decent explanation is justified.
In C and C++, one source file is defined as one translation unit. By convention, header files hold function declarations, type definitions and class definitions. The actual function implementations reside in translation units, i.e .cpp files.
The idea behind this is that functions and class/struct member functions are compiled and assembled once, then other functions can call that code from one place without making duplicates. Your functions are declared as "extern" implicitly.
/* Function declaration, usually found in headers. */
/* Implicitly 'extern', i.e the symbol is visible everywhere, not just locally.*/
int add(int, int);
/* function body, or function definition. */
int add(int a, int b)
{
return a + b;
}
If you want a function to be local for a translation unit, you define it as 'static'. What does this mean? It means that if you include source files with extern functions, you will get redefinition errors, because the compiler comes across the same implementation more than once. So, you want all your translation units to see the function declaration but not the function body.
So how does it all get mashed together at the end? That is the linker's job. A linker reads all the object files which is generated by the assembler stage and resolves symbols. As I said earlier, a symbol is just a name. For example, the name of a variable or a function. When translation units which call functions or declare types do not know the implementation for those functions or types, those symbols are said to be unresolved. The linker resolves the unresolved symbol by connecting the translation unit which holds the undefined symbol together with the one which contains the implementation. Phew. This is true for all externally visible symbols, whether they are implemented in your code, or provided by an additional library. A library is really just an archive with reusable code.
There are two notable exceptions. First, if you have a small function, you can make it inline. This means that the generated machine code does not generate an extern function call, but is literally concatenated in-place. Since they usually are small, the size overhead does not matter. You can imagine them to be static in the way they work. So it is safe to implement inline functions in headers. Function implementations inside a class or struct definition are also often inlined automatically by the compiler.
The other exception is templates. Since the compiler needs to see the whole template type definition when instantiating them, it is not possible to decouple the implementation from the definition as with standalone functions or normal classes. Well, perhaps this is possible now, but getting widespread compiler support for the "export" keyword took a long, long time. So without support for 'export', translation units get their own local copies of instantiated templated types and functions, similar to how inline functions work. With support for 'export', this is not the case.
For the two exceptions, some people find it "nicer" to put the implementations of inline functions, templated functions and templated types in .cpp files, and then #include the .cpp file. Whether this is a header or a source file doesn't really matter; the preprocessor does not care and is just a convention.
A quick summary of the whole process from C++ code (several files) and to a final executable:
The preprocessor is run, which parses all the directives which starts with a '#'. The #include directive concatenates the included file with inferior, for example. It also does macro-replacement and token-pasting.
The actual compiler runs on the intermediate text file after the preprocessor stage, and emits assembler code.
The assembler runs on the assembly file and emits machine code, this is usually called an object file and follows the binary executable format of the operative system in question. For example, Windows uses the PE (portable executable format), while Linux uses the Unix System V ELF format, with GNU extensions. At this stage, symbols are still marked as undefined.
Finally, the linker is run. All the previous stages were run on each translation unit in order. However, the linker stage works on all the generated object files which were generated by the assembler. The linker resolves symbols and does a lot of magic like creating sections and segments, which is dependent on the target platform and binary format. Programmers aren't required to know this in general, but it surely helps in some cases.
Again, this was definetely more than you asked for, but I hope the nitty-gritty details helps you to see the bigger picture.
The typical solution is to use .h files for declarations only and .cpp files for implementation. If you need to reuse the implementation you include the corresponding .h file into the .cpp file where the necessary class/function/whatever is used and link against an already compiled .cpp file (either an .obj file - usually used within one project - or .lib file - usually used for reusing from multiple projects). This way you don't need to recompile everything if only the implementation changes.
Think of cpp files as a black box and the .h files as the guides on how to use those black boxes.
The cpp files can be compiled ahead of time. This doesn't work in you #include them, as it needs to actual "include" the code into your program each time it compiles it. If you just include the header, it can just use the header file to determine how to use the precompiled cpp file.
Although this won't make much of a difference for your first project, if you start writing large cpp programs, people are going to hate you because compile times are going to explode.
Also have a read of this: Header File Include Patterns
Header files usually contain declarations of functions / classes, while .cpp files contain the actual implementations. At compile time, each .cpp file gets compiled into an object file (usually extension .o), and the linker combines the various object files into the final executable. The linking process is generally much faster than the compilation.
Benefits of this separation: If you are recompiling one of the .cpp files in your project, you don't have to recompile all the others. You just create the new object file for that particular .cpp file. The compiler doesn't have to look at the other .cpp files. However, if you want to call functions in your current .cpp file that were implemented in the other .cpp files, you have to tell the compiler what arguments they take; that is the purpose of including the header files.
Disadvantages: When compiling a given .cpp file, the compiler cannot 'see' what is inside the other .cpp files. So it doesn't know how the functions there are implemented, and as a result cannot optimize as aggressively. But I think you don't need to concern yourself with that just yet (:
The basic idea that headers are only included and cpp files are only compiled. This will become more useful once you have many cpp files, and recompiling the whole application when you modify only one of them will be too slow. Or when the functions in the files will start depending on each other. So, you should separate class declarations into your header files, leave implementation in cpp files and write a Makefile (or something else, depending on what tools are you using) to compile the cpp files and link the resulting object files into a program.
If you #include a cpp file in several other files in your program, the compiler will try to compile the cpp file multiple times, and will generate an error as there will be multiple implementations of the same methods.
Compilation will take longer (which becomes a problem on large projects), if you make edits in #included cpp files, which then force recompilation of any files #including them.
Just put your declarations into header files and include those (as they don't actually generate code per se), and the linker will hook up the declarations with the corresponding cpp code (which then only gets compiled once).
re-usability, architecture and data encapsulation
here's an example:
say you create a cpp file which contains a simple form of string routines all in a class mystring, you place the class decl for this in a mystring.h compiling mystring.cpp to a .obj file
now in your main program (e.g. main.cpp) you include header and link with the mystring.obj.
to use mystring in your program you don't care about the details how mystring is implemented since the header says what it can do
now if a buddy wants to use your mystring class you give him mystring.h and the mystring.obj, he also doesn't necessarily need to know how it works as long as it works.
later if you have more such .obj files you can combine them into a .lib file and link to that instead.
you can also decide to change the mystring.cpp file and implement it more effectively, this will not affect your main.cpp or your buddies program.
While it is certainly possible to do as you did, the standard practice is to put shared declarations into header files (.h), and definitions of functions and variables - implementation - into source files (.cpp).
As a convention, this helps make it clear where everything is, and makes a clear distinction between interface and implementation of your modules. It also means that you never have to check to see if a .cpp file is included in another, before adding something to it that could break if it was defined in several different units.
If it works for you then there is nothing wrong with it -- except that it will ruffle the feathers of people who think that there is only one way to do things.
Many of the answers given here address optimizations for large-scale software projects. These are good things to know about, but there is no point in optimizing a small project as if it were a large project -- that is what is known as "premature optimization". Depending on your development environment, there may be significant extra complexity involved in setting up a build configuration to support multiple source files per program.
If, over time, your project evolves and you find that the build process is taking too long, then you can refactor your code to use multiple source files for faster incremental builds.
Several of the answers discuss separating interface from implementation. However, this is not an inherent feature of include files, and it is quite common to #include "header" files that directly incorporate their implementation (even the C++ Standard Library does this to a significant degree).
The only thing truly "unconventional" about what you have done was naming your included files ".cpp" instead of ".h" or ".hpp".
When you compile and link a program the compiler first compiles the individual cpp files and then they link (connect) them. The headers will never get compiled, unless included in a cpp file first.
Typically headers are declarations and cpp are implementation files. In the headers you define an interface for a class or function but you leave out how you actually implement the details. This way you don't have to recompile every cpp file if you make a change in one.
I will suggest you to go through Large Scale C++ Software Design by John Lakos. In the college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and the implementations.
Header files usually have interfaces which are supposed not to be changed so frequently.
Similarly a look into patterns like Virtual Constructor idiom will help you grasp the concept further.
I am still learning like you :)
It's like writing a book, you want to print out finished chapters only once
Say you are writing a book. If you put the chapters in separate files then you only need to print out a chapter if you have changed it. Working on one chapter doesn't change any of the others.
But including the cpp files is, from the compiler's point of view, like editing all of the chapters of the book in one file. Then if you change it you have to print all the pages of the entire book in order to get your revised chapter printed. There is no "print selected pages" option in object code generation.
Back to software: I have Linux and Ruby src lying around. A rough measure of lines of code...
Linux Ruby
100,000 100,000 core functionality (just kernel/*, ruby top level dir)
10,000,000 200,000 everything
Any one of those four categories has a lot of code, hence the need for modularity. This kind of code base is surprisingly typical of real-world systems.
There are times when non conventional programming techniques are actually quite useful and solve otherwise difficult (if not impossible problems).
If C source is generated by third party applications such as lexx and yacc they can obviously be compiled and linked separately and this is the usual approach.
However there are times when these sources can cause linkage problems with other unrelated sources. You have some options if this occurs. Rewrite the conflicting components to accommodate the lexx and yacc sources. Modify the lexx & yacc componets to accommodate your sources. '#Include' the lexx and yacc sources where they are required.
Re-writing the the components is fine if the changes are small and the components are understood to begin with (i.e: you not porting someone else's code).
Modifying the lexx and yacc source is fine as long as the build process doesn't keep regenerating the source from the lexx and yacc scripts.
You can always revert to one of the other two methods if you feel it is required.
Adding a single #include and modifying the makefile to remove the build of the lexx/yacc components to overcome all your problems is attractive fast and provides you the opportunity to prove the code works at all without spending time rewriting code and questing whether the code would have ever worked in the first place when it isn't working now.
When two C files are included together they are basically one file and there are no external references required to be resolved at link time!