What are the C++ coding and file organization guidelines you suggest for people who have to deal with lots of interdependent classes spread over several source and header files?
I have this situation in my project and solving class definition related errors crossing over several header files has become quite a headache.
Some general guidelines:
Pair up your interfaces with implementations. If you have foo.cxx, everything defined in there had better be declared in foo.h.
Ensure that every header file #includes all other headers, or provides the forward declarations, necessary for independent compilation.
Resist the temptation to create an "everything" header. They're always trouble down the road.
Put a set of related (and interdependent) functionality into a single file. Java and other environments encourage one-class-per-file. With C++, you often want one set of classes per file. It depends on the structure of your code.
Prefer forward declarations over #includes whenever possible; this lets you break cyclic header dependencies. Essentially, for cyclic dependencies across separate files, you want a file-dependency graph that looks something like this (a minimal sketch follows the list):
A.cxx requires A.h and B.h
B.cxx requires A.h and B.h
A.h requires B.h
B.h is independent (and forward-declares classes defined in A.h)
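For instance, B.h in that scheme might look something like this (the class members are purely illustrative):

// B.h - self-contained, but only forward-declares A
#ifndef B_H
#define B_H

class A;                       // forward declaration: B only holds a pointer to A

class B {
public:
    void setOwner(A* owner);   // pointer/reference parameters don't need A's full definition
private:
    A* owner_ = nullptr;
};

#endif

// A.h #includes B.h and defines class A; A.cxx and B.cxx then include
// both headers and get the full definitions they need.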
If your code is intended to be a library consumed by other developers, there are some additional steps that are important to take:
If necessary, use the concept of "private headers". That is, header files that are required by several source files, but never required by the public interface. This could be a file with common inline functions, macros, or internal constants.
Separate your public interface from your private implementation at the filesystem level. I tend to use include/ and src/ subdirectories in my C or C++ projects, where include/ has all of my public headers, and src/ has all of my sources and private headers.
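As a rough illustration of that layout (all names are made up):

mylib/
    include/mylib/widget.h        public header, shipped to users
    src/widget.cpp                implementation
    src/detail_helpers.h          private header, never installed
    src/detail_helpers.cpp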
I'd recommend finding a copy of John Lakos' book Large-Scale C++ Software Design. It's a pretty hefty book, but if you just skim through some of his discussions on physical architecture, you'll learn a lot.
Check out the C and C++ coding standards at the NASA Goddard Space Flight Center. The one rule that I especially noted in the C standard and have adopted in my own code is the one that enforces the 'standalone' nature of header files. In the implementation file xxx.cpp for the header xxx.h, ensure that xxx.h is the first header included. If the header is not self-contained at any time, then compilation will fail. It is a beautifully simple and effective rule.
The only time it fails you is if you port between machines, and the xxx.h header includes, say, <pqr.h>, but <pqr.h> requires facilities that happen to be made available by a header <abc.h> on the original platform (so <pqr.h> includes <abc.h>), but the facilities are not made available by <abc.h> on the other platform (they are in def.h instead, but <pqr.h> does not include <def.h>). This isn't a fault of the rule, and the problem is more easily diagnosed and fixed if you follow the rule.
Check out the header file section in the Google style guide.
Tom's answer is an excellent one!
Only thing I'd add is to never have "using declarations" in header files. They should only be allowed in implementation files, e.g. foo.cpp.
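A small sketch of the difference (not taken from the book; names are illustrative):

// greeting.h - bad place for a using directive: it would be forced on every includer
#include <string>
std::string makeGreeting(const std::string& name);   // spell out namespaces in headers

// greeting.cpp - fine: the using declaration is confined to this translation unit
#include "greeting.h"
using std::string;

string makeGreeting(const string& name) {
    return "Hello, " + name;
}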
The logic for this is well described in the excellent book "Accelerated C++".
One more point in addition to the others here:
Don't include any private definitions in a header file. E.g. any definition that is only used in xxx.cpp should be in xxx.cpp, not xxx.h.
Seems obvious, but I see it frequently.
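A sketch of what that separation looks like (hypothetical names):

// xxx.h - only what other files need to see
int publicEntryPoint(int value);

// xxx.cpp - helpers used only here stay here
#include "xxx.h"

namespace {                                     // internal linkage: invisible outside xxx.cpp
    int helperOnlyUsedHere(int v) { return v * 2; }
}

int publicEntryPoint(int value) {
    return helperOnlyUsedHere(value) + 1;
}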
I'd like to add one very good practice (in both C and C++) which is often neglected:
foo.c
#include "foo.h" // always the first directive
Any other needed headers should follow, then the code. The point is that you almost always need that header for this compilation unit anyway, and including it as the first directive guarantees that the header remains self-sufficient (if it is not, there will be errors). This is especially true for public headers.
If at any point you need to put something before this header inclusion (except comments, of course), then it is likely you're doing something wrong. Unless you really know what you are doing... which leads to another, more crucial rule => comment your hacks!
I've never really understood why C++ needs a separate header file with the same functions as in the .cpp file. It makes creating classes and refactoring them very difficult, and it adds unnecessary files to the project. And then there is the problem of having to include header files, but also having to explicitly check whether they have already been included.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Follow up question:
How does the compiler find the .cpp file with the code in it, when all I include is the .h file? Does it assume that the .cpp file has the same name as the .h file, or does it actually look through all the files in the directory tree?
Some people consider header files an advantage:
It is claimed that it enables/enforces/allows separation of interface and implementation -- but usually, this is not the case. Header files are full of implementation details (for example member variables of a class have to be specified in the header, even though they're not part of the public interface), and functions can, and often are, defined inline in the class declaration in the header, again destroying this separation.
It is sometimes said to improve compile times because each translation unit can be processed independently. And yet C++ is probably the slowest language in existence when it comes to compile times. Part of the reason is the many repeated inclusions of the same header: a large number of headers are included by multiple translation units, requiring them to be parsed multiple times.
Ultimately, the header system is an artifact from the 70's when C was designed. Back then, computers had very little memory, and keeping the entire module in memory just wasn't an option. A compiler had to start reading the file at the top, and then proceed linearly through the source code. The header mechanism enables this. The compiler doesn't have to consider other translation units, it just has to read the code from top to bottom.
And C++ retained this system for backwards compatibility.
Today, it makes no sense. It is inefficient, error-prone and overcomplicated. There are far better ways to separate interface and implementation, if that was the goal.
However, one of the proposals for C++0x was to add a proper module system, allowing code to be compiled similar to .NET or Java, into larger modules, all in one go and without headers. This proposal didn't make the cut in C++0x, but I believe it's still in the "we'd love to do this later" category. Perhaps in a TR2 or similar.
You seem to be asking about separating definitions from declarations, although there are other uses for header files.
The answer is that C++ doesn't "need" this. If you mark everything inline (which is automatic anyway for member functions defined in a class definition), then there is no need for the separation. You can just define everything in the header files.
The reasons you might want to separate are:
To improve build times.
To link against code without having the source for the definitions.
To avoid marking everything "inline".
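To illustrate the header-only alternative mentioned above, here is a minimal sketch (names are made up):

// shape.h - everything defined in the header, nothing left for a .cpp file
#ifndef SHAPE_H
#define SHAPE_H

class Circle {
public:
    explicit Circle(double r) : radius_(r) {}                  // in-class definition: implicitly inline
    double area() const { return 3.14159265358979 * radius_ * radius_; }
private:
    double radius_;
};

inline double unitCircleArea() { return Circle(1.0).area(); }  // free functions need an explicit 'inline'

#endif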
If your more general question is, "why isn't C++ identical to Java?", then I have to ask, "why are you writing C++ instead of Java?" ;-p
More seriously, though, the reason is that the C++ compiler can't just reach into another translation unit and figure out how to use its symbols, in the way that javac can and does. The header file is needed to declare to the compiler what it can expect to be available at link time.
So #include is a straight textual substitution. If you define everything in header files, the preprocessor ends up creating an enormous copy and paste of every source file in your project, and feeding that into the compiler. The fact that the C++ standard was ratified in 1998 has nothing to do with this, it's the fact that the compilation environment for C++ is based so closely on that of C.
Converting my comments to answer your follow-up question:
How does the compiler find the .cpp file with the code in it
It doesn't, at least not at the time it compiles the code that used the header file. The functions you're linking against don't even need to have been written yet, never mind the compiler knowing what .cpp file they'll be in. Everything the calling code needs to know at compile time is expressed in the function declaration. At link time you will provide a list of .o files, or static or dynamic libraries, and the header in effect is a promise that the definitions of the functions will be in there somewhere.
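A rough sketch of how that works in practice (file and function names are hypothetical):

// bar.h - the "promise"
int barValue();

// main.cpp - compiles fine knowing only the declaration
#include "bar.h"
int main() { return barValue(); }

// bar.cpp - may be written later, or shipped only as a compiled library
int barValue() { return 42; }

// Typical build: compile each translation unit separately, then link the objects:
//   g++ -c main.cpp        (produces main.o)
//   g++ -c bar.cpp         (produces bar.o)
//   g++ main.o bar.o -o app
// If nothing provides barValue at the last step, it is the link that fails, not the compile.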
C++ does it that way because C did it that way, so the real question is why did C do it that way? Wikipedia speaks a little to this.
Newer compiled languages (such as Java, C#) do not use forward declarations; identifiers are recognized automatically from source files and read directly from dynamic library symbols. This means header files are not needed.
To my (limited - I'm not a C developer normally) understanding, this is rooted in C. Remember that C does not know what classes or namespaces are, it's just one long program. Also, functions have to be declared before you use them.
For example, the following should give a compiler error:
#include <stdio.h>

void SomeFunction() {
    SomeOtherFunction();   // error: SomeOtherFunction has not been declared yet
}

void SomeOtherFunction() {
    printf("What?");
}
The error should be that "SomeOtherFunction is not declared" because you call it before its declaration. One way of fixing this is to move SomeOtherFunction above SomeFunction. Another approach is to declare the function's signature first:
#include <stdio.h>

void SomeOtherFunction();      // forward declaration

void SomeFunction() {
    SomeOtherFunction();
}

void SomeOtherFunction() {
    printf("What?");
}
This lets the compiler know: somewhere in the code there is a function called SomeOtherFunction that returns void and does not take any parameters. So if you encounter code that tries to call SomeOtherFunction, do not panic and instead go looking for it.
Now, imagine you have SomeFunction and SomeOtherFunction in two different .c files. You then have to #include "SomeOther.c" in Some.c. Now, add some "private" functions to SomeOther.c. As C does not know about private functions, those functions would be available in Some.c as well.
This is where .h Files come in: They specify all the functions (and variables) that you want to 'Export' from a .c file that can be accessed in other .c files. That way, you gain something like a Public/Private scope. Also, you can give this .h file to other people without having to share your source code - .h files work against compiled .lib files as well.
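A small sketch of that split (hypothetical names, C-style code):

/* SomeOther.h - only what we want to 'export' */
void SomeOtherFunction(void);

/* SomeOther.c */
#include <stdio.h>
#include "SomeOther.h"

void InternalHelper(void) { }      /* not declared in the header, so not part of the published interface */

void SomeOtherFunction(void) {
    InternalHelper();
    printf("What?\n");
}

/* Some.c - uses only the published interface */
#include "SomeOther.h"

void SomeFunction(void) {
    SomeOtherFunction();
}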
So the main reason is really for convenience, for source code protection and to have a bit of decoupling between the parts of your application.
That was C though. C++ introduced classes and private/public modifiers, so while you could still ask whether headers are needed, C++ AFAIK still requires declaration of functions before using them. Also, many C++ developers are or were C developers as well and carried their concepts and habits over to C++ - why change what isn't broken?
First advantage: If you don't have header files, you would have to include source files in other source files. This would cause the including files to be compiled again when the included file changes.
Second advantage: It allows sharing the interfaces without sharing the code between different units (different developers, teams, companies etc..)
The need for header files results from the compiler's limited knowledge of the type information for functions and variables in other modules. The compiled program or library does not include the type information required by the compiler to bind to any objects defined in other compilation units.
In order to compensate for this limitation, C and C++ allow for declarations and these declarations can be included into modules that use them with the help of the preprocessor's #include directive.
Languages like Java or C# on the other hand include the information necessary for binding in the compiler's output (class-file or assembly). Hence, there is no longer a need for maintaining standalone declarations to be included by clients of a module.
The reason for the binding information not being included in the compiler output is simple: it is not needed at runtime (any type checking occurs at compile time). It would just waste space. Remember that C/C++ come from a time where the size of an executable or library did matter quite a bit.
Well, C++ was ratified in 1998, but it had been in use for a lot longer than that, and the ratification was primarily setting down current usage rather than imposing structure. And since C++ was based on C, and C has header files, C++ has them too.
The main reason for header files is to enable separate compilation of files, and minimize dependencies.
Say I have foo.cpp, and I want to use code from the bar.h/bar.cpp files.
I can #include "bar.h" in foo.cpp, and then program and compile foo.cpp even if bar.cpp doesn't exist. The header file acts as a promise to the compiler that the classes/functions in bar.h will exist at run-time, and it has everything it needs to know already.
Of course, if the functions in bar.h don't have bodies when I try to link my program, then it won't link and I'll get an error.
A side-effect is that you can give users a header file without revealing your source code.
Another is that if you change the implementation of your code in the *.cpp file, but do not change the header at all, you only need to recompile that *.cpp file instead of everything that uses it. Of course, if you put a lot of implementation into the header file, then this becomes less useful.
C++ was designed to add modern programming language features to the C infrastructure, without unnecessarily changing anything about C that wasn't specifically about the language itself.
Yes, at this point (10 years after the first C++ standard and 20 years after it began seriously growing in usage) it is easy to ask why doesn't it have a proper module system. Obviously any new language being designed today would not work like C++. But that isn't the point of C++.
The point of C++ is to be evolutionary, a smooth continuation of existing practise, only adding new capabilities without (too often) breaking things that work adequately for its user community.
This means that it makes some things harder (especially for people starting a new project), and some things easier (especially for those maintaining existing code) than other languages would do.
So rather than expecting C++ to turn into C# (which would be pointless as we already have C#), why not just pick the right tool for the job? Myself, I endeavour to write significant chunks of new functionality in a modern language (I happen to use C#), and I have a large amount of existing C++ that I am keeping in C++ because there would be no real value in re-writing it all. They integrate very nicely anyway, so it's largely painless.
It doesn't need a separate header file with the same functions as in main. It only needs it if you develop an application using multiple code files and if you use a function that was not previously declared.
It's really a scope problem.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Actually, header files become very useful when examining a program for the first time: checking out the header files (using only a text editor) gives you an overview of the architecture of the program, unlike other languages where you have to use sophisticated tools to view classes and their member functions.
If you want the compiler to find symbols defined in other files automatically, you need to force programmers to put those files in predefined locations (the way Java's package structure determines the project's folder structure). I prefer header files. Also, you would need either the sources of the libraries you use or some uniform way to put the information needed by the compiler into binaries.
I think the real (historical) reason behind header files was making life easier for compiler developers... but then, header files do give advantages.
Check this previous post for more discussions...
Well, you can perfectly well develop C++ without header files. In fact, some libraries that make intensive use of templates do not use the header/code file paradigm (see Boost). But in C/C++ you cannot use something that is not declared. One practical way to deal with that is to use header files. Plus, you gain the advantage of sharing an interface without sharing code/implementation. And I think it was not envisioned by the C creators: when you use shared header files you have to use the famous:
#ifndef MY_HEADER_SWEET_GUARDIAN
#define MY_HEADER_SWEET_GUARDIAN
// [...]
// my header
// [...]
#endif // MY_HEADER_SWEET_GUARDIAN
that is not really a language feature but a practical way to deal with multiple inclusion.
So, I think that when C was created, the problems with forward declarations were underestimated, and now, when using a high-level language like C++, we still have to deal with this sort of thing.
Another burden for us poor C++ users...
Obviously, there are two "schools of thought" as to whether to put #include directives into C++ header files (or, as an alternative, put #include only into cpp files). Some people say it's OK, others say it only causes problems. Does anybody know whether this discussion has reached a conclusion as to what is preferred?
I am not aware of any schools of thought concerning this. Put them in the header when they are needed there, otherwise forward declare and put them in the .cpp files that require them. There is no benefit in including headers where they are not needed.
What I found effective is following a few simple rules:
Headers shall be self-sufficient, i.e., they shall declare classes they need names for and include headers for any definition they use.
Headers should minimize dependencies as much as possible without violating the previous point.
Getting the first point right is fairly easy: include the header first thing in the source file implementing what it declares. Getting the second point exactly right isn't trivial, though; I think it requires tool support. However, a few unnecessary dependencies generally aren't that bad.
As a rule of thumb, don't include headers in a header unless a full definition is needed there. Most of the time you only deal with pointers or references to classes in a header file, so it's fine to just forward-declare them.
I think the issue was settled a long time ago: headers should be self-contained (that is, they should not depend on the user having included other headers before them - that aspect has been settled for so long that some aren't even aware there was ever a debate about it, though your 'put includes only in .cpp files' alternative seems to hint at it), but minimal (i.e. they should not include definitions when a declaration would be enough for self-containment).
The reason for self-containment is maintenance: should a header be modified and come to depend on something new, you'd have to track down all the places it is used and add the new dependency there. BTW, the standard trick to ensure self-containment is to include the header providing the declarations for the things defined in a .cpp first in that .cpp.
These are not schools of thought so much as religions. In reality, both approaches have their advantages and disadvantages, and there are certain practices to be followed for either approach to be successful. But only one of these approaches will "scale" to large projects.
The advantage of not including headers inside headers is faster compilation. However, this advantage does not come from headers being read only once, because even if you include headers inside headers, smart compilers can work that out. The speed advantage comes from the fact that you include only those headers which are strictly necessary for a given source file. Another advantage is that if we look at a source file, we can see exactly what its dependencies are: the flat list of header files gives that to us plainly.
However, this practice is hard to maintain, especially in large projects with many programmers. It's quite an inconvenience when you want to use module foo, but you cannot just #include "foo.h": you need to include 35 other headers.
What ends up happening is this: programmers are not going to waste their time discovering the exact, minimal set of headers that they need just to add module foo. To save time, they will go to some example source file similar to the one they are working on, and cut and paste all of the #include directives. Then they will try compiling it, and if it doesn't build, then they will cut and paste more #include directives from yet elsewhere, and repeat that until it works.
The net result is that, little by little, you lose the advantage of faster compiling, because your files are now including unnecessary headers. Moreover, the list of #include directives no longer shows the true dependencies. Moreover, when you do incremental compiles now, you compile more than is necessary due to these false dependencies.
Once every source file includes nearly every header, you might as well have a big everything.h which includes all the headers, and then #include "everything.h" in every source file.
So this practice of including just specific headers is best left to small projects that are carefully maintained by a handful of developers who have plenty of time to maintain the ethic of minimal include dependencies by hand, or write tools to hunt down unnecessary #include directives.
Looking around at different code bases I see a variety of styles:
Class "interfaces" defined in header file and the actual impl in a cpp file. In this approach the headers look well defined and easy to read but the cpp files look confusing as it's just a list of methods.
The second approach i see is just to put everything in a single class cpp file. These class files contain the definition and actual method impls in the body of the class definition. This approach looks better to me (more like Java and c#).
Which style should I be using?
For all but the simplest programs, style #2 is simply impossible. If you #include a .cpp file with function definitions from multiple other .cpp files, the definitions get added to multiple object files (.o / .obj) and the linker will complain about clashing symbols.
Use style #1 and learn to live with the confusion.
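A hedged sketch of the failure mode being described (file names are made up):

// widget.cpp - "style #2": the whole class plus free functions live in one .cpp
class Widget { public: int id() { return 1; } };
int widgetCount() { return 0; }

// a.cpp and b.cpp both do:
//   #include "widget.cpp"
// Each translation unit now emits its own copy of widgetCount(), so linking
// a.o and b.o together fails with a multiple-definition error.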
The former - interfaces in header files and class bodies in implementation files. You'll find this causes you fewer problems when working on large systems.
In C++ why have header files and cpp files?
C++ doesn't use "interfaces" they use classes - base/derived classes. I use one file to define class/and its implementation methods if the project is small and separate files if the project is large.
In java, I pack them up into one package then import it once in need.
Since you tagged this C++, go for the first style. I don't find it confusing; to a Java programmer it may seem different, but in C++ you are always going to use this approach.
In fact, in my favorite IDE (MSVS), I open the header file and the cpp file side by side. It makes looking up prototypes and class declarations easy.
And when you have a dozen classes, a dozen .h files and another dozen .cpp files will make your work simpler. Because when you just want to see what a class does, you simply open the relevant .h file and look at the class members, and maybe some short comments. You don't need to wade through many lines of deep code.
Conclusion: the style options you gave are an option only for small code, typically a single file with very few methods. Otherwise, it is not even an option. (Thomas has given the reason why #2 is not even an option.)
Header (HPP):
The header contains the declarations of your code, particularly function declarations. Technically speaking, classes are defined in header files, but again, the member functions are only declared there.
Code in other files will include just this header and obtain all the necessary information from it.
Implementation (CPP):
The implementation includes the definition of functions, member-functions and variables.
Rationale:
Header files give a developer (an external user of your code) a plain overview and expose only the externally available code (i.e. easy to read, only the information necessary for users).
Header files allow the compiler to check the implementation for correctness.
Header files allow the compiler to check external code for correctness.
Header files allow for separate compilation. Keep in mind that in former times computers didn't have enough resources to keep everything in main memory during the compilation process. Header files are small, while implementation files are big.
Use style #1, even for simple programs, so you can easily get used to working with it. That may look outdated today, especially against the background of modern multi-pass compilers, but separate header files are beneficial even today. There are rumours that the next C++ standard will offer something like symbol export (as in Java or C#), but don't nail me down on this!
Notes:
- member functions which are defined inside a class are implicitly inline; normally you don't want this
- always use include guards
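A minimal header/implementation pair along those lines (names are illustrative):

// counter.hpp - declarations only
#ifndef COUNTER_HPP
#define COUNTER_HPP

class Counter {
public:
    void increment();       // declared here, defined in counter.cpp
    int value() const;
private:
    int count_ = 0;
};

#endif

// counter.cpp - definitions
#include "counter.hpp"

void Counter::increment() { ++count_; }
int Counter::value() const { return count_; }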
If you are developing a large project, you'll find the first approach helps you a lot. The second approach may help in a small project. As your project becomes larger, management of complexity is a big issue in software development, and the first approach turns out to be the better choice.
What I do is:
write .cpp files, with the method names prefixed with the class name
in the .h file, create an empty class with the appropriate name, then use a cogapp generator script, cog_addheaders.py, to insert the declarations, e.g.:
.cpp file: WeightsPersister.cpp
.h file: WeightsPersister.h
This way I get:
fast compilation (just needs to recompile the .cpp file, unless I change the class interface)
few issues with circular declarations
acceptably low tedious mindless manual work :-)
I've seen fairly consistent advice that an implementation file (.cc / .cpp) should include its corresponding class definition file first, before including other header files. But when the topic shifts to header files themselves, and the order of includes they contain, the advice seems to vary.
Google coding standards suggest:
1. dir2/foo2.h (preferred location - see details below).
2. C system files.
3. C++ system files.
4. Other libraries' .h files.
5. Your project's .h files.
It is unclear what the difference is between entry 1 and 5 above, and why one or the other location would be chosen. That said, another online guide suggests this order (found in the "Class Layout" section of that doc):
1. system includes
2. project includes
3. local includes
Once again there is an ambiguity, this time between items 2 and 3. What is the distinction? Do those represent inter-project and intra-project includes?
But more to the point, it looks as if both proposed coding standards are suggesting "your" header files are included last. Such advice, being backwards from what is recommended for include-ordering in implementation files, is not intuitive. Would it not make sense to have "your" header files consistently listed first - ahead of system and 3rd party headers?
The order you list your includes shouldn't matter from a technical point of view. If you designed it right, you should be able to put them in any order you want and it will still work. For example, if your foo.h needs <string>, it should be included inside your foo.h so you don't have to remember that dependency everywhere you use foo.
That being said, if you do have order dependencies, most of the time putting your definition file last will fix it. That's because foo.h depends on <string>, but not the other way around.
You might think that makes a good case for putting your definition file last, but it's actually quite the opposite. If your coding standards require the definition first, your compiler is more likely to catch incorrect order dependencies when they are first written.
I'm not aware of any verbatim standard, but as a general rule of thumb, include as few headers as possible, especially within other header files, to reduce compile times, conflicts, and dependencies. I'm a fan of forward-declaring classes in header files and only including the header with the definition on the .cpp side whenever I can afford to do so.
That said my personal preference is below:
For Headers:
C++ headers
3rd party headers
other project headers
this project's headers
For Source:
precompiled header file
this source file's header
C++ headers
3rd party headers
other project headers
this project's headers
Pointers or suggestions are usually about avoiding conflicts and circular references; otherwise it's all personal preference, or whatever policy you prefer to adhere to for collaborative projects.
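For instance, a source file following that ordering might begin like this (all file names are hypothetical):

// widget.cpp
#include "pch.h"                 // precompiled header, if the project uses one
#include "widget.h"              // this source file's own header

#include <string>                // C++ headers
#include <vector>

#include <boost/optional.hpp>    // 3rd party headers

#include "otherlib/logging.h"    // other project headers
#include "widget_utils.h"        // this project's headers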
Regarding Google's style:
There is no ambiguity, at all.
The first header included should be the header related to this source file, thus in position 1. This way you make sure that it includes anything it needs and that there is no "hidden" dependency: if there is, it'll be exposed right away and prevent compilation.
The other headers are ordered from those you are least likely to be able to change if an issue occurs to those you are most likely to. An issue could be an identifier clash, a leaking macro, etc.
By definition the C and C++ system headers are very rarely altered, simply because there are so many people using them, thus they come second.
3rd party code can be changed, but it's generally cumbersome and takes time, thus they come third.
The "project includes" refer to project-wide includes, generally home-brawn libraries (middle-ware) that are used by several projects. They can be changed, but this would impact the other projects as well, they come fourth.
And finally the "local includes", that is those files who are specific to this project and can be changed without affecting anyone else. In case of issue, those are prime candidates, they come last.
Note that you can in fact have many more layers (especially in a software shop), the key idea is to order the dependencies starting from the bottom layer (system libs) to the top layer.
Within a given layer, I tend to organize them by alphabetical order, because it's easier to check them.
For Headers:
this project's headers
other project headers
3rd party headers
C++ headers
For Source:
this source file's header
this project's headers
other project headers
3rd party headers
C++ headers
This order minimizes the chance of missing a required header inside an .hpp file. It also minimizes collisions between 3rd-party headers, and every .hpp module compiles with the minimum required dependencies.
For example:
test.hpp:
// missing #include <string>
void test(std::string& s);
test.cpp:
#include <string>
#include "test.hpp"
// the missing header is hidden: <string> was already included above
test2.cpp:
#include "test.hpp"
#include <string>
// compilation error: test.hpp is not self-contained