I've never really understood why C++ needs a separate header file with the same functions as in the .cpp file. It makes creating classes and refactoring them very difficult, and it adds unnecessary files to the project. And then there is the problem of having to include header files while also explicitly guarding against them being included more than once.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Follow up question:
How does the compiler find the .cpp file with the code in it, when all I include is the .h file? Does it assume that the .cpp file has the same name as the .h file, or does it actually look through all the files in the directory tree?
Some people consider header files an advantage:
It is claimed that it enables/enforces/allows separation of interface and implementation -- but usually, this is not the case. Header files are full of implementation details (for example member variables of a class have to be specified in the header, even though they're not part of the public interface), and functions can, and often are, defined inline in the class declaration in the header, again destroying this separation.
It is sometimes said to improve compile times because each translation unit can be processed independently. And yet C++ is probably the slowest language in existence when it comes to compile times. Part of the reason is the repeated inclusion of the same headers: a header included by many translation units must be parsed over and over again.
Ultimately, the header system is an artifact from the 70's when C was designed. Back then, computers had very little memory, and keeping the entire module in memory just wasn't an option. A compiler had to start reading the file at the top, and then proceed linearly through the source code. The header mechanism enables this. The compiler doesn't have to consider other translation units, it just has to read the code from top to bottom.
And C++ retained this system for backwards compatibility.
Today, it makes no sense. It is inefficient, error-prone and overcomplicated. There are far better ways to separate interface and implementation, if that was the goal.
However, one of the proposals for C++0x was to add a proper module system, allowing code to be compiled similar to .NET or Java, into larger modules, all in one go and without headers. This proposal didn't make the cut in C++0x, but I believe it's still in the "we'd love to do this later" category. Perhaps in a TR2 or similar.
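For what it's worth, C++ did eventually get a module system, in C++20. A minimal sketch of what header-free code looks like under it (the file and function names here are hypothetical):

// math.cppm - a module interface unit: no header, no include guards
export module math;

export int square(int x) { return x * x; }

// main.cpp
import math;

int main() { return square(4); }

The importer consumes a compiled description of the module rather than re-parsing its text, which is exactly the repeated-parsing cost described above.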
You seem to be asking about separating definitions from declarations, although there are other uses for header files.
The answer is that C++ doesn't "need" this. If you mark everything inline (which is automatic anyway for member functions defined in a class definition), then there is no need for the separation. You can just define everything in the header files.
The reasons you might want to separate are:
To improve build times.
To link against code without having the source for the definitions.
To avoid marking everything "inline".
If your more general question is, "why isn't C++ identical to Java?", then I have to ask, "why are you writing C++ instead of Java?" ;-p
More seriously, though, the reason is that the C++ compiler can't just reach into another translation unit and figure out how to use its symbols, in the way that javac can and does. The header file is needed to declare to the compiler what it can expect to be available at link time.
So #include is a straight textual substitution. If you define everything in header files, the preprocessor ends up creating an enormous copy and paste of every source file in your project, and feeding that into the compiler. The fact that the C++ standard was ratified in 1998 has nothing to do with this, it's the fact that the compilation environment for C++ is based so closely on that of C.
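A tiny illustration of that textual substitution, with hypothetical file names:

// bar.h
int bar(int x);

// foo.cpp
#include "bar.h"
int call() { return bar(2); }

// Running only the preprocessor (e.g. g++ -E foo.cpp) shows what the
// compiler actually sees: the header's text pasted over the #include line:
//
//   int bar(int x);
//   int call() { return bar(2); }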
Converting my comments to answer your follow-up question:
How does the compiler find the .cpp file with the code in it
It doesn't, at least not at the time it compiles the code that used the header file. The functions you're linking against don't even need to have been written yet, never mind the compiler knowing what .cpp file they'll be in. Everything the calling code needs to know at compile time is expressed in the function declaration. At link time you will provide a list of .o files, or static or dynamic libraries, and the header in effect is a promise that the definitions of the functions will be in there somewhere.
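A minimal sketch of that promise, with hypothetical names; foo.cpp compiles on the strength of the declaration alone:

// bar.h - the promise: a declaration, nothing more
int bar(int x);

// foo.cpp - compiles to foo.o even if no definition of bar exists yet
#include "bar.h"
int main() { return bar(1); }

// bar.cpp - can be written later, or shipped pre-compiled in a library
#include "bar.h"
int bar(int x) { return x + 1; }

Compiling foo.cpp alone succeeds (g++ -c foo.cpp); linking fails with an "undefined reference" until an object file or library defining bar(int) is handed to the linker (g++ foo.o bar.o).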
C++ does it that way because C did it that way, so the real question is why did C do it that way? Wikipedia speaks a little to this.
Newer compiled languages (such as Java, C#) do not use forward declarations; identifiers are recognized automatically from source files and read directly from dynamic library symbols. This means header files are not needed.
To my (limited - I'm not a C developer normally) understanding, this is rooted in C. Remember that C does not know what classes or namespaces are, it's just one long program. Also, functions have to be declared before you use them.
For example, the following should give a compiler error:
#include <stdio.h>

void SomeFunction() {
    SomeOtherFunction();
}

void SomeOtherFunction() {
    printf("What?");
}
The error should be that "SomeOtherFunction is not declared" because you call it before its declaration. One way of fixing this is by moving SomeOtherFunction above SomeFunction. Another approach is to declare the function's signature first:
#include <stdio.h>

void SomeOtherFunction();

void SomeFunction() {
    SomeOtherFunction();
}

void SomeOtherFunction() {
    printf("What?");
}
This lets the compiler know: somewhere in the code there is a function called SomeOtherFunction that returns void and takes no parameters. So if you encounter code that tries to call SomeOtherFunction, do not panic; instead go looking for it.
Now, imagine you have SomeFunction and SomeOtherFunction in two different .c files. You then have to #include "SomeOther.c" in Some.c. Now, add some "private" functions to SomeOther.c. As C does not know private functions, that function would be available in Some.c as well.
This is where .h files come in: they specify all the functions (and variables) that you want to 'export' from a .c file so that they can be accessed from other .c files. That way, you gain something like public/private scope. Also, you can give this .h file to other people without having to share your source code - .h files work against compiled .lib files as well.
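A small sketch of that 'export' idea, with hypothetical names: the header declares what other .c files may use, while 'static' keeps a helper file-private:

/* SomeOther.h - the export list */
#ifndef SOMEOTHER_H
#define SOMEOTHER_H
void SomeOtherFunction(void);
#endif

/* SomeOther.c */
#include <stdio.h>
#include "SomeOther.h"

/* static = internal linkage: invisible to other .c files,
   the closest thing C has to "private" */
static void InternalHelper(void) { printf("hidden\n"); }

void SomeOtherFunction(void) { InternalHelper(); }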
So the main reason is really for convenience, for source code protection and to have a bit of decoupling between the parts of your application.
That was C though. C++ introduced classes and private/public modifiers, so while you could still ask whether headers are needed, C++ AFAIK still requires declaration of functions before using them. Also, many C++ developers are or were C developers as well and carried their concepts and habits over to C++ - why change what isn't broken?
First advantage: If you don't have header files, you would have to include source files in other source files. This would cause the including files to be compiled again when the included file changes.
Second advantage: It allows sharing the interfaces without sharing the code between different units (different developers, teams, companies etc..)
The need for header files results from the compiler's limited knowledge of the type information for functions and/or variables in other modules. The compiled program or library does not include the type information required by the compiler to bind to any objects defined in other compilation units.
In order to compensate for this limitation, C and C++ allow for declarations and these declarations can be included into modules that use them with the help of the preprocessor's #include directive.
Languages like Java or C# on the other hand include the information necessary for binding in the compiler's output (class-file or assembly). Hence, there is no longer a need for maintaining standalone declarations to be included by clients of a module.
The reason for the binding information not being included in the compiler output is simple: it is not needed at runtime (any type checking occurs at compile time). It would just waste space. Remember that C/C++ come from a time where the size of an executable or library did matter quite a bit.
Well, C++ was ratified in 1998, but it had been in use for a lot longer than that, and the ratification was primarily setting down current usage rather than imposing structure. And since C++ was based on C, and C has header files, C++ has them too.
The main reason for header files is to enable separate compilation of files, and minimize dependencies.
Say I have foo.cpp, and I want to use code from the bar.h/bar.cpp files.
I can #include "bar.h" in foo.cpp, and then program and compile foo.cpp even if bar.cpp doesn't exist. The header file acts as a promise to the compiler that the classes/functions in bar.h will exist at run-time, and it has everything it needs to know already.
Of course, if the functions in bar.h don't have bodies when I try to link my program, then it won't link and I'll get an error.
A side-effect is that you can give users a header file without revealing your source code.
Another is that if you change the implementation of your code in the *.cpp file, but do not change the header at all, you only need to compile the *.cpp file instead of everything that uses it. Of course, if you put a lot of implementation into the header file, then this becomes less useful.
C++ was designed to add modern programming language features to the C infrastructure, without unnecessarily changing anything about C that wasn't specifically about the language itself.
Yes, at this point (10 years after the first C++ standard and 20 years after it began seriously growing in usage) it is easy to ask why doesn't it have a proper module system. Obviously any new language being designed today would not work like C++. But that isn't the point of C++.
The point of C++ is to be evolutionary, a smooth continuation of existing practise, only adding new capabilities without (too often) breaking things that work adequately for its user community.
This means that it makes some things harder (especially for people starting a new project), and some things easier (especially for those maintaining existing code) than other languages would do.
So rather than expecting C++ to turn into C# (which would be pointless as we already have C#), why not just pick the right tool for the job? Myself, I endeavour to write significant chunks of new functionality in a modern language (I happen to use C#), and I have a large amount of existing C++ that I am keeping in C++ because there would be no real value in re-writing it all. They integrate very nicely anyway, so it's largely painless.
It doesn't need a separate header file with the same functions as in main. It only needs it if you develop an application using multiple code files and if you use a function that was not previously declared.
It's really a scope problem.
C++ was ratified in 1998, so why is it designed this way? What advantages does having a separate header file have?
Actually, header files become very useful when examining a program for the first time: checking out the header files (using only a text editor) gives you an overview of the architecture of the program, unlike other languages where you have to use sophisticated tools to view classes and their member functions.
If you want the compiler to find symbols defined in other files automatically, you need to force the programmer to put those files in predefined locations (the way Java's package structure determines the folder structure of the project). I prefer header files. Also, you would need either the sources of the libraries you use or some uniform way to put the information needed by the compiler into the binaries.
I think the real (historical) reason behind header files was making life easier for compiler developers... but then, header files do give advantages.
Check this previous post for more discussions...
Well, you can perfectly well develop C++ without header files. In fact, some libraries that make intensive use of templates do not use the header/code file paradigm (see boost). But in C/C++ you cannot use something that is not declared. One practical way to deal with that is to use header files. Plus, you gain the advantage of sharing an interface without sharing the code/implementation. And I think it was not envisioned by the C creators: when you use shared header files you have to use the famous:
#ifndef MY_HEADER_SWEET_GUARDIAN
#define MY_HEADER_SWEET_GUARDIAN
// [...]
// my header
// [...]
#endif // MY_HEADER_SWEET_GUARDIAN
that is not really a language feature but a practical way to deal with multiple inclusion.
So, I think that when C was created, the problems with forward declarations were underestimated, and now, when using a high-level language like C++, we have to deal with this sort of thing.
Another burden for us poor C++ users ...
Why are functions usually declared in classes and not defined?:
class MyClass {
public:
    void MyFunction(); // Function is declared, not defined
};
Instead of:
#include <iostream>

class MyClass {
public:
    void MyFunction() {
        std::cout << "This is a function" << std::endl;
    } // Function is defined instead of merely declared. No need for a .cpp file
};
Is the reason because it looks nice or easier to read?
The general (theoretical) reason is separation of concerns: to call a function (whether a class member function or not), the compiler only needs visibility of the function's interface, based on its signature (name, return type, argument types). The caller does not need visibility of the function's definition (and if a definition placed where callers can see it is not, or cannot be, inline for some reason, the outcome is often a program that fails to compile or link).
Two common practical reasons are to maximise benefits of separate compilation and to allow distribution of a library without source code.
Separate compilation (over-simplistically, placing subsets of source code in distinct source files) is a feature of C++ that has numerous benefits in larger projects. It involves having a set of separately compiled source files rather than throwing all the source for a program into a single file. One benefit is that it enables incremental builds: once all source files for a project have been compiled, the only ones that need recompiling are those that have changed (since recompiling a source file that hasn't changed produces a functionally equivalent object file), after which everything is relinked. This beats universal builds, which recompile and relink everything. Practically, even in moderately large projects (let's say projects that consist of a few hundred source files) the incremental build time can be measured in minutes while the universal rebuild time can be measured in days. The difference between the two equates to unproductive time (thumb-twiddling waiting for a build to complete) for programmers. Programmer time is THE largest cost in significant projects, so unproductive programmer time is expensive.
In practice, in moderate or larger projects, function interfaces (function signatures) are quite stable (they change relatively infrequently) while function definitions are unstable (they change relatively frequently). Having a function definition in a header file means that header file is more likely to be edited as the program evolves. And, when a header file is edited, EVERY source file which includes that header file must be recompiled in order to rebuild (build management tools often work that way, because it's the least complicated way to detect dependencies between files in a project). That tends to result in larger build times: while the impact of recompiling all source files that include a changed header is not as large as doing a universal build, it can increase the incremental build time from a few minutes to a few hours. Again, more unproductive time for programmers.
That's not saying that functions should never be inlined. It does mean that it is necessary to choose carefully which functions should be inlined (e.g. based on performance benefits identified using profiling, avoid inlining functions that will be updated regularly) rather than (as this question suggests) defaulting to inlining.
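A hypothetical sketch of that selective approach: a tiny, performance-sensitive accessor defined inline in the header, while the heavy, frequently edited function is only declared there:

// widget.h
class Widget {
public:
    int id() const { return id_; }   // trivial accessor: cheap to inline
    void recompute();                // heavy and often edited: declare only
private:
    int id_ = 0;
};

// widget.cpp - editing this body recompiles one file,
// not every file that includes widget.h
#include "widget.h"

void Widget::recompute() { /* expensive work here */ }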
The second common reason for not inlining functions is for distribution of libraries. For commercial (e.g. protecting intellectual property) and other reasons it is often preferable to distribute a compiled library and header files (the minimum needed for someone to use a library in another project) without distributing the complete source code. Placing functions into a header file increases the amount of source file that is distributed.
For non-template code, placing definitions in source files separate from the declarations allows for more efficient compilation. You have a single source file that needs to be compiled, and so long as that file doesn't change, the output can be reused. The final link step isn't affected very much by having multiple input files; the symbols will need to be resolved from the same data structures whether there is one input file or many. It's harder with C++, but this can even allow distribution of binary-only libraries, that is, distributing headers containing only declarations and matching object files, without the implementing source at all.
On the other hand, if functions are defined inline (whether in the class body or outside it with an explicit 'inline' keyword), the linker will have to check its symbol table and toss duplicates. And that applies regardless of whether templates are involved. But templates need their definitions to be available in every translation unit that uses them, so their definitions will generally go in headers even with the cost of pruning duplicates at the link stage.
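A small illustration of both points in one hypothetical header: the inline function may be compiled into many object files and the linker keeps one copy, while the template must be fully visible to every translation unit that instantiates it:

// util.h
#ifndef UTIL_H
#define UTIL_H

// inline: one definition may appear in each translation unit;
// the linker discards the duplicates
inline int twice(int x) { return 2 * x; }

// template: the definition must be visible at each point of
// instantiation, so it lives in the header as well
template <typename T>
T clamp_min(T v, T lo) { return v < lo ? lo : v; }

#endif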
Generally, classes are declared in header files, and definitions written in a header risk running afoul of the one-definition rule. If I have
// my_class.h
class MyClass {
public:
    void myFunction() {}
};
then every .cpp file that includes this header gets its own copy of the body of MyClass::myFunction. (Defining a member function inside the class body makes it implicitly inline, which keeps this legal, but it means every includer recompiles the body; a definition written outside the class in the header, without inline, really is a multiple-definition error as soon as two .cpp files include it.) Thus, we declare our class methods in the header and then write the implementation in the .cpp source file.
// my_class.h
class MyClass {
public:
    void myFunction();
};

// my_class.cpp
#include "my_class.h"

void MyClass::myFunction() {}
The rules are more complicated if the function is inline or a template function, in which case there's a compelling (and often required) reason to define the function in the header. Some people will define template functions in the class itself as you've suggested, but others prefer to write the template function definition at the bottom of the header, to be more consistent with the way we write non-template functions. It's more a matter of style in this case, so pick your preferred convention and stick to it.
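A sketch of the second convention mentioned above, using a hypothetical class: the template member is declared in the class and defined at the bottom of the same header, mirroring the non-template declaration/definition split:

// stack.h
#ifndef STACK_H
#define STACK_H
#include <vector>

template <typename T>
class Stack {
public:
    void push(const T &v);   // declared here...
private:
    std::vector<T> data_;
};

// ...and defined lower in the same header, since a template's
// definition must stay visible to every including file
template <typename T>
void Stack<T>::push(const T &v) { data_.push_back(v); }

#endif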
Likewise, if the class is not mentioned in a header file (i.e. if the class is an implementation detail and only exists in one .cpp file), then you're free to do it either way, and you'll find folks that prefer both conventions.
In one word: separating implementation from declaration is always good.
In a large project, implementations in header files will cause multiple-definition problems.
Separating them is always the better way, and avoids a lot of future problems.
I'm currently transitioning to working in C, primarily focused on developing large libraries. I'm coming from a decent amount of application based programming in C++, although I can't claim expertise in either language.
What I'm curious about is when and why many popular open source libraries choose not to separate their code into a 1-1 relationship between .h files and corresponding .c files -- even in instances where the .c isn't generating an executable.
In the past I'd been led to believe that structuring code in this manner is optimal, not only for organization but also for linking purposes -- and I don't see how the lack of OOD features in C would affect this (not to mention that not separating the implementation and interface also occurs in C++ libraries).
There is no inherent technical reason in C to provide .c and .h files in matched pairs. There is certainly no reason related to linking, as in conventional C usage, neither .c nor .h files has anything directly to do with that.
It is entirely possible and potentially advantageous to collect declarations related to multiple .c files in a smaller number of .h files. Providing only one or a small number of header files makes it easier to use the associated library: you don't need to remember or look up which header you need for the declaration of each function, type, or variable.
There are at least three consequences that arise from doing that, however:
you make it harder to determine where to find the implementations of functions declared in collective headers.
you make your project more susceptible to mass rebuilding cascades, as most object files depend on one or more of a small number of headers, and changes or additions to your function signatures all affect one of that small number of headers.
the compiler has to spend more effort digesting one large header with all the library's declarations than to digest one or a small number of headers focused narrowly on the specific declarations used in a given .c file, as #ChrisBeck observed. This tends to be much less of a problem for C code than it does for C++ code, however.
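For a concrete picture of the collective-header approach and its tradeoffs, here is a minimal sketch with hypothetical names:

/* mylib.h - one header collecting declarations from several .c files */
#ifndef MYLIB_H
#define MYLIB_H
void parse(const char *text);    /* defined in parse.c   */
void render(int mode);           /* defined in render.c  */
int  lib_version(void);          /* defined in version.c */
#endif

Users include one header for everything, but any change to any of these signatures touches mylib.h and therefore recompiles every file that includes it.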
You need a separate .h file only when something is included in more than one compilation unit.
A form of "keep things local unless you have to share" wisdom.
In the past I'd been led to believe that structuring code in this manner is optimal, not only for organization but also for linking purposes -- and I don't see how the lack of OOD features in C would affect this (not to mention that not separating the implementation and interface also occurs in C++ libraries).
In traditional C code, you will always put declarations in the .h files and definitions in the .c files. This is indeed to optimize compilation: it makes each compilation unit take the minimum amount of memory, since it contains only the definitions it needs to emit code for, and, if you manage includes properly, only the declarations it needs as well. It also makes it simple to see that you aren't breaking the one-definition rule.
On modern machines it's less important to do this from the perspective of avoiding awful build times -- machines now have a lot of memory.
In C++ you have templates, which generally live only in headers.
In recent years you also have people experimenting with so-called "unity builds", where you have one compilation unit which includes all of the other source files and you build it all at once. See here: The benefits / disadvantages of unity builds?
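A unity build is literally just a translation unit that includes the other source files; a sketch with hypothetical file names:

// unity.cpp - compiled instead of the individual files
#include "lexer.cpp"
#include "parser.cpp"
#include "codegen.cpp"
// one compiler invocation parses the shared headers once,
// at the cost of recompiling everything on any change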
So today, having 1-1 correspondence is mainly a style / organizational thing.
A really, really basic, but entirely realistic scenario where a 1-1 relation between .h and .c files is not required, and even not desirable:
main.h
//A lib's/extension's/application's main header file
//for user API -> opaque types
typedef struct _internal_str my_type;
//API functions
my_type * init_resource( void );//some arguments will probably be required
//get helper resource -> not part of the API, but the lib uses it internally in all translation units
const struct helper_str *get_help( void );
Now this get_help function is, as the comment says, not part of the lib's API. All the .c files that make up the lib use it, though, and the get_help function is defined in the helper.c translation unit. That file might look something like this:
#include "main.h"
#include <third/party.h>
//static functions
static
third_party_type *init_external_resource( void )
{
//implement this
}
static
void cleanup_stuff(third_party_type *p)
{
third_party_free(p);
}
const struct helper_str *get_help( void )
{
//implementation of external function
}
Ok, so it's a convenience thing: not adding another .h file, because there's only 1 external function you're calling. But that's no good reason not to use a separate header file, right? Agreed. It's not a good reason.
However: Imagine that your code depends on this third party library a lot, and each component of whatever you're building uses a different part of this library. The "help" you need/want from this helper.c file might differ. That's when you could decide to create several header files, to control the way the helper.c file is being used internally in your project. For example: you've got some logging-stuff in translation units X and Y, these files might include a file like this:
//specific_help.h
char * err_to_log_msg(int error_nr);//relevant arguments, of course...
Whereas a file that doesn't come near output, but, for example, manages thread-safety or signals, might want to call a function in helper.c that frees some resources in case some event was detected (signals, keystrokes, mouse events... whatever). This file might include a header file like:
//system_help.h
void free_helper_resources(int level);
All of these headers link back to functions defined in helper.c, but you could end up with 10 header files for a single c file.
Once you have these various headers exposing a selection of functions, you might end up adding specific typedefs to each of these headers, depending on how the two components interact... ah well, it's a matter of taste anyway.
Many people will just opt for a single header file to go with the helper.c file, and include that. They'll probably not use half of the functions they have access to, but they'll have less files to worry about.
On the other hand, if others start tinkering with the code, they might be tempted to add functions to a file where they don't belong: they might add logging functions to the signal/event-handling files and vice versa.
In the end: use your common sense, don't expose more than you need to. It's easy to remove a static keyword and just add the prototype to a header file if you really need to.
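That promotion really is a two-line change; a sketch with a hypothetical function:

/* before, inside helper.c - file-private */
static int count_lines(const char *buf);

/* after: drop 'static' in helper.c, and add the prototype
   to a header so other files can call it */
/* helper.h */
int count_lines(const char *buf);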
For whatever reason, our company has a coding guideline that states:
Each class shall have its own header and implementation file.
So if we wrote a class called MyString, we would need an associated MyString.h and MyString.cxx.
Does anyone else do this? Has anyone seen any compiling performance repercussions as a result? Does 5000 classes in 10000 files compile just as quickly as 5000 classes in 2500 files? If not, is the difference noticeable?
[We code C++ and use GCC 3.4.4 as our everyday compiler]
The term here is translation unit, and you really want to (if possible) have one class per translation unit, i.e., one class implementation per .cpp file, with a corresponding .h file of the same name.
It's usually more efficient (from a compile/link standpoint) to do things this way, especially if you're doing things like incremental linking and so forth. The idea is that translation units are isolated such that, when one translation unit changes, you don't have to rebuild a lot of stuff, as you would if you started lumping many abstractions into a single translation unit.
Also you'll find many errors/diagnostics are reported via file name ("Error in Myclass.cpp, line 22") and it helps if there's a one-to-one correspondence between files and classes. (Or I suppose you could call it a 2 to 1 correspondence).
Overwhelmed by thousands of lines of code?
Having one set of header/source files per class in a directory can seem overkill. And if the number of classes goes toward 100 or 1000, it can even be frightening.
But having played with sources following the philosophy "let's put everything together", the conclusion is that only the person who wrote the file has any hope of not getting lost inside it. Even with an IDE, it is easy to miss things, because when you're playing with a source of 20,000 lines you simply close your mind to anything not exactly related to your problem.
Real-life example: the class hierarchy defined in those thousand-line sources closed itself into a diamond inheritance, and some methods were overridden in child classes by methods with exactly the same code. This was easily overlooked (who wants to explore/check a 20,000-line source file?), and when the original method was changed (bug correction), the effect was not as universal as expected.
Dependencies becoming circular?
I had this problem with templated code, but I saw similar problems with regular C++ and C code.
Breaking down your sources into 1 header per struct/class lets you:
Speed up compilation because you can use symbol forward-declaration instead of including whole objects
Have circular dependencies between classes (§) (i.e. class A has a pointer to B, and B has a pointer to A; see the sketch after this list)
In source-controlled code, class dependencies could lead to regular moving of classes up and down the file, just to make the header compile. You don't want to study the evolution of such moves when comparing the same file in different versions.
Having separate headers makes the code more modular and faster to compile, and makes it easier to study its evolution through diffs between versions.
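A minimal sketch of both points, with hypothetical classes: each header forward-declares the other class instead of including its header, which is enough for pointers and references:

// a.h
#ifndef A_H
#define A_H
class B;          // forward declaration: no #include "b.h" needed

class A {
    B *b_;        // pointer to an incomplete type is fine
};
#endif

// b.h
#ifndef B_H
#define B_H
class A;

class B {
    A *a_;        // the circular dependency compiles cleanly
};
#endif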
For my template program, I had to divide my headers into two files: The .HPP file containing the template class declaration/definition, and the .INL file containing the definitions of the said class methods.
Putting all this code inside one and only one unique header would mean putting class definitions at the beginning of this file, and the method definitions at the end.
And then, if someone needed only a small part of the code, with the one-header-only solution, they still would have to pay for the slower compilation.
(§) Note that you can have circular dependencies between classes if you know which class owns which. This is a discussion about classes having knowledge of the existence of other classes, not about the shared_ptr circular-dependency antipattern.
One last word: Headers should be self-sufficient
One thing, though, must be respected by any solution with multiple headers and multiple sources.
When you include one header, no matter which header, your source must compile cleanly.
Each header should be self-sufficient. You're supposed to be developing code, not treasure-hunting by grepping your 10,000+ file project to find which header defines the symbol used in the 1,000-line header you need to include just because of one enum.
This means that either each header defines or forward-declare all the symbols it uses, or include all the needed headers (and only the needed headers).
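A sketch of a self-sufficient header, with hypothetical names: it includes what it uses by value, forward-declares what it only references, and compiles cleanly on its own:

// report.h
#ifndef REPORT_H
#define REPORT_H

#include <string>   // std::string is used by value, so it must be included

class Logger;       // only a reference appears, so a forward declaration is enough

class Report {
public:
    Report(std::string title, Logger &log);
private:
    std::string title_;
    Logger &log_;
};

#endif

A quick test of the rule: a .cpp file containing nothing but #include "report.h" should compile without errors.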
Question about circular dependencies
underscore_d asks:
Can you explain how using separate headers makes any difference to circular dependencies? I don't think it does. We can trivially create a circular dependency even if both classes are fully declared in the same header, simply by forward-declaring one in advance before we declare a handle to it in the other. Everything else seems to be great points, but the idea that separate headers facilitate circular dependencies seems way off
underscore_d, Nov 13 at 23:20
Let's say you have 2 class templates, A and B.
Let's say the definition of class A (resp. B) has a pointer to B (resp. A). Let's also say the methods of class A (resp. B) actually call methods from B (resp. A).
You have a circular dependency both in the definition of the classes, and the implementations of their methods.
If A and B were normal classes, and A's and B's methods were in .CPP files, there would be no problem: you would use a forward declaration, have a header for each class definition, and then each CPP would include both HPPs.
But as you have templates, you actually have to reproduce the pattern above, but with headers only.
This means:
a definition header A.def.hpp and B.def.hpp
an implementation header A.inl.hpp and B.inl.hpp
for convenience, a "naive" header A.hpp and B.hpp
Each header will have the following traits:
In A.def.hpp (resp. B.def.hpp), you have a forward declaration of class B (resp. A), which will enable you to declare a pointer/reference to that class
A.inl.hpp (resp. B.inl.hpp) will include both A.def.hpp and B.def.hpp, which will enable methods from A (resp. B) to use the class B (resp. A).
A.hpp (resp. B.hpp) will directly include both A.def.hpp and A.inl.hpp (resp. B.def.hpp and B.inl.hpp)
Of course, all headers need to be self-sufficient and protected by header guards.
The naive user will include A.hpp and/or B.hpp, thus ignoring the whole mess.
And having that organization means the library writer can solve the circular dependencies between A and B while keeping both classes in separate files, easy to navigate once you understand the scheme.
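A skeleton of that layout for class A (B mirrors it; the names follow the convention above, and B's doSomething() is a hypothetical method):

// A.def.hpp
#ifndef A_DEF_HPP
#define A_DEF_HPP
template <typename T> class B;   // forward declaration

template <typename T>
class A {
    B<T> *b_;                    // a pointer only needs the declaration
public:
    void useB();
};
#endif

// A.inl.hpp
#ifndef A_INL_HPP
#define A_INL_HPP
#include "A.def.hpp"
#include "B.def.hpp"             // full definition needed to call into B

template <typename T>
void A<T>::useB() { b_->doSomething(); }
#endif

// A.hpp - the convenience header naive users include
#ifndef A_HPP
#define A_HPP
#include "A.def.hpp"
#include "A.inl.hpp"
#endif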
Please note that it was an edge case (two templates knowing each other). I expect most code to not need that trick.
We do that at work; it's just easier to find stuff if the class and the files have the same name. As for performance, you really shouldn't have 5000 classes in a single project. If you do, some refactoring might be in order.
That said, there are instances when we have multiple classes in one file. And that is when it's just a private helper class for the main class of the file.
+1 for separation. I just came onto a project where some classes are in files with a different name, or lumped in with another class, and it is impossible to find these in a quick and efficient manner. You can throw more resources at a build - you can't make up lost programmer time because (s)he can't find the right file to edit.
In addition to simply being "clearer", separating classes into separate files makes it easier for multiple developers not to step on each others toes. There will be less merging when it comes time to commit changes to your version control tool.
Most places where I have worked have followed this practice. I've actually written coding standards for BAE (Aust.) along with the reasons why instead of just carving something in stone with no real justification.
Concerning your question about source files, it's not so much time to compile but more an issue of being able to find the relevant code snippet in the first place. Not everyone is using an IDE. And knowing that you just look for MyClass.h and MyClass.cpp really saves time compared to running "grep MyClass *.(h|cpp)" over a bunch of files and then filtering out the #include MyClass.h statements...
Mind you there are work-arounds for the impact of large numbers of source files on compile times. See Large Scale C++ Software Design by John Lakos for an interesting discussion.
You might also like to read Code Complete by Steve McConnell for an excellent chapter on coding guidelines. Actually, this book is a great read that I keep coming back to regularly.
N.B. You need the first edition of Code Complete, which is easily available online; the interesting section on coding and naming guidelines didn't make it into Code Complete 2.
It's common practice to do this, especially to be able to include .h in the files that need it. Of course the performance is affected but try not to think about this problem until it arises :).
It's better to start with the files separated and after that try to merge the .h's that are commonly used together to improve performance if you really need to. It all comes down to dependencies between files and this is very specific to each project.
The best practice, as others have said, is to place each class in its own translation unit from a code maintenance and understandability perspective. However on large scale systems this is sometimes not advisable - see the section entitled "Make Those Source Files Bigger" in this article by Bruce Dawson for a discussion of the tradeoffs.
The same rule applies here, but it notes a few exceptions where sharing a file is allowed, like so:
Inheritance trees
Classes that are only used within a very limited scope
Some utilities are simply placed in a general 'utils.h'
It is very helpful to have only one class per file, but if you do your building via bulk-build files which include all the individual C++ files, it makes for faster compilation, since startup time is relatively large for many compilers.
I found these guidelines particularly useful when it comes to header files:
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Header_Files
Two words: Ockham's Razor. Keep one class per file with the corresponding header in a separate file. If you do otherwise, like keeping a piece of functionality per file, then you have to create all kinds of rules about what constitutes a piece of functionality. There is much more to gain by keeping one class per file. And, even half decent tools can handle large quantities of files. Keep it simple my friend.
I'm surprised that almost everyone is in favor of having one file per class. The problem with that is that in the age of 'refactoring' one may have a hard time keeping the file and class names in sync. Every time you change a class name, you then have to change the file name too, which means you also have to make a change everywhere the file is included.
I personally group related classes into a single file and then give such a file a meaningful name that won't have to change even if a class name changes. Having fewer files also makes scrolling through a file tree easier.
I use Visual Studio on Windows and Eclipse CDT on Linux, and both have shortcut keys that take you straight to a class declaration, so finding a class declaration is easy and quick.
Having said that, I think once a project is completed, or its structure has 'solidified', and name changes become rare, it may make sense to have one class per file. I wish there was a tool that could extract classes and place them in distinct .h and .cpp files. But I don't see this as essential.
The choice also depends on the type of project one works on. In my opinion the issue doesn't deserve a black and white answer since either choice has pros and cons.