Why to use header files with declarations in C++?

Why to use header files with declarations in C++? - c++

This answer says we need header files for declarations of other functions placed somewhere else, for example in .lib or .dll or .cpp files.
My question is, why do we really need to use declarations in header files at all? Why can't we just use definitions straight away? For example, place a definition instead of a declaration and address to it directly. To a beginner, all this declaration stuff seems to be sort of a redundant work.

A few reasons come to mind:
C++ being an old language, it is sensitive to the ordering of code. Things must be declared before they can be used, and header files are a convenient and consistent way to do this. For example, this example would work in most modern languages, but not in C++:
int main() {
foo(); // error: 'foo' has not been declared
return 0;
}
void foo() { /* ... */ }
It separates the interface from the implementation. This means you can push out updates to a library's implementation, and as long as none of the interfaces have changed, clients do not need to re-compile their code. This also means that you can have multiple implementations corresponding to a single interface - for example, maybe you want to write a filesystem library, but the implementation of that library will have to look significantly different on Linux than it does on Windows. Both of these implementations can share the same header, making the implementation details "opaque" to users.

Bear in mind that all definitions are declarations in C++ (but not all declarations are definitions).
Non-inline functions are usually declared but not defined in a header to avoid breaking the one-definition rule (and linking problems) when that header is included in multiple sources.
Declarations (not definitions) of some class/struct types are needed in headers to break circular dependencies (which causes diagnosable errors and prevents compilation).
These types of concerns rarely arise in "toy" problems used in programming courses and ivory towers, but are critically important in real-world development. For example, in toy problems, a header file might only be included in a single source file - so the concerns of breaking the one-definition rule don't arise. But real-world projects typically include headers in multiple source files - so are affected by problems that arise when definitions are inappropriately placed into header files.
In short, the approaches used in small learning exercises (which cause students to wonder why there is any need to put declarations into headers rather than definitions) don't scale to real-world development.
Some types of definitions (templates, inline functions, and class/struct definitions) are routinely placed into headers. That doesn't mean that all definitions should be placed in headers.

Related

Motivating real world examples of the 'inline' specifier?

Background: The C++ inline keyword does not determine if a function should be inlined.
Instead, inline permits you to provide multiple definitions of a single function or variable, so long as each definition occurs in a different translation unit.
Basically, this allows definitions of global variables and functions in header files.
Are there some examples of why I might want to write a definition in a header file?
I've heard that there might be templating examples where it's impossible to write the definition in a separate cpp file.
I've heard other claims about performance. But is that really true? Since, to my knowledge, the use of the inline keyword doesn't guarantee that the function call is inlined (and vice versa).
I have a sense that this feature is probably primarily used by library writers trying to write wacky and highly optimized implementations. But are there some examples?

It's actually simple: you need inline when you want to write a definition (of a function or variable (since c++17)) in a header. Otherwise you would violate odr as soon as your header is included in more than 1 tu. That's it. That's all there is to it.
Of note is that some entities are implicitly declared inline like:
methods defined inside the body of the class
template functions and variables
constexpr functions and variables
Now the question becomes why and when would someone want to write definitions in the header instead of separating declarations in headers and definitions in source code files. There are advantages and disadvantages to this approach. Here are some to consider:
optimization
Having the definition in a source file means that the code of the function is baked into the tu binary. It cannot be inlined at the calling site outside of the tu that defines it. Having it in a header means that the compiler can inline it everywhere it sees fit. Or it can generate different code for the function depending on the context where it is called. The same can be achieved with lto within an executable or library, but for libraries the only option for enabling this optimization is having the definitions in the header.
library distribution
Besides enabling more optimizations in a library, having a header only library (when it's possible) means an easier way to distribute that library. All the user has to do is download the headers folder and add it to the include path of his/her project. In the case of non header only library things become more complicated. Because you can't mix and match binaries compiled by different compiler and even by the same compiler but with different flags. So you either have to distribute your library with the full source code along with a build tool or have the library compiled in many formats (cpu architecture/OS/compiler/compiler flags combinations)
human preference
Having to write the code once is considered by some (me included) an advantage: both from code documentation perspective and from a maintenance perspective. Others consider separating declaration from definitions is better. One argument is that it achieves separation of interface vs implementation but that is just not the case: in a header you need to have private member declarations even if those aren't part of the interface.
compile time performance
Having all the code in header means duplicating it in every tu. This is a real problem when it comes to compilation time. Heavy header C++ projects are notorious for slow compilation times. It also means that a modification of a function definition would trigger the recompilation of all the tu that include it, as opposed to just 1 tu in the case of definition in source code. Precompiled headers try to solve this problem but the solutions are not portable and have problems of their own.

If the same function definition appears in multiple compilation units then it needs to be inline otherwise you get a linking error.
You need the inline keyword e.g. for function templates if you want to make them available using a header because then their definition also has to be in the header.
The below statement might be a bit oversimplified because compilers and linkers are really complex nowadays, but to get a basic idea it is still valid.
A cpp file and the headers included by that cpp file form a compilation unit and each compilation unit is compiled individually. Within that compilation unit, the compiler can do many optimizations like potentially inlining any function call (no matter if it is a member or a free function) as long as the code still behaves according to the specification.
So if you place the function definition in the header you allow the compiler to know the code of that function and potentially do more optimizations.
If the definition is in another compilation unit the compiler can't do much and optimizations then can only be done at linking time. Link time optimizations are also possible and are indeed also done. And while link-time optimizations became better they potentially can't do as much as the compiler can do.
Header only libraries have the big advantage that you do not need to provide project files with them, the one how wants to use that library just copies the headers to their projects and includes them.

In short:
You're writing a library and you want it to be header-only, to make its use more convenient.
Even if it's not a library, in some cases you may want to keep some of the definitions in a header to make it easier to maintain (whether or not this makes things easier is subjective).
to my knowledge, the use of the inline keyword doesn't guarantee that the function call is inlined
Yes, defining it in a header (as inline) doesn't guarantee inlining. But if you don't define it in a header, it will never be inlined (unless you're using link-time optimizations). So:
You want the compiler to be able to inline the functions, if it decides to.
Also it may the compiler more knowledge about a function:
maybe it never throws, but is not marked noexcept;
maybe several consecutive calls can be merged into one (there's no side effects, etc), but __attribute__((const)) is missing;
maybe it never returns, but [[noreturn]] is missing;
...
there might be templating examples where it's impossible to write the definition in a separate cpp file.
That's true for most templates. They automatically behave as if they were inline, so you don't need to specify it explicitly. See Why can templates only be implemented in the header file? for details.

What exactly is a C++ header file and declaration?

I know the basics about header files and forward declarations but I'm wondering, if I declare the exact same thing in two separate header files and then compile that, will that work?
Is the C++ interface, in this case, portable, I mean, If I have two libs and they share the same interface (declaration or what not) somewhere could I theoretically just replicate the same declaration in the program and actually compile that, or if not, why?
How would C++ for instance be able to tell the difference between two identical declarations in two different files?
Say I had two distinct libs but they shared some interface, they are compiled separately but by the same tools, would it be possible to in a future step bring these to libs together and actually pass the interface between these two libs as if it was the same, compatible interface, even though it was originally compiled from different (but identical) header files?

Function declarations, variable declarations and class definitions for a given identifier may appear in the source code as often as you like. They just have to be identical each time they appear (which is automatic if you include a given header file in multiple translation units).
It is only the definition of functions, variables and class member functions that must appear precisely once (with a special rule for inlined functions).
(It's a little different for templates: Template definitions may appear repeatedly, but again all occurrences have to be identical. But templates require some non-trivial deduplication work from the linker.)

Is it possible to write header file without include guard, and without multiple definition errors?

Just out of curiosity I wanted to know if is there a way to achieve this.
In C++ we learn that we should avoid using macros. But when we use include guards, we do use at least one macro. So I was wondering if there is a way to write a macro-free program.

It's definitely possible, though it's unimaginably bad practice not to have include guards. It's important to understand what the #include statement actually does: the contents of another file are pasted directly into your source file before it's compiled. An include guard prevents the same code from being pasted again.
Including a file only causes an error if it would be incorrect to type the contents of that file at the position you included it. As an example, you can declare (note: declare, not define) the same function (or class) multiple times in a single compilation unit. If your header file consists only of declarations, you don't need to specify an include guard.
IncludedFile.h
class SomeClassSomewhere;
void SomeExternalFunction(int x, char y);
Main.cpp
#include "IncludedFile.h"
#include "IncludedFile.h"
#include "IncludedFile.h"
int main(int argc, char **argv)
{
return 0;
}
While declaring a function (or class) multiple times is fine, it isn't okay to define the same function (or class) more than once. If there are two or more definitions for a function, the linker doesn't know which one to choose and gives up with a "multiply defined symbols" error.
In C++, it's very common for header files to include class definitions. An include guard prevents the #included file from being pasted into your source file a second time, which means your definitions will only appear once in the compiled code, and the linker won't be confused.
Rather than trying to figure out when you need to use them and when you don't, just always use include guards. Avoiding macros most of the time is a good idea; this is one situation where they aren't evil, and using them here isn't dangerous.

It is definitely doable and I have used some early C++ libraries which followed an already misguided approach from C which essentially required the user of a header to include certain other headers before this. This is based on thoroughly understanding what creates a dependency on what else and to use declarations rather than definitions wherever possible:
Declarations can be repeated multiple times although they are obviously required to be consistent and some entities can't be declared (e.g. enum can only be defined; in C++ 2011 it is possible to also declare enums).
Definitions can't be repeated but are only needed when the definition if really used. For example, using a pointer or a reference to a class doesn't need its definition but only its declaration.
The approach to writing headers would, thus, essentially consist of trying to avoid definitions as much as possible and only use declaration as far as possible: these can be repeated in a header file or corresponding headers can even be included multiple times. The primary need for definitions comes in when you need to derive from a base class: this can't be avoided and essentially means that the user would have to include the header for the base class before using any of the derived classes. The same is true for members defined directly in the class but using the pimpl-idiom the need for member definitions can be pushed to the implementation file.
Although there are a few advantages to this approach it also has a few severe drawbacks. The primary advantage is that it kind of enforces a very thorough separation and dependency management. On the other hand, overly aggressive separation e.g. using the pimpl-idiom for everything also has a negative performance impact. The biggest drawback is that a lot the implementation details are implicitly visible to the user of a header because the respective headers this one depends on need to be included first explicitly. At least, the compiler enforces that you get the order of include files right.
From a usability and dependency point of view I think there is a general consensus that headers are best self-contained and that the use of include guards is the lesser evil.

It is possible to do so if you ensure the same header file is not being included in the same translation unit multiple times.
Also, you could use:
#pragma once
if portability is not your concern.
However, you should avoid using #pragma once over Include Guards because:
It is not standard & hence non portable.
It is less intuitive and not all users might know of it.
It provides no big advantage over the classic and very well known Include Guards.

In short, yes, even without pragmas. Only if you can guarantee that every header file is included only once. However, given how code tends to grow, it becomes increasingly difficult to honour that guarantee as the number of header files increase. This is why not using header guards is considered bad practice.
Pre-processor macros are frowned upon, yes. However, header include guards are a necessary evil because the alternative is so much worse (#pragma once will only work if your compiler supports it, so you lose portability)
With regard to pre-processor macros, use this rule:
If you can come up with an elegant solution that does not involve a macro, then avoid them.

Does the non-portable, non-standard
#pragma once
work sufficiently well for you? Personally, I'd rather use macros for preventing reinclusion, but that's your decision.

What should go into an .h file?

When dividing your code up into multiple files just what exactly should go into an .h file and what should go into a .cpp file?

Header files (.h) are designed to provide the information that will be needed in multiple files. Things like class declarations, function prototypes, and enumerations typically go in header files. In a word, "definitions".
Code files (.cpp) are designed to provide the implementation information that only needs to be known in one file. In general, function bodies, and internal variables that should/will never be accessed by other modules, are what belong in .cpp files. In a word, "implementations".
The simplest question to ask yourself to determine what belongs where is "if I change this, will I have to change code in other files to make things compile again?" If the answer is "yes" it probably belongs in the header file; if the answer is "no" it probably belongs in the code file.

Fact is, in C++, this is somewhat more complicated that the C header/source organization.
What does the compiler see?
The compiler sees one big source (.cpp) file with its headers properly included. The source file is the compilation unit that will be compiled into an object file.
So, why are headers necessary?
Because one compilation unit could need information about an implementation in another compilation unit. So one can write for example the implementation of a function in one source, and write the declaration of this function in another source needing to use it.
In this case, there are two copies of the same information. Which is evil...
The solution is to share some details. While the implementation should remain in the Source, the declaration of shared symbols, like functions, or definition of structures, classes, enums, etc., could need to be shared.
Headers are used to put those shared details.
Move to the header the declarations of what need to be shared between multiple sources
Nothing more?
In C++, there are some other things that could be put in the header because, they need, too, be shared:
inline code
templates
constants (usually those you want to use inside switches...)
Move to the header EVERYTHING what need to be shared, including shared implementations
Does it then mean that there could be sources inside the headers?
Yes. In fact, there are a lot of different things that could be inside a "header" (i.e. shared between sources).
Forward declarations
declarations/definition of functions/structs/classes/templates
implementation of inline and templated code
It becomes complicated, and in some cases (circular dependencies between symbols), impossible to keep it in one header.
Headers can be broken down into three parts
This means that, in an extreme case, you could have:
a forward declaration header
a declaration/definition header
an implementation header
an implementation source
Let's imagine we have a templated MyObject. We could have:
// - - - - MyObject_forward.hpp - - - -
// This header is included by the code which need to know MyObject
// does exist, but nothing more.
template<typename T>
class MyObject ;
.
// - - - - MyObject_declaration.hpp - - - -
// This header is included by the code which need to know how
// MyObject is defined, but nothing more.
#include <MyObject_forward.hpp>
template<typename T>
class MyObject
{
public :
MyObject() ;
// Etc.
} ;
void doSomething() ;
.
// - - - - MyObject_implementation.hpp - - - -
// This header is included by the code which need to see
// the implementation of the methods/functions of MyObject,
// but nothing more.
#include <MyObject_declaration.hpp>
template<typename T>
MyObject<T>::MyObject()
{
doSomething() ;
}
// etc.
.
// - - - - MyObject_source.cpp - - - -
// This source will have implementation that does not need to
// be shared, which, for templated code, usually means nothing...
#include <MyObject_implementation.hpp>
void doSomething()
{
// etc.
} ;
// etc.
Wow!
In the "real life", it is usually less complicated. Most code will have only a simple header/source organisation, with some inlined code in the source.
But in other cases (templated objects knowing each others), I had to have for each object separate declaration and implementation headers, with an empty source including those headers just to help me see some compilation errors.
Another reason to break down headers into separate headers could be to speed up the compilation, limiting the quantity of symbols parsed to the strict necessary, and avoiding unecessary recompilation of a source who cares only for the forward declaration when an inline method implementation changed.
Conclusion
You should make your code organization both as simple as possible, and as modular as possible. Put as much as possible in the source file. Only expose in headers what needs to be shared.
But the day you'll have circular dependancies between templated objects, don't be surprised if your code organization becomes somewhat more "interesting" that the plain header/source organization...
^_^

in addition to all other answers, i will tell you what you DON'T place in a header file:
using declaration (the most common being using namespace std;) should not appear in a header file because they pollute the namespace of the source file in which it is included.

What compiles into nothing (zero binary footprint) goes into header file.
Variables do not compile into nothing, but type declarations do (coz they only describe how variables behave).
functions do not, but inline functions do (or macros), because they produce code only where called.
templates are not code, they are only a recipe for creating code. so they also go in h files.

In general, you put declarations in the header file and definitions in the implementation (.cpp) file. The exception to this is templates, where the definition must also go in the header.
This question and ones similar to it has been asked frequently on SO - see Why have header files and .cpp files in C++? and C++ Header Files, Code Separation for example.

Mainly header file contain class skeleton or declaration (does not change frequently)
and cpp file contains class implementation (changes frequently).

Header (.h)
Macros and includes needed for the interfaces (as few as possible)
The declaration of the functions and classes
Documentation of the interface
Declaration of inline functions/methods, if any
extern to global variables (if any)
Body (.cpp)
Rest of macros and includes
Include the header of the module
Definition of functions and methods
Global variables (if any)
As a rule of thumb, you put the "shared" part of the module on the .h (the part that other modules needs to be able to see) and the "not shared" part on the .cpp
PD: Yes, I've included global variables. I've used them some times and it's important not to define them on the headers, or you'll get a lot of modules, each defining its own variable.

Your class and function declarations plus the documentation, and the definitions for inline functions/methods (although some prefer to put them in separate .inl files).

the header file (.h) should be for declarations of classes, structs and its methods, prototypes, etc. The implementation of those objects are made in cpp.
in .h
class Foo {
int j;
Foo();
Foo(int)
void DoSomething();
}

I'd expect to see:
declarations
comments
definitions marked inline
templates
the really answer though is what not to put in:
definitons (can lead to things being multiply defined)
using declarations/directives (forces them on anyone including your header, can cause nameclashes)

The header Defines something but doesn't tell anything about the implementation. ( Excluding Templates in this "metafore".
With that said, you need to divide "definitions" into sub-groups, there are, in this case, two types of definitions.
You define the "layout" of your strucutre, telling only as much as is needed by the surrounding usage groups.
The definitions of a variable, function and a class.
Now, I am of course talking about the first subgroup.
The header is there to define the layout of your structure in order to help the rest of the software use the implementation. You might want to see it as an "abstraction" of your implementation, which is vaughly said but, I think it suits quite well in this case.
As previous posters have said and shown you declare private and public usage areas and their headers, this also includes private and public variables. Now, I don't want to go into design of the code here but, you might want to consider what you put in your headers, since that is the Layer between the end user and the implementation.

Header files - shouldn't change during development too often -> you should think, and write them at once (in ideal case)
Source files - changes during implementation

What are the advantages and disadvantages of separating declaration and definition as in C++?

In C++, declaration and definition of functions, variables and constants can be separated like so:
function someFunc();
function someFunc()
{
//Implementation.
}
In fact, in the definition of classes, this is often the case. A class is usually declared with it's members in a .h file, and these are then defined in a corresponding .C file.
What are the advantages & disadvantages of this approach?

Historically this was to help the compiler. You had to give it the list of names before it used them - whether this was the actual usage, or a forward declaration (C's default funcion prototype aside).
Modern compilers for modern languages show that this is no longer a necessity, so C & C++'s (as well as Objective-C, and probably others) syntax here is histotical baggage. In fact one this is one of the big problems with C++ that even the addition of a proper module system will not solve.
Disadvantages are: lots of heavily nested include files (I've traced include trees before, they are surprisingly huge) and redundancy between declaration and definition - all leading to longer coding times and longer compile times (ever compared the compile times between comparable C++ and C# projects? This is one of the reasons for the difference). Header files must be provided for users of any components you provide. Chances of ODR violations. Reliance on the pre-processor (many modern languages do not need a pre-processor step), which makes your code more fragile and harder for tools to parse.
Advantages: no much. You could argue that you get a list of function names grouped together in one place for documentation purposes - but most IDEs have some sort of code folding ability these days, and projects of any size should be using doc generators (such as doxygen) anyway. With a cleaner, pre-processor-less, module based syntax it is easier for tools to follow your code and provide this and more, so I think this "advantage" is just about moot.

It's an artefact of how C/C++ compilers work.
As a source file gets compiled, the preprocessor substitutes each #include-statement with the contents of the included file. Only afterwards does the compiler try to interpret the result of this concatenation.
The compiler then goes over that result from beginning to end, trying to validate each statement. If a line of code invokes a function that hasn't been defined previously, it'll give up.
There's a problem with that, though, when it comes to mutually recursive function calls:
void foo()
{
bar();
}
void bar()
{
foo();
}
Here, foo won't compile as bar is unknown. If you switch the two functions around, bar won't compile as foo is unknown.
If you separate declaration and definition, though, you can order the functions as you wish:
void foo();
void bar();
void foo()
{
bar();
}
void bar()
{
foo();
}
Here, when the compiler processes foo it already knows the signature of a function called bar, and is happy.
Of course compilers could work in a different way, but that's how they work in C, C++ and to some degree Objective-C.
Disadvantages:
None directly. If you're using C/C++ anyway, it's the best way to do things. If you've got a choice of language/compiler, then maybe you can pick one where this is not an issue. The only thing to consider with splitting declarations into header files is to avoid mutually recursive #include-statements - but that's what include guards are for.
Advantages:
Compilation speed: As all included files are concatenated and then parsed, reducing the amount and complexity of code in included files will improve compilation time.
Avoid code duplication/inlining: If you fully define a function in a header file, each object file that includes this header and references this function will contain it's own version of that function. As a side-note, if you want inlining, you need to put the full definition into the header file (on most compilers).
Encapsulation/clarity: A well defined class/set of functions plus some documentation should be enough for other developers to use your code. There is (ideally) no need for them to understand how the code works - so why require them to sift through it? (The counter-argument that it's may be useful for them to access the implementation when required still stands, of course).
And of course, if you're not interested in exposing a function at all, you can usually still choose to define it fully in the implementation file rather than the header.

The standard requires that when using a function, a declaration must be in scope. This means, that the compiler should be able to verify against a prototype (the declaration in a header file) what you are passing to it. Except of course, for functions that are variadic - such functions do not validate arguments.
Think of C, when this was not required. At that time, compilers treated no return type specification to be defaulted to int. Now, assume you had a function foo() which returned a pointer to void. However, since you did not have a declaration, the compiler will think that it has to return an integer. On some Motorola systems for example, integeres and pointers would be be returned in different registers. Now, the compiler will no longer use the correct register and instead return your pointer cast to an integer in the other register. The moment you try to work with this pointer -- all hell breaks loose.
Declaring functions within the header is fine. But remember if you declare and define in the header make sure they are inline. One way to achieve this is to put the definition inside the class definition. Otherwise prepend the inline keyword. You will run into ODR violation otherwise when the header is included in multiple implementation files.

There are two main advantages to separating declaration and definition into C++ header and source files. The first is that you avoid problems with the One Definition Rule when your class/functions/whatever are #included in more than one place. Secondly, by doing things this way, you separate interface and implementation. Users of your class or library need only to see your header file in order to write code that uses it. You can also take this one step farther with the Pimpl Idiom and make it so that user code doesn't have to recompile every time the library implementation changes.
You've already mentioned the disadvantage of code repetition between the .h and .cpp files. Maybe I've written C++ code for too long, but I don't think it's that bad. You have to change all user code every time you change a function signature anyway, so what's one more file? It's only annoying when you're first writing a class and you have to copy-and-paste from the header to the new source file.
The other disadvantage in practice is that in order to write (and debug!) good code that uses a third-party library, you usually have to see inside it. That means access to the source code even if you can't change it. If all you have is a header file and a compiled object file, it can be very difficult to decide if the bug is your fault or theirs. Also, looking at the source gives you insight into how to properly use and extend a library that the documentation might not cover. Not everyone ships an MSDN with their library. And great software engineers have a nasty habit of doing things with your code that you never dreamed possible. ;-)

Advantage
Classes can be referenced from other files by just including the declaration. Definitions can then be linked later on in the compilation process.

You basically have 2 views on the class/function/whatever:
The declaration, where you declare the name, the parameters and the members (in the case of a struct/class), and the definition where you define what the functions does.
Amongst the disadvantages are repetition, yet one big advantage is that you can declare your function as int foo(float f) and leave the details in the implementation(=definition), so anyone who wants to use your function foo just includes your header file and links to your library/objectfile, so library users as well as compilers just have to care for the defined interface, which helps understanding the interfaces and speeds up compile times.

One advantage that I haven't seen yet: API
Any library or 3rd party code that is NOT open source (i.e. proprietary) will not have their implementation along with the distribution. Most companies are just plain not comfortable with giving away source code. The easy solution, just distribute the class declarations and function signatures that allow use of the DLL.
Disclaimer: I'm not saying whether it's right, wrong, or justified, I'm just saying I've seen it a lot.

One big advantage of forward declarations is that when used carefully you can cut down the compile time dependencies between modules.
If ClassA.h needs to refer to a data element in ClassB.h, you can often use just a forward references in ClassA.h and include ClassB.h in ClassA.cc rather than in ClassA.h, thus cutting down a compile time dependency.
For big systems this can be a huge time saver on a build.

Disadvantage
This leads to a lot of repetition. Most of the function signature needs to be put in two or more (as Paulious noted) places.

Separation gives clean, uncluttered view of program elements.
Possibility to create and link to binary modules/libraries without disclosing sources.
Link binaries without recompiling sources.

When done correctly, this separation reduces compile times when only the implementation has changed.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js