Confused on the locations of class declaration and implementation in c++ - c++

I have not been able to find the specific answer to this -- I do know that .h files contain class declarations and .cpp files hold the implementation.
I am confused as to how this is done without repetition.
If I have a class Animal it might be called in several areas of the program. So it would seem I would have to re-write the implementation several times (which no doubt means I am misunderstood).
It seems a bit goofy to include a .h declaration and .cpp implentation with every use of the given class (evidence I am misunderstanding something as well).
Where have I gone wrong in the subject of separating declaration and implementation?
I've been doing PHP and Python all of this time, so it might be prior habits confusing me.

.cpp files are compiled once into object (.o) files.
They are never included into other .cpp files (that would lead to multiple definitions, which are an error in the general case).
Declarations, which are generally retrieved by #includeing header files, are how one .cpp file can know what's defined in some other .cpp file and use it without actually having its definition.
Once all of the .cpp files have been compiled into object files, the linker comes along and links all the object files together to form the final executable or library.
If multiple definitions of the same thing are found, or if one is missing, that's a linker error.

Related

What's the REAL difference between .h and .cpp files?

This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?
Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.
Let's take your example of defining functions or variables. Suppose your header file contains the following line:
header.h:
int x = 10;
code.cpp:
#include "header.h"
Now, if you only have one code file and one header file this probably works just fine:
g++ code.cpp -o outputFile
However, if you have two code files this breaks:
header.h:
int x = 10;
code1.cpp:
#include "header.h"
code2.cpp:
#include "header.h"
And then:
g++ code1.cpp -c (produces code1.o)
g++ code2.cpp -c (produces code2.o)
g++ code1.o code2.o -o outputFile
This breaks, specifically at the linker step, because now you have two symbols in the same executable that have the same symbol, and the linker doesn't know what's it's supposed to do with that. When you include your header in code1 you get a symbol "x" and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:
code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.
What's the REAL difference between .h and .cpp files?
They are both fundamentally just text files. From certain perspective, their only difference is the filename.
However, many programming related tools treat the files differently depending on their name. For example, some tools will detect programming language: .c is compiled as C language, .cpp is compiled as C++ and .h is not compiled at all.
For header files, the name does not matter at all to the compiler. The name could be .h or .header or anything else, it doesn't affect how the pre processor includes it. It is however good practice to conform to a common convention in order to avoid confusion.
I've noticed that simply defining functions in .h files works just fine.
Are the functions declared non-inline? Have you ever included the header file into more than one translation unit? If you answered yes to both, then your program has been ill formed. If you didn't, then that would explain why you didn't encounter any problems.
Does it really reduce compile time?
Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.
This is because doing less work takes less time. What is important to realise is that other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then you have to compile it i.e. the program in its entirety.
Multiple translation units are also better because they can be compiled in parallel, which allows taking advantage of modern multi core hardware.
What else?
Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.
There are some other advantages too regarding organisation of files. In particular, it is quite convenient to be able to define different implementations for same function for different target systems on order to be able to support multiple platforms. With header files, you must do tricks with macros while with source files, you simply choose which files to compile.
Another use case where implementing functions in header is not an option is distributing a library without source, as some middleware providers do. You must give the headers or else your functions cannot be called, but if all your source is in the headers, then you've given up your trade secrets. Compiled sources have to be at least reverse engineered.
Keep in mind that the C++ compiler is a fairly simple beast as far as file-handling goes. All it's allowed to do is a read in a single source-code file (and, via the pre-processor, logically insert into that incoming text-stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.
For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because number of lines of code that the compiler needs to load into memory are small (relative to the computer's RAM).
But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.
And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.
To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.
So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.
.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.
The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.
The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.
If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them any thing you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".
I have to do this kind of thing when working in a constrained environment.
Header files weren't part of the original definition of c. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of c, which is why we got pre-compiled header files. Pre-compiled header files do speed up the compilation and linking of source code, but not any faster than just writing assembler, or machine code, or any other thing that didn't take advantage of the co-operation of other people or a design environment.
It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.
Whenever you see an #include <header.h> directive, pretend that the contents of header.h is being copied and pasted right where the #include directive appears.
.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.
We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.

What's the difference between including a header and a C++ file?

What's the difference between including a header(.h files) and a C++ file(.cpp files)? When I create a class, I create a .h file and .cpp file. If I want to use an object of this class should I include both of these files or not? In which cases should I include the .cpp file?
What files are called, and what their contents is, are entirely convention. If you like to confuse people, you could call your header files something.b and your source files something.r - this will of course mean nothing useful to most people, and some people may think your files contain the language R rather than C++ sources. And your editor will probably not understand that it's C or C++ in files called .b - and build tools such as Make, scons, CMake, etc will probably not understand how to compile the your files without being "told". [Compilers also look a the filename extension to determine if it's supposed to compile as C++ or C, which of course will not work with "unconventional names"]
What is important is not what the files are called, but what they actually contain. A header (what most people call something.h) file should be such that it can be included anywhere, and any number of times in your project [exceptions do exist, where header files are not really meant to be included more than a single time in the entire project - for example a version.h which declares a string that describes the current version number].
A source file (what is conventionally called something.cpp, typically, should be passed to the compiler directly to be compiled, and not used as #include "something.cpp". However, it is the CONTENT that determines this, not the name of the file. It's just badly named files if you use them that way.
In summary: The compiler just reads the source file passed in, then "inserts" the #include into the stream of code that it compiles, as if it was pasted into the original source file. The compiler doesn't really care what your file names are, where they came from, or what their content is, as long as the compiler is "ok" with the compilation as a whole.
There is no difference to include .cpp and .h files from point of view of compiler. But The content of .cpp and .h is different in common case. The .cpp files is for implementation of class, functions, static objects, and the .h files is for class definition. If you include the .cpp file into another .cpp file the content is duplicated and will fail at link stage becouse the naming collisions.
What's the difference between including a header(.h files) and C++ file(.cpp files)?
Supposed the .cpp file contains some function definitions, the latter option usually doesn't work well with commonly used build systems, and ends up with multiple definition errors, when the .cpp is included by more of one translation unit.
The exception might be having inlined all of your function definitions in the .cpp file.
In the very principle it's that the C/C++ preprocessor just expands the text found in either file type into the current translation unit. The file extension doesn't play any role here.
There is no difference. Both are handled by the preprocessor as plain text for concatenation to a single file. However, including a source might have a undesirable result (multiple definitions of variables/functions). Header files are usually protected by inclusion guards (#ifndef HEADER_H or a #pragma once) to prevent duplication of their content.
Note: The compiler does it's work after preprocessing (or invokes the preprocessor before compiling).

c++ when to include cpp even if we have .h file

I am reading the book Absolute C++ 5th edition.
In page 716,
I don't really understand why it needs include "pfarray.cpp"
Is include "pfarray.h" not enough?
More specifically, even if we have declarations in .h file but implementations in .cpp files, when we still have to include .cpp file?
Thank you in advance.
You don't have to.
A translation unit is a group of files with definitions and declarations. Compiling a translation unit, the compiler needs to know everything about declarations and re-parse them again and again. The definitions on the other hand can be compiled just once and reused for another units.
A translation unit can be separated in .h and .cpp files. You should put the declarations in .h and definitions in .cpp files to obey one definition rule. This approach also reduces compilation time.
Writing template-d classes and functions (without specialization), some coders (a bad habit in my opinion) will put the implementations in the .cpp files and they have to include them at the end of its corresponding .h file or in a .cpp file which needs them. It's just confusing. A better naming convention is to rename these type of .cpp files to .impl.cpp and including them at the end of its .h file.
When you write #include anything.any_extension, the extension doesn't really matter to the preprocessor. It's really like a "take the contents from that file and paste it into this file" kind of brute mechanism. So you can include anything without error provided that the code inside is legit, and you can name a header file with any extension. So you can even name them with a .txt extension and it wouldn't really matter to the preprocessor.
I would suggest that the practice of including source files is rather confusing, mainly from a build standpoint since it's not very clear whether a source file (cpp, cc, etc) is supposed to be built as a separate object file to link against or #included or both.
Yet it's sometimes done anyways. For example, pfarray.cpp might contain an implementation for a template since templates typically need their full implementation visible at compile time at the site generating the code, and sometimes authors establish a habit of #including files with source file extensions to avoid putting the implementation details into the same header file while uniformly conforming to a style that favors putting all such details into files named with a source file convention.
Another reason this can be done, but I don't think it's the reason it was done in your case, is as a build optimization (see Unity builds). It can sometimes be more efficient to compile and link fewer files, so using #include for source files can be a crude way to fuse them all together into a single build target.

Does putting a whole class definition in a ".h" make the executable larger?

We define a C++ class in a .h and define its methods in a .cpp, but it makes the code look less organized.
I want to put all method's definition in the class definition which is in a .h file, but I'm worrying that the compiler generate duplicated code for the same methods/functions when one class header file is included by different files.
Does the linker find out and merge the duplicated code pieces to reduce the file size?
If not, is it better to use .hpp instead? I heard that a .hpp is for this.
And it does make minor difference when I just change a .h file for a .hpp (I don't know why), compiled with G++.
Yes. It may create larger executable and that is because the member functions which are defined in the class itself, are inline by default, whether you mention the keyword inline in the defintion or not. Usually, inline function causes larger executable because the compiler will define it multiple times wherever it is called from.
.h vs .hpp is the 90% equivalence of
#include <cmath> vs #include <math.h>
Some people prefer to use .hpp when they are doing exclusive C++ programming. You will see .hpp in libraries like Boost.
However, the other 10% is really important. For example, taking from Boost library doc, they explain the reason of using .hpp over .h:
Most Boost libraries are header-only: they consist entirely of header
files containing templates and inline functions, and require no
separately-compiled library binaries or special treatment when
linking.
If you fall in that case, you should use .hpp, but this can cost longer compilation time. Otherwise, you might want to keep .h style. That's just my personal taste. It isn't C-oriented at all, in my honest opinion.
Further reading:
Splitting templated C++ classes into .hpp/.cpp files--is it possible?
Condensing Declaration and Implementation into an HPP file
C++ templates declare in .h, define in .hpp
You have nothing to worry about. It makes absolutely no difference how it's broken up, it's what your files describe that makes it bigger, not how that description is spread out.
.h or .hpp makes no difference as well.
To answer your question about a larger executable, yes it will make your executable larger. When a you #include a header file in a source or header file, the preprocessor replaces the #include with the contents of the header file. This is why it is necessary to protect your header files with the following header protection:
#ifndef HDR_H
#define HDR_H
...
#endif
However, you will get linker errors if you include the header file (that has function definitions) in multiple files that are part of the same executable. It would wise for you to split class and function definitions and declarations into .cpp and .hpp files, respectively. This will greatly reduce the amount of linker headaches.
Also, .h = .hpp. Doesn't matter which one you choose. Personal preference...
There's all you need here: Header files, pros and cons of putting all you code in them. Hope it helps!
Using header files results in quicker compile time and smaller executable. It also looks considerably cleaner because you can get a quick overview of your class by looking at its .h declaration.

CPP | .h files (C++)

I was just wondering what the difference between .cpp and .h files is? What would I use a header file (.h) for and what would I use a cpp file for?
In general, and it really could be a lot less general:
.h (header) files are for declarations of things that are used many times, and are #included in other files
.cpp (implementation) files are for everything else, and are almost never #included
Technically, there is no difference. C++ allows you to put your code in any file, with any format, and it should work.
By convention, you put your declarations (basically, that which makes up your API) in the .h files, and are referred to as "headers". The .cpp files are for the actual "guts" of your code - the implementation details.
Normally, you have the header files included with #include by other files in your project (and other projects, if you're making a library), so the compiler can get the interface required to compile. The implementation, in the .cpp files, is typically implemented so there is one .cpp file "filling in" the implementation per .h file.
By convention, .h files is something that you #include. CPP files are something you add to your project for compiling into separate object file, and then passing to the linker.
The .h file is called the header file. You usually put your interface there (the stuff you want to be public). The cpp file is where you actually implement your interface.
First, both are text files that contain code for the C++ compiler or pre-processor. As far as the system is concerned there is no difference.
By convention different file name extensions are used to indicate the content of files. In C programs you tend to see .h and .c files while in C++ .hpp and .cpp serve the same purposes.
The first group, .h and .hpp files, called header files, contains mostly non-executing code such as definitions of constants and function prototypes. They are added to programs via #include directive and used not only by the program or library in question but by other programs or libraries that will make use of them, declaring interface points and contracts defining values. They are also used to set metadata that may change when compiling for different operating systems.
The second group, .c and .cpp files, contain the executing parts of the code for the library or program.
Correct me if I'm wrong but,
When you #include something, it more-or-less inserts the entire included file into the one with the include command; that is, when I include, say "macros.h" in "genericTools.cpp", the entire contents of "macros.h" is placed in "genericTools.cpp" at that point. This is why you need to use things like "#pragma once" or other protections, to prevent including the same file twice.
Of note, templated code needs to be entirely in the file you're going to be including elsewhere. (I'm unsure of this - can template specializations be ommited from the included files, and linked like a normal function?)
The .cpp that is the implementation file is our actual program or code.
When we need to use different inbuilt functions in our code, we must include the header file that is .h files.
These .h files contains the actual code of the inbuilt functions that we use hence we can simply call the respective functions.
Therefore, while we compile our code we can see more number of lines compiled than what we have actually coded because not only our code is compiled but along with that the (code of the) functions (that are included in .h files) are also compiled.