What are the conventions for headers and cpp files in C++? - c++

In C++, what is the convention for including headers for class files in the "main" file? For example:
myclass.h
class MyClass {
public:
void doSomething();
};
myclass.cpp
void MyClass::doSomething() {
cout << "doing something";
}
run.cpp
#include "myclass.h"
#include "myclass.cpp"
etc..
Is this relatively standard?

You don't include the .cpp file, only the .h file. The function definitions in the .cpp file are compiled into an object file (.obj or .o), which is then linked into the final binary. If you also #include the .cpp file in another .cpp file, the same function definition is compiled into two different object files, which leads to a multiple-definition linker error.

See Understanding C Compilers for a lot of good answers to this question.

You can think of one .cpp file plus all the headers it includes as one translation unit. As the name implies, each translation unit is compiled on its own. The result of compiling each translation unit, usually called file.o or file.obj, is then passed to the linker, which joins them together and fixes up the references that are still unresolved. So in your case you have
Translation Unit 1 = run.cpp: myclass.h ...
Translation Unit 2 = myclass.cpp: myclass.h ...
Your class definition will appear in both translation units, and that's OK: it's allowed as long as both definitions are identical. What is not allowed is the same non-inline function being defined in both translation units; a non-inline function may be defined only once, in one single translation unit. Finally, the linker takes the result of each translation unit and binds them together into an executable:
Executable = mystuff: run.o myclass.o ...
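A minimal sketch of what can safely live in a header that several translation units include, assuming a hypothetical header named `shapes.h` with a hypothetical class `Square` and function `doubled`. The class definition, a member function defined inside the class body (implicitly inline), and an explicitly `inline` free function may all be repeated in every translation unit; a plain non-inline function definition may not.

```cpp
#include <cassert>  // assert(), for quick checks of the sketch

// ---- hypothetical header "shapes.h", included by both run.cpp and myclass.cpp ----
#ifndef SHAPES_H
#define SHAPES_H

// A class definition may appear in every translation unit that includes it,
// as long as every copy is identical.
class Square {
public:
    explicit Square(int side) : side_(side) {}

    // Defined inside the class body, so it is implicitly inline.
    int area() const { return side_ * side_; }

private:
    int side_;
};

// Explicitly inline: every translation unit may contain this definition and
// the program still behaves as if there were exactly one.
inline int doubled(int x) { return 2 * x; }

// A plain (non-inline) definition such as
//     int tripled(int x) { return 3 * x; }
// must NOT go here: once two .cpp files include this header, the linker
// would see two definitions of tripled and report an error.

#endif  // SHAPES_H
```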

Usually you compile each .cpp file separately and link the resulting .o file with the other .o files.
So myclass.cpp would include myclass.h and would be compiled as a unit of its own.

You compile .cpp files separately. If you include a given .cpp file into two or more other .cpp files, you will likely get a conflict (a multiple-definition error) during the linking phase.

You don't include one *.cpp inside another *.cpp. Instead:
myclass.h
#ifndef MYCLASS_H
#define MYCLASS_H
class MyClass {
public:
void doSomething();
};
#endif
myclass.cpp
#include <iostream>
#include "myclass.h"
void MyClass::doSomething() {
std::cout << "doing something";
}
run.cpp
#include "myclass.h"
etc..
Instead of including myclass.cpp inside run.cpp (so that the compiler would see both of them in one pass), you compile myclass.cpp and run.cpp separately and then let the linker combine them into one executable.
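To see the three pieces side by side, here is a single-file sketch with each "file" marked by a comment. In a real project they would be three separate files, and only the .cpp files would be handed to the compiler (for example `g++ run.cpp myclass.cpp -o run`). Two things are illustrative assumptions: `doSomething` returns a `std::string` instead of printing, so the result is easy to check, and `run()` stands in for whatever run.cpp actually does.

```cpp
#include <cassert>   // assert(), for quick checks of the sketch
#include <iostream>
#include <string>

// ---- myclass.h ----
#ifndef MYCLASS_H
#define MYCLASS_H
#include <string>

class MyClass {
public:
    // Declaration only: enough for run.cpp to compile a call.
    std::string doSomething() const;
};
#endif  // MYCLASS_H

// ---- myclass.cpp (starts with #include "myclass.h") ----
// Out-of-line member definition: note the return type and the MyClass:: qualifier.
std::string MyClass::doSomething() const {
    return "doing something";
}

// ---- run.cpp (starts with #include "myclass.h"; hypothetical entry point) ----
void run() {
    MyClass c;
    std::cout << c.doSomething() << '\n';
}
```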

Related

the #include directive during the linking stage

A header file usually has a safeguard using #ifndef directives (or similar), e.g.:
//header.hpp
#ifndef HEADER
#define HEADER
//code
#endif
But I'm confused about the following case (consider the source code of these two files):
//file1.cpp
#include "header.hpp"
//somecode
and the file
//file2.cpp
#include "header.hpp"
//somecode
if we did something like this:
g++ file1.cpp file2.cpp -o mainfile
we'd get a single executable with no duplication, since the includes are checked at compile time.
But, what if we do:
g++ -c file1.cpp -o file1.o
g++ -c file2.cpp -o file2.o
g++ file1.o file2.o -o mainfile
What happens during the linking stage? Will the includes have conflict? What happens to the includes during the compile time? Does it get duplicated? What is the mechanism under the hood to deal with this at this stage?
Normally a header file only contains function and variable declarations, not their definitions. It's the definitions that are processed by the linking stage, so you can include a header many times in different source files and the linker just won't see anything from it. If you do have a non-inline global function or variable definition in a header file, and that header is included in more than one source file, you will get a linker error.
The formal term here is "translation unit": a single .cpp file together with all the headers it includes. Preprocessor definitions do not span translation units, and you have two translation units here. Linking is what combines translation units, but by that phase the preprocessor is long finished.
Both approaches behave exactly the same. The guard doesn't carry over to the next file: every file is compiled starting from a fresh set of #defines.
#include is a preprocessor directive, which means it is resolved before compilation. The preprocessor processes every translation unit separately, and two translation units do not influence each other. The output of this step no longer contains any preprocessor directives (such as #include, #ifdef, #define, etc.).
So after preprocessing, both files, file1.cpp and file2.cpp, contain the contents of header.hpp. Then both are compiled to file1.o and file2.o. No problems so far. Here is where include guards matter: compilation fails if a single translation unit contains duplicate definitions (for example, the same class defined twice).
Imagine you got a header1.hpp:
#include "header.hpp"
class ABC { ... };
And header2.hpp:
#include "header.hpp"
class XYZ { ... };
And some file, say, file3.cpp would rely on both:
#include "header1.hpp"
#include "header2.hpp"
class Foo : public ABC, public XYZ {};
Without include guards you end up including header.hpp twice and get every definition from it twice in the translation unit, which doesn't compile (we are only looking at file3.cpp here). With include guards, header.hpp is only included once.
Now we finally reach the linking stage and come back to your original example. You have two compilation units that both contain all the declarations from header.hpp. The linker does not care about duplicate declarations; it only fails if it finds multiple definitions of the same symbol.
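A sketch of roughly what the preprocessor hands the compiler for file3.cpp, with each pasted-in header marked by a comment. Since the question only shows `//code` inside header.hpp, its content here is a hypothetical `struct Common`; the member functions on ABC and XYZ are likewise made up so the result can be exercised.

```cpp
#include <cassert>  // assert(), for quick checks of the sketch

// ---- header.hpp, first reached through header1.hpp ----
#ifndef HEADER
#define HEADER
struct Common { int value = 7; };
#endif

// ---- rest of header1.hpp ----
class ABC { public: int abc() const { return 1; } };

// ---- header.hpp again, reached through header2.hpp ----
#ifndef HEADER  // HEADER is already defined, so everything up to #endif is skipped
#define HEADER
struct Common { int value = 7; };  // without the guard: redefinition of 'struct Common'
#endif

// ---- rest of header2.hpp ----
class XYZ { public: int xyz() const { return 2; } };

// ---- file3.cpp itself ----
class Foo : public ABC, public XYZ {};
```

Deleting the two `#ifndef HEADER` / `#define HEADER` / `#endif` triples makes this fail to compile, which is the single-translation-unit problem the guards exist for.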
The #includes are not "checked" or "ignored" at the linking stage, they simply don't exist any more.

How does the C++ compiler (or linker?) know how to handle .cpp and header class files?

For example, I have a class Foo. I create Foo.h, Foo.cpp and then I include Foo.h in the main.cpp file. When I compile the code, how does the machine know to associate the class header file and the class cpp file? Is it done by the filenames?
I'm really interested in understanding this process of compilation and linking.
When I compile the code, how does the machine know to associate the class header file and the class cpp file? Is it doing it by the file names?
No, there's no such kind of automatic association done by the compiler.
If you have a header file containing all the declarations of functions and classes, it must be #included from any translation unit (.cpp file) that makes use of it.
That step is done by the C preprocessor: every occurrence of #include "MyDeclarations.hpp" is replaced with the complete contents of MyDeclarations.hpp in that translation unit.
A simple example:
Foo.hpp
class Foo {
public:
Foo(); // Constructor declaration
};
Foo.cpp
#include "Foo.hpp" // <<<< Include declarations
Foo::Foo() {} // Constructor definition
main.cpp
#include "Foo.hpp" // <<<< Include declarations
int main() {
Foo foo; // <<<<< Use declarations
}
To finally instruct the linker to stitch all of these together, you have to refer to the artifacts produced from the translation units. This depends a bit on the toolchain, but with GCC, for example, you can use a command line like
$ g++ main.cpp Foo.cpp -o myProg
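The key point, that the compiler needs only a declaration and the linker supplies the definition, can be sketched in one file. The function `greet`, its header `greet.hpp`, and `greetWorld` are hypothetical names for illustration; in a real project the definition would sit in a separate greet.cpp and be joined in at link time.

```cpp
#include <cassert>  // assert(), for quick checks of the sketch
#include <string>

// What main.cpp sees after the preprocessor pastes in "greet.hpp":
// a declaration only. This is enough to compile a call.
std::string greet(const std::string& name);

// Code in main.cpp compiles against the declaration alone. Its object file
// records an unresolved reference to greet that the linker must fill in.
std::string greetWorld() { return greet("world"); }

// The definition normally lives in greet.cpp and reaches the program only
// at link time. It is placed in the same file here so the sketch runs.
std::string greet(const std::string& name) { return "hello, " + name; }
```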

Why "int i" has multiple definitions?

I have two files as below:
Test1.h
#ifndef TEST_H
#define TEST_H
int i = 10;
#endif
Test2.cpp
#include <iostream>
#include "Test1.h"
int main()
{
std::cout << i << std::endl;
}
I know I can solve this by using extern or const in Test1.h.
But my question is that "I don't understand the error".
error LNK2005: "int i" (?i@@3HA) already defined in Test1.obj
error LNK1169: one or more multiply defined symbols found
How can int i have multiple definitions?
The header file has include guards.
When I include the header file, it should mean that everything gets copied into Test2.cpp and it should become:
Test2.cpp
#include <iostream>
int i = 10;
int main()
{
std::cout << i << std::endl;
}
And the header file should become irrelevant at this point, after everything has been included.
My other question: if I declare int i with extern in the header file and include it in a .cpp file, would that be an example of external linkage? Generally I have seen external linkage between two .c or .cpp files, as in here, but if you explicitly include the file, is i still regarded as having external linkage?
Each compilation unit (a .cpp file) produces its own set of symbols individually which are then linked together by the linker.
A header file "becomes" part of the compilation unit it is included in, which is compiled to an object file (.obj on Windows, .o on Unix systems).
Therefore it is like you have defined a global 'i' in each compilation unit.
The correct solution (as you know, if you have to have a global) is to declare it as "extern" in the header then have one compilation unit actually define it.
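A minimal sketch of that fix, with the three files from the question shown as commented sections of one file. Test1.cpp is assumed to exist (the error message implies it does), and `readI` is a hypothetical function standing in for the code in Test2.cpp.

```cpp
#include <cassert>  // assert(), for quick checks of the sketch

// ---- Test1.h ----
#ifndef TEST_H
#define TEST_H
extern int i;  // declaration only: no storage is allocated here
#endif

// ---- Test1.cpp (exactly one translation unit provides the definition) ----
int i = 10;    // the single definition the linker will find

// ---- Test2.cpp ----
int readI() { return i; }  // uses the declaration from Test1.h
```

Any number of .cpp files can now include Test1.h, because each one only gets the `extern` declaration.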
Include guards only prevent the same header from being included twice in the same compilation unit, which can happen if a .cpp file includes two headers and one of those headers also includes the other.
How can int i have multiple definitions?
The file that has the definition was included in multiple translation units (.cpp files). One unit was compiled into the object file Test1.obj. The source of the other unit is shown in your question (Test2.cpp). The error appears when you try to link the object files together.
The header file has include guards.
That prevents the contents of the file from being repeated within a single translation unit. It makes no difference to separate units.
My other question is if I declare int i with extern in header file and include it in .cpp, then would it be an example of external linkage?
extern makes the linkage external explicitly. But even without extern, variables declared in the namespace scope have implicit external linkage by default (there are exceptions). The difference in this case is that extern variable declarations are not definitions unless there is an initializer.
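A short sketch of the namespace-scope cases described above; the variable names are made up for illustration. The `const` line is also why the `const int i = 10;` fix mentioned in the question works: each translation unit that includes the header gets its own private copy, so nothing clashes at link time.

```cpp
#include <cassert>  // assert(), for quick checks of the sketch

int counter = 0;        // no 'extern': a definition, with external linkage by default
extern int limit;       // 'extern' with no initializer: a declaration only, not a definition
int limit = 100;        // the single definition matching the declaration above
static int hidden = 1;  // exception: 'static' at namespace scope gives internal linkage
const int kMax = 5;     // exception: a namespace-scope const variable has internal linkage in C++
```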
I can achieve external linkage without including header file i.e. with two .cpp files by making a variable extern in one .cpp and defining it in other and linker finds its definition. But if I have one header file with extern variable and include it in other .cpp does this count as external linkage?
It does not matter how the extern declaration ends up in the cpp file. Whether it was included from a header or not, it declares a variable with external linkage.
Probably you are trying to create the executable from two translation units.
Your error shows that the object has been defined in Test1.obj. Probably your program is Test1.obj + Test2.obj, and both files include the same definition, which has external linkage.
Do you have another Test1.cpp in your project that also includes Test1.h?
If not, have you configured your compiler so that it also builds the .h files into object files?
The reason is most likely the answer to one of these two questions.
When I include the header file it should mean that everything gets copied in Test2.cpp and it should become:
Yes and then you do the exact same thing in Test1.cpp (which you didn't show us).
Hence, multiple definitions.

Initialize static member in C++

From what I have understood, the reason you initialize a static member in a .cpp file and not in a .h file is so there's no risk of getting several instances of the member. Take this example then:
//Foo.h
#ifndef FOO_H
#define FOO_H
class Foo{
static int a;
};
int Foo::a = 95;
#endif
The preprocessor directives make sure that this .h file is only compiled once, which ensures there is only one instance of the static member. Is it possible to do this instead of initializing the static member in a .cpp file?
No, it only ensures that Foo.h is included once per compilation unit (.cpp file), not once in the entire project. You should define the static member in Foo.cpp.
Consider two source files, a.cpp and b.cpp, that both include the header. Since they're compiled independently of each other, the header guard does not help: you end up with two object files, a.o and b.o, that both define Foo::a. Trying to link them together will fail.
This will cause a linker error if the header is included in multiple .cpp files (translation units):
//a.cpp
#include "Foo.h"
//b.cpp
#include "Foo.h"
After compilation, a.obj contains a definition of Foo::a, and b.obj also contains a definition of Foo::a. If an attempt is made to link these two .obj files into a single binary, a multiple-definition error will occur.
No, the include guards ensure that the header is included at most once per compilation unit. If your program has multiple compilation units (.cpp files) including the header then you will end up with multiple definitions for Foo::a.
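A minimal sketch of the recommended layout: the header only declares the static member, and exactly one .cpp file defines and initializes it. The public `get()` accessor is a hypothetical addition, since `a` is private in the question's class.

```cpp
#include <cassert>  // assert(), for quick checks of the sketch

// ---- Foo.h ----
#ifndef FOO_H
#define FOO_H
class Foo {
public:
    static int get() { return a; }  // hypothetical accessor, since a is private
private:
    static int a;                    // declaration of the static data member
};
#endif  // FOO_H

// ---- Foo.cpp (the one translation unit that defines it) ----
int Foo::a = 95;
```

a.cpp and b.cpp can both include Foo.h safely, because neither of them now contains a definition of Foo::a.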

Why use #ifndef CLASS_H and #define CLASS_H in .h file but not in .cpp?

I have always seen people write
class.h
#ifndef CLASS_H
#define CLASS_H
//blah blah blah
#endif
The question is, why don't they also do that for the .cpp file that contain definitions for class functions?
Let's say I have main.cpp, and main.cpp includes class.h. The class.h file does not include anything, so how does main.cpp know what is in the class.cpp?
First, to address your first inquiry:
When you see this in .h file:
#ifndef FILE_H
#define FILE_H
/* ... Declarations etc here ... */
#endif
This is a preprocessor technique for preventing a header file from being included multiple times, which can be problematic for various reasons. During compilation of your project, each .cpp file is (usually) compiled. In simple terms, this means the compiler takes your .cpp file, opens any files #included by it, concatenates them all into one massive text file, performs syntax analysis, converts it to some intermediate code, optimizes it and performs other tasks, and finally generates the assembly output for the target architecture. Because of this, if a file is #included multiple times into one .cpp file, the compiler appends its contents twice, so if there are definitions within that file, you will get a compiler error telling you that you redefined something.
When the file is processed by the preprocessor, the first time its contents are reached the first two lines check whether FILE_H has been defined. If not, the preprocessor defines FILE_H and continues processing the code between it and the #endif directive. The next time that file's contents are seen by the preprocessor, the check against FILE_H is false, so it skips straight down to the #endif and continues after it. This prevents redefinition errors.
And to address your second concern:
In C++ programming, as a general practice, we separate code into two file types. One has the extension .h, and we call it a "header file." Headers usually provide declarations of functions, classes, structs, global variables, typedefs, preprocessor macros, and so on. Basically, they just provide information about your code. Then we have the .cpp extension, which we call a "code file." It provides definitions for those functions, class members, any struct members that need definitions, global variables, etc. So the .h file declares code, and the .cpp file implements that declaration. For this reason, during a build we generally compile each .cpp file into an object file and then link those object files together (you almost never see one .cpp file include another .cpp file).
How these externals are resolved is the linker's job. When your compiler processes main.cpp, it gets the declarations for the code in class.cpp by including class.h. It only needs to know what these functions or variables look like (which is what a declaration gives you). So it compiles your main.cpp file into an object file (call it main.obj). Similarly, class.cpp is compiled into class.obj. To produce the final executable, the linker is invoked to link those two object files together. For any unresolved external variable or function, the compiler leaves a placeholder (a relocation entry) where the access happens. The linker then looks for the code or variable in the other object files it was given; if it finds it, it combines the code from the object files into one output file and patches the placeholder with the final location of the function or variable. This way, your code in main.cpp can call functions and use variables defined in class.cpp, as long as they are declared in class.h (or somewhere else that main.cpp can see).
I hope this was helpful.
The CLASS_H is an include guard; it's used to avoid the same header file being included multiple times (via different routes) within the same CPP file (or, more accurately, the same translation unit), which would lead to multiple-definition errors.
Include guards aren't needed on CPP files because, by definition, the contents of the CPP file are only read once.
You seem to have interpreted the include guards as having the same function as import statements in other languages (such as Java); that's not the case, however. The #include itself is roughly equivalent to the import in other languages.
It doesn't - at least during the compilation phase.
The translation of a c++ program from source code to machine code is performed in three phases:
Preprocessing - The preprocessor scans all source code for lines beginning with # and executes those directives. In your case, the contents of your file class.h are inserted in place of the line #include "class.h". Since you might be including your header file in several places, the #ifndef clauses avoid duplicate-definition errors: the guard macro is undefined only the first time the header file is included.
Compilation - The compiler now translates all preprocessed source files into binary object files.
Linking - The linker links (hence the name) the object files together. A reference to your class or one of its methods (which should be declared in class.h and defined in class.cpp) is resolved to the respective offset in one of the object files. I write "one of the object files" because your class does not need to be defined in a file named class.cpp; it might be in a library that is linked to your project.
In summary, the declarations can be shared through a header file, while the mapping of declarations to definitions is done by the linker.
That's the distinction between declaration and definition. Header files typically include just the declaration, and the source file contains the definition.
In order to use something, you only need to know its declaration, not its definition. Only the linker needs to know the definition.
So this is why you will include a header file inside one or more source files but you won't include a source file inside another.
Also you mean #include and not import.
That's done for header files so that the contents only appear once in each preprocessed source file, even if it's included more than once (usually because it's included from other header files). The first time it's included, the symbol CLASS_H (known as an include guard) hasn't been defined yet, so all the contents of the file are included. Doing this defines the symbol, so if it's included again, the contents of the file (inside the #ifndef/#endif block) are skipped.
There's no need to do this for the source file itself since (normally) that's not included by any other files.
For your last question, class.h should contain the definition of the class, and declarations of all its members, associated functions, and whatever else, so that any file that includes it has enough information to use the class. The implementations of the functions can go in a separate source file; you only need the declarations to call them.
main.cpp doesn't have to know what is in class.cpp. It just has to know the declarations of the functions/classes that it uses, and these declarations are in class.h.
The linker connects the places where the functions/classes declared in class.h are used with their implementations in class.cpp.
.cpp files are not included (using #include) into other files. Therefore they don't need include guarding. Main.cpp will know the names and signatures of the class that you have implemented in class.cpp only because you have specified all that in class.h - this is the purpose of a header file. (It is up to you to make sure that class.h accurately describes the code you implement in class.cpp.) The executable code in class.cpp will be made available to the executable code in main.cpp thanks to the efforts of the linker.
It is generally expected that modules of code such as .cpp files are compiled once and linked into multiple projects, to avoid unnecessary repeated compilation of logic. For example, g++ -c class.cpp produces class.o, which you could then link into multiple projects using g++ main.cpp class.o.
We could use #include as our linker, as you seem to be implying, but that would be silly when we already know how to link properly using our compiler, with fewer keystrokes and less wasteful repeated compilation than doing it through our code.
The header files are still required to be included into each of the multiple projects, however, because this provides the interface for each module. Without these headers the compiler wouldn't know about any of the symbols introduced by the .o files.
It is important to realise that the header files are what introduce the declarations (and class definitions) for those modules; once that is realised, it makes sense that including a header multiple times into one translation unit could cause redefinitions (which are errors), so we use include guards to prevent them.
It's because header files define what the class contains (members, data structures) and .cpp files implement it.
And of course, the main reason for the guard is that one .h file can end up included multiple times (for example, via other .h files), which would result in multiple definitions of a class, which is invalid.
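To make the distinction concrete: repeating a declaration inside one translation unit is harmless, but repeating a definition is not, and the definitions are what an unguarded header would duplicate. A small sketch with made-up names (`add`, `total`, `Widget`):

```cpp
#include <cassert>  // assert(), for quick checks of the sketch

// Repeated declarations in one translation unit are harmless,
// e.g. when an unguarded header's text is pasted in twice:
int add(int a, int b);
int add(int a, int b);  // OK: redeclaration
extern int total;
extern int total;       // OK: redeclaration
class Widget;
class Widget;           // OK: repeated forward declaration

// Definitions may appear only once per translation unit. Writing any of the
// following twice would fail to compile, which is what include guards prevent:
class Widget {
public:
    int w = 3;
};
int total = 0;
int add(int a, int b) { return a + b; }
```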