compilation process in C++

I would be very grateful if somebody could explain what exactly my compiler does when I press the BUILD button and it begins to compile all my .h and .cpp files. How exactly does this process work, and what is inside an object file? Why do I ask? I'm trying to understand what "minimize compilation dependencies between files" means in Meyers' book about 50 specific ways... (I hope you know the book). There he explains Abstract Base Classes and Handle Classes. My lecturer told me that I simply shouldn't include excessive .h files and that's all. Any links about the compilation process would be appreciated as well. Thanks in advance for any help.

When doing a full compile, the compiler will read each .cpp file in turn. For a given .cpp file it will then read every file referenced by a #include directive, recursively, compiling the code as it goes. When it compiles the next source file it will read the files referenced with #include in that source file.
When you make changes and rebuild, a .cpp file is recompiled if any file it references through a #include directive, directly or indirectly, has changed, just as if the .cpp file itself had changed.
Unnecessary #include directives thus have two costs: firstly the compiler has to read and process more files when compiling, and secondly it increases the chances that your .cpp file will need recompiling even if nothing it actually uses has changed.

See
http://computer.howstuffworks.com/c2.htm
for an introduction and
http://www.tenouk.com/ModuleW.html
for an in-depth description.
Additionally, some theoretical background can be found at
http://en.wikipedia.org/wiki/Compiler

The best way to understand how a compiler works is to first understand how an assembler works. There is a decent explanation here.

Related

Why do we include header files and not source files?

I've seen similar questions asked yet they still do not make sense to my ape brain.
Here is an example. Say I declare a function in a header file named Bob.h: void PrintSomething(); and in the .cpp file I write: void MyClass::PrintSomething(){std::cout << "Hello";} . I've seen people in another .cpp file, for example Frank.cpp, include only the Bob.h header, which just has the declaration (no code inside it), and not the .cpp with the code. What blows my mind is that when they call the PrintSomething() function in Frank.cpp, it uses the code from Bob.cpp and prints "Hello". How? How does it print "Hello", which was added in the .cpp file, when I've only included the .h file, which doesn't say anything about "Hello" and is just a declaration? I've looked through the compile process and the linking process too, but it just doesn't stick.
On top of which if I were to now say in my Frank.cpp file: void MyClass::PrintSomething(){std::cout << "Bye";} and included the Bob.h file in my main.cpp and called the PrintSomething() function would it print "Hello" or "Bye"? Is the computer psychic or something? This concept is the one thing I am not grasping in my C++ learning journey.
Thanks in advance.

The moment you include Bob.h, the compiler has everything it needs to know about PrintSomething(): it only needs a declaration of the function. Frank.cpp does not need to know about Bob.cpp, which defines PrintSomething().
Each of your individual .cpp files is compiled into an object file. These don't do much by themselves until they're all glued together, which is the linker's responsibility.
The linker takes all your object files and fills in the missing parts:
Linker talk:
Hey, I see that Frank.obj uses PrintSomething() and I can't see
its definition in that object file.
Let's check the other object files..
Upon inspecting Bob.obj I can see that this contains a usable
definition for PrintSomething(), let's use that.
This is of course simplified but that's what a linker does in short.
After this is done you get your usable executable.
on top of which if I were to now say in my Frank.cpp file: void MyClass::PrintSomething(){std::cout << "Bye";} and included the Bob.h
file in my main.cpp and called the PrintSomething() function would it
print "Hello" or "Bye"? Is the computer psychic or something?
The linker would find two definitions of PrintSomething() and would emit an error, because it has no way to know which definition is the right one to pick.

The key notion here is separate compilation. You divide your project into a set of source files that implement more-or-less independent things, you compile those source files into object files, and you link the object files and any additional libraries (including the standard library) to create an executable file. For large projects, compiling all of the source files can take a long time (sometimes measured in hours). The first time you build your application you have to do that. But after that, if you only changed one source file you only need to recompile that source file and then link again, with the object files that you created the first time through. That's usually a big time saver. If you have one massive source file (i.e., a source file that #includes all the rest of your source files), you don't get that option -- you have to recompile the whole thing every time.

What's the REAL difference between .h and .cpp files?

This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?

Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.
Let's take your example of defining functions or variables. Suppose your header file contains the following line:
header.h:
int x = 10;
code.cpp:
#include "header.h"
Now, if you only have one code file and one header file this probably works just fine:
g++ code.cpp -o outputFile
However, if you have two code files this breaks:
header.h:
int x = 10;
code1.cpp:
#include "header.h"
code2.cpp:
#include "header.h"
And then:
g++ code1.cpp -c (produces code1.o)
g++ code2.cpp -c (produces code2.o)
g++ code1.o code2.o -o outputFile
This breaks, specifically at the linker step, because now you have two symbols in the same executable that have the same name, and the linker doesn't know what it's supposed to do with that. When you include your header in code1 you get a symbol "x", and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:
code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.

What's the REAL difference between .h and .cpp files?
They are both fundamentally just text files. From a certain perspective, their only difference is the filename.
However, many programming-related tools treat the files differently depending on their name. For example, some tools will detect the programming language: .c is compiled as C, .cpp is compiled as C++, and .h is not compiled at all.
For header files, the name does not matter at all to the compiler. The name could be .h or .header or anything else; it doesn't affect how the preprocessor includes it. It is, however, good practice to conform to a common convention in order to avoid confusion.
I've noticed that simply defining functions in .h files works just fine.
Are the functions declared non-inline? Have you ever included the header file in more than one translation unit? If you answered yes to both, then your program has been ill-formed. If you didn't, then that would explain why you didn't encounter any problems.
Does it really reduce compile time?
Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.
This is because doing less work takes less time. What is important to realise is that other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then every change means recompiling it, i.e. the program in its entirety.
Multiple translation units are also better because they can be compiled in parallel, which takes advantage of modern multi-core hardware.
What else?
Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.
There are some other advantages too regarding organisation of files. In particular, it is quite convenient to be able to define different implementations of the same function for different target systems in order to support multiple platforms. With header files, you must do tricks with macros, while with source files, you simply choose which files to compile.
Another use case where implementing functions in header is not an option is distributing a library without source, as some middleware providers do. You must give the headers or else your functions cannot be called, but if all your source is in the headers, then you've given up your trade secrets. Compiled sources have to be at least reverse engineered.

Keep in mind that the C++ compiler is a fairly simple beast as far as file-handling goes. All it's allowed to do is read in a single source-code file (and, via the pre-processor, logically insert into that incoming text-stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.
For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because the number of lines of code that the compiler needs to load into memory is small (relative to the computer's RAM).
But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.
And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.
To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.
So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.
.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.

The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.
The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.
If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them any thing you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".
I have to do this kind of thing when working in a constrained environment.
Header files weren't part of the original definition of C. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of C, which is why we got pre-compiled header files. Pre-compiled header files do speed up the compilation and linking of source code, but not any faster than just writing assembler, or machine code, or any other thing that didn't take advantage of the co-operation of other people or a design environment.
It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.

Whenever you see an #include <header.h> directive, pretend that the contents of header.h are being copied and pasted right where the #include directive appears.
.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.
We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.

Difference in including the .cpp file and .h file (with the same content in cpp)?

I've recently started learning C++ from the basics and was very confused by the following:
Let's say I have a header (test.h, which contains only declarations) with some content and a source file (source.cpp), and the program produces some result.
Now suppose I copy the same content of that header file into a .cpp file (testcpp.cpp) and include this in source.cpp instead.
In this case, I don't understand what difference it makes.
(I won't add testcpp.cpp to the makefile.)
I have seen some threads similar to this but couldn't get a clear idea!!!
I have learnt the usage of header and cpp files and have used them correctly in projects till now. Please answer specifically for this scenario (I know doing it this way adds confusion, but I just want to know). Will there be any difference in doing so, or is it just a common practice everyone follows?

what difference it makes?
The extension of a header file has no effect on anything. You could have just as well named the file test.mpg, .test or just test (changing the include directive obviously), and it would have worked just as well. The extension is for the benefit of the programmer, not the toolchain.
However, it is a bad idea to name it anything other than .h, .hpp or whatever is your convention. If you name it .mpg, people will think that it is a video, and not realising that it is a header file, try to play it in a media player. If you name it .cpp, people will think that it is a source file and may attempt to compile it or maybe add definitions into it.
Including a file with the preprocessor is technically just copying contents of one file into another. Nothing more and nothing less. Everything else about them is just convention.
In the makefile, when specifying source files, can I give my source files any extension (.fsfs, .xxx) rather than .cpp?
Technically yes. However, compilers usually use the source file extension to detect the language, which they will fail to do in this case, so you would have to specify the language explicitly.

It changes nothing. It's just a convention whether you use a *.h or *.cpp or *.asdasd suffix, as long as it doesn't get compiled by itself.
Some projects use the .hxx extension for header files and .cc for source files.
Please, for the good of fellow programmers you'll work with, stick to common conventions and don't put header code in .cpp files.

#include just does a copy-n-paste of the file you include into the current file. What the file is named doesn't matter one bit - you can name it "foo.exe" if you like; as long as it contains valid source-code in the context where it is included all is well (but please don't use unconventional names, you'll just confuse people).

c++: #include and different file types

I somehow can't grasp the idea, and reading the documentation hasn't helped me.
My questions are:
1) When I include a header file with #include "general.h", and the directory of my project contains two files, general.h and general.cpp, does it mean that the precompiler will find the .cpp file automatically?
2) Can I include files without extensions: #include "general"?
3) Can I include a file without any header file: #include "general.cpp"?
4) Can I include a txt file: #include "general.txt"?
I tried this all in Visual Studio 2010. No syntax errors at least. But I'd like to have an explanation. So, I hope you will be kind and help me.

The standard and the compiler don't really care much about whether a file is .cpp or .h or .monkeyface. The concepts behind structuring your source code into implementation and header files are really just accepted ways to help manage your source. Despite this, not structuring your source in the accepted way is often considered to be incorrect or bad C++.
All #include does is tell the preprocessor to include the contents of the file you specify in the current file. It's like copying and pasting the other file into yours. When you say #include "foo.h", it just includes the contents of foo.h and doesn't care about foo.cpp at all - it doesn't even know that it exists (and there's no reason it has to exist).
Structuring your source code in implementation and header files is extremely useful - it avoids problems with dependencies and multiple definitions, and also improves compilation time somewhat. When your code uses another class, you only need to #include the header file for that class. This is because your code doesn't need to care about the implementation of the class; it just needs to know what it looks like (its name, members, base class, etc.). It doesn't concern itself with how exactly the member functions are implemented.
The extensions .cpp and .h are merely conventions. Some people prefer to use .hpp for header files. Some people even use .tpp for template implementations. You can name them however you like - yes, you can even include a .txt file. Your compiler probably tries to infer things about files (for example, which language to compile it as) from the file extension, but that is usually overrideable.
So if your main.cpp includes foo.h because it uses class foo, at what point does foo.cpp get involved? Well, in the compilation of main.cpp, it doesn't get involved at all. main.cpp doesn't need to know about the implementation of the class, as we discussed above. However, when compiling your entire program, you will pass each of your .cpp files to the compiler to be compiled separately. That is, you would do something like g++ main.cpp foo.cpp. When foo.cpp is compiled, it will include the headers that it needs to compile.
After each of your .cpp files has been compiled (which involves including the headers that they depend on), they are then linked together. The use of a member function foo::bar() in main.cpp will at this stage be linked to the implementation of foo::bar() that was given in foo.cpp.

The #include directive tells the preprocessor to read the file. That's all.

The preprocessor simply inserts the whole content of the given file when it encounters a #include directive.

1) No, the precompiler knows nothing about the .cpp file.
2) Yes, if the file has no extension.
3) You can include any file you want. It doesn't mean you will get anything useful out of it.
4) See point 3 above.

#include is a simple "insert the contents of the given file here" mechanism, so the preprocessor will include exactly the file you specify. If you include a .h file, neither the preprocessor nor the compiler will know about the corresponding .cpp file - each .cpp file is compiled separately (the purpose of the .h files is to inform the compiler of which functions exist outside of the current .cpp file). After compilation, the linker is invoked, and only then are the compiled results of the different .cpp files combined.

What is a .h.gch file?

I recently had a class project where I had to make a program with G++.
I used a makefile and for some reason it occasionally left a .h.gch file behind.
Sometimes, this didn't affect the compilation, but every so often it would result in the compiler issuing an error for an issue which had been fixed or which did not make sense.
I have two questions:
1) What is a .h.gch file and what is one used for? and
2) Why would it cause such problems when it wasn't cleaned up?

A .gch file is a precompiled header.
If a .gch is not found then the normal header files will be used.
However, if your project is set to generate pre-compiled headers it will make them if they don’t exist and use them in the next build.
Sometimes the *.h.gch will get corrupted or contain outdated information, so deleting that file and compiling it again should fix it.

If you want to know about a file, simply type in a terminal
file filename
file a.h.gch gives:
GCC precompiled header (version 013) for C

It's a GCC precompiled header.
Wikipedia has a half-decent explanation, http://en.wikipedia.org/wiki/Precompiled_header

Other answers are completely accurate with regard to what a gch file is. However, context (in this case, a beginner using g++) is everything. In this context, there are two rules:
1) Never, ever, ever put a .h file on a g++ compile line. Only .cpp files. If a .h file is ever compiled accidentally, remove any *.gch files.
2) Never, ever, ever put a .cpp file in an #include statement.
If rule one is broken, at some point the problem described in the question will occur.
If rule two is broken, at some point the linker will complain about multiply-defined symbols.

a) They're precompiled headers:
http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
b) They contain "cached" information from .h files and should be updated every time you change the respective .h file. If that doesn't happen, the dependencies in your project are set up incorrectly.
