Is there any reason to prefer linker commands over include directives if you don't plan on recompiling the included files separately?
P.S. If it matters, I'm actually concerned with C++ and g++, but I thought gcc would be more recognizable as a generic compiler.
Is there any reason to prefer linker commands over include directives
Yes. You'll get into serious trouble if you include implementation (.c) files here and there. Meet the infamous "Multiple definitions of symbol _MyFunc" linker error...
(By the way, it's also considered bad style/practice; in general, only header files are meant to be included.)
If you really want to just have one long C file, use your editor to insert file2.c into file1.c and then delete file2.c. If they ALWAYS go together, then that's (possibly) the right solution. Using #include for this is not the right solution.
The reason we split files into separate .c and .cpp files is that they logically do something separate from the rest of the code. Compiling each unit separately is a good idea when programs are large, but the main reason for splitting things into separate files is to show the independence of each unit of code. This way, you can see what other parts affect this particular file (by looking at the headers that are included). If a class is local to a .cpp file, you know that class isn't used somewhere else in the system, so you can safely change the internals of that class without having to worry about other components being affected, for example. On the other hand, if everything is in one large file, then it's very hard to follow what's affecting what, and what is safe to change.
Here's the difference. file1.c:
#include <stdio.h>
static int foo = 37;
int main() { printf("%d\n", foo); }
file2.c:
static int foo = 42;
These two trivial modules compile fine with gcc file1.c file2.c, even though file2.c's definition of foo is then never used. static identifiers are visible only within a translation unit (C's version of what is more commonly called a module).
When you #include "file2.c" in file1.c, you effectively insert file2.c into file1.c, causing an identifier clash because the two files now become one translation unit.
As a rule, never #include a C or C++ source file. Only #include headers.
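To show the recommended layout instead, here's a minimal sketch (the helper function get_answer is invented for illustration): anything file2.c wants to share goes through a small header, and each .c file stays its own translation unit.
file2.h:
#ifndef FILE2_H
#define FILE2_H
int get_answer(void);   /* declaration only; the definition lives in file2.c */
#endif
file2.c:
#include "file2.h"
static int foo = 42;                  /* still private to this translation unit */
int get_answer(void) { return foo; }  /* the one and only definition */
file1.c:
#include <stdio.h>
#include "file2.h"   /* include the header, never file2.c */
static int foo = 37; /* file1's own private foo, no clash */
int main() { printf("%d %d\n", foo, get_answer()); }
Compiled with gcc file1.c file2.c exactly as before; the only thing the two translation units share is the declaration in file2.h.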
Related
This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?
Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.
Let's take your example of defining functions or variables. Suppose your header file contains the following line:
header.h:
int x = 10;
code.cpp:
#include "header.h"
Now, if you only have one code file and one header file this probably works just fine:
g++ code.cpp -o outputFile
However, if you have two code files this breaks:
header.h:
int x = 10;
code1.cpp:
#include "header.h"
code2.cpp:
#include "header.h"
And then:
g++ code1.cpp -c (produces code1.o)
g++ code2.cpp -c (produces code2.o)
g++ code1.o code2.o -o outputFile
This breaks, specifically at the linker step, because now you have two symbols with the same name in the same executable, and the linker doesn't know what it's supposed to do with that. When you include your header in code1 you get a symbol "x", and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:
code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.
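The usual fix, as a sketch: keep only a declaration in the header and put the single definition in exactly one .cpp file.
header.h:
#ifndef HEADER_H
#define HEADER_H
extern int x;   // declaration: "an int named x exists somewhere"
#endif
code1.cpp:
#include "header.h"
int x = 10;     // the one and only definition
code2.cpp:
#include "header.h"
// can read and write x; the linker resolves it to the definition in code1.cpp
Now both object files refer to the same symbol x and the link step above succeeds. (In C++17 and later, writing inline int x = 10; in the header is another option, since inline tells the linker to merge the per-file copies.)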
What's the REAL difference between .h and .cpp files?
They are both fundamentally just text files. From a certain perspective, their only difference is the filename.
However, many programming-related tools treat the files differently depending on their name. For example, some tools detect the programming language from the extension: .c is compiled as C, .cpp is compiled as C++, and .h is not compiled at all.
For header files, the name does not matter at all to the compiler. The name could be .h or .header or anything else; it doesn't affect how the preprocessor includes it. It is, however, good practice to conform to a common convention in order to avoid confusion.
I've noticed that simply defining functions in .h files works just fine.
Are the functions declared non-inline? Have you ever included the header file in more than one translation unit? If you answered yes to both, then your program has been ill-formed. If you didn't, then that would explain why you didn't encounter any problems.
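To make that concrete, a small sketch (the function is invented for illustration): a non-inline definition breaks as soon as two translation units include the header, while the inline one is fine.
util.h:
#ifndef UTIL_H
#define UTIL_H
// int square(int v) { return v * v; }      // a non-inline definition here would cause a
//                                          // multiple-definition linker error once two
//                                          // .cpp files include this header
inline int square(int v) { return v * v; }  // inline: the linker merges the per-TU copies
#endif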
Does it really reduce compile time?
Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.
This is because doing less work takes less time. What is important to realise is that other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then you have to compile it i.e. the program in its entirety.
Multiple translation units are also better because they can be compiled in parallel, which allows taking advantage of modern multi core hardware.
What else?
Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.
There are some other advantages too regarding the organisation of files. In particular, it is quite convenient to be able to define different implementations of the same function for different target systems in order to support multiple platforms. With header files, you must do tricks with macros, while with source files, you simply choose which files to compile.
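A sketch of that idea (all names invented for illustration): one shared header, one source file per platform, and the build simply picks which source file to compile.
timer.h:
#ifndef TIMER_H
#define TIMER_H
double now_seconds();   // same declaration on every platform
#endif
timer_posix.cpp:
#include "timer.h"
#include <ctime>
double now_seconds() { return static_cast<double>(std::time(nullptr)); }
timer_win32.cpp:
#include "timer.h"
// a Windows-specific definition of now_seconds() would go here instead
On Linux you compile timer_posix.cpp, on Windows timer_win32.cpp; the header stays free of #ifdef tricks.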
Another use case where implementing functions in headers is not an option is distributing a library without source, as some middleware providers do. You must give out the headers, or else your functions cannot be called, but if all your source is in the headers, then you've given away your trade secrets. Compiled sources at least have to be reverse engineered first.
Keep in mind that the C++ compiler is a fairly simple beast as far as file-handling goes. All it's allowed to do is read in a single source-code file (and, via the pre-processor, logically insert into that incoming text-stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.
For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because the number of lines of code that the compiler needs to load into memory is small (relative to the computer's RAM).
But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.
And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.
To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.
So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.
.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.
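As an example of the "subtle" kind of problem (a sketch, names invented): a common workaround is to mark the header variable static, which links without errors but quietly gives every .cpp its own private copy instead of one shared global.
config.h:
#ifndef CONFIG_H
#define CONFIG_H
static int counter = 0;   // links fine, but each translation unit gets its OWN counter
#endif
a.cpp:
#include "config.h"
void bump() { ++counter; }   // increments a.cpp's copy only
b.cpp:
#include "config.h"
int read_counter() { return counter; }   // always sees b.cpp's copy, which stays 0
The extern-declaration-in-.h, definition-in-one-.cpp pattern avoids both the duplicate-symbol error and this silent duplication.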
The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.
The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.
If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them anything you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".
I have to do this kind of thing when working in a constrained environment.
Header files weren't part of the original definition of C. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of C, which is why we got pre-compiled header files. Pre-compiled header files do speed up compilation and linking, but no faster than just writing assembler, or machine code, or any other approach that doesn't take advantage of the co-operation of other people or a design environment.
It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.
Whenever you see an #include <header.h> directive, pretend that the contents of header.h is being copied and pasted right where the #include directive appears.
.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.
We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.
I am trying to understand ODR.
I created one file pr1.cpp like this:
struct S{
int a;
};
a second file pr2.cpp like this:
struct S {
char a;
};
and a main file like this:
#include <iostream>
int main() {
return 0;
}
I am compiling using the terminal with the command:
g++ -Wall -Wextra pr1.cpp pr2.cpp main.cpp -o mypr
The compiler does not find any kind of error, BUT there are two declarations of the type "S"... I am not understanding what is really happening. I thought I would get an error after the "linkage" phase because of the ODR violation.
I can get the error only by editing the main.cpp file and adding:
#include "pr1.cpp"
#include "pr2.cpp"
Can anyone explain to me what is happening?
Unless you include both definitions in the same file, there is no problem. This is because the compiler operates on a single translation unit which is usually a .cpp file. Everything you #include in this file is also part of the translation unit because the preprocessor basically copies and pastes the contents of all included files.
What happens is that the compiler will create an object file (.obj usually) for each translation unit, and then the linker will create a single executable (or .dll, etc.) by linking all the object files and the libraries the project depends on. In your case the compiler encountered each struct in a different translation unit, so it doesn't see a problem. When you include both files, the two definitions find themselves in the same translation unit, and the compiler throws an error because it cannot resolve the ambiguity if an S is used in this translation unit (even though you don't have to use one for the program to be ill-formed).
As a side-note, do not include .cpp files in other .cpp files. I'm sure you can find a lot on how to organize your code in header and source files and it doesn't directly answer the question so I won't expand on it.
EDIT: I neglected to say why you didn't get a linker error. Some comments have pointed out that this is undefined behavior which means that even though your linker should probably complain it doesn't actually have to. In your case you have one .obj file for each struct and a main.obj. None of these references the other so the linker does not see any references that it needs to resolve and it probably doesn't bother checking for ambiguous symbols.
I assume most linkers would throw an error if you declared struct S; and tried to use an S* or S& (an actual S would require a definition inside the same translation unit). That is because the linker would need to resolve that symbol and would find two matching definitions. Given that this is undefined, though, a standard-compliant linker could just pick one and silently link your program into something nonsensical, because you meant to use the other. This can be especially dangerous for structs that get passed around from one .cpp to the other, as the definition needs to be consistent. It can also be a problem when identically named structs/classes are passed across library boundaries. Always avoid duplicating names, for these reasons.
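Here's a sketch of that silent breakage (the functions are invented for illustration): two translation units each define their own S, one hands an S* to the other, and the whole thing compiles and links without a single complaint.
pr1.cpp:
struct S { int a; };    // this translation unit's idea of S
S globalS;
S* getS() { return &globalS; }
pr2.cpp:
struct S { char a; };   // a different, incompatible S
S* getS();              // the linker matches this to pr1.cpp's getS by name
void use() { getS()->a = 'x'; }   // undefined behaviour: writes through the wrong layout
main.cpp:
void use();
int main() { use(); return 0; }
g++ pr1.cpp pr2.cpp main.cpp builds cleanly, yet the program is ill-formed: the two definitions of S violate the ODR, and the write in pr2.cpp reinterprets pr1.cpp's int member as a char.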
We define a C++ class in a .h and define its methods in a .cpp, but it makes the code look less organized.
I want to put all the methods' definitions in the class definition, which is in a .h file, but I'm worried that the compiler will generate duplicated code for the same methods/functions when one class header file is included by different files.
Does the linker find out and merge the duplicated code pieces to reduce the file size?
If not, is it better to use .hpp instead? I heard that a .hpp is for this.
And it does make a minor difference when I just change a .h file to a .hpp (I don't know why), compiling with g++.
Yes, it may create a larger executable. That is because member functions defined inside the class itself are inline by default, whether you write the inline keyword in the definition or not. Inline functions often lead to a larger executable because the compiler may expand their code at every place they are called from.
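In other words, something like the following sketch links cleanly no matter how many .cpp files include it, precisely because the in-class definition is implicitly inline (the class is invented for illustration):
point.h:
#ifndef POINT_H
#define POINT_H
class Point {
public:
    int length2() const { return x * x + y * y; }   // defined in-class: implicitly inline
    int x = 0, y = 0;
};
#endif
Every .cpp that includes point.h compiles its own copy of length2 and the linker merges them into one; any growth in executable size comes from the compiler expanding the body at call sites, not from the merging itself.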
.h vs .hpp is 90% equivalent to
#include <cmath> vs #include <math.h>
Some people prefer to use .hpp when they are doing exclusive C++ programming. You will see .hpp in libraries like Boost.
However, the other 10% is really important. For example, quoting from the Boost library documentation, which explains the reason for using .hpp over .h:
Most Boost libraries are header-only: they consist entirely of header files containing templates and inline functions, and require no separately-compiled library binaries or special treatment when linking.
If you fall into that case, you should use .hpp, but this can mean longer compilation times. Otherwise, you might want to keep the .h style. That's just my personal taste; it isn't C-oriented at all, in my honest opinion.
Further reading:
Splitting templated C++ classes into .hpp/.cpp files--is it possible?
Condensing Declaration and Implementation into an HPP file
C++ templates declare in .h, define in .hpp
You have nothing to worry about. It makes absolutely no difference how it's broken up, it's what your files describe that makes it bigger, not how that description is spread out.
.h or .hpp makes no difference as well.
To answer your question about a larger executable: yes, it will make your executable larger. When you #include a header file in a source or header file, the preprocessor replaces the #include with the contents of the header file. This is why it is necessary to protect your header files with an include guard like the following:
#ifndef HDR_H
#define HDR_H
...
#endif
However, you will get linker errors if you include the header file (one that has function definitions) in multiple files that are part of the same executable. It would be wise for you to split class and function definitions and declarations into .cpp and .hpp files, respectively, as sketched below. This will greatly reduce the amount of linker headaches.
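A small sketch of that split for a class (names are illustrative): the header carries the class definition and the member declarations, the .cpp carries the member definitions.
widget.hpp:
#ifndef WIDGET_HPP
#define WIDGET_HPP
class Widget {
public:
    void draw() const;   // declared here...
private:
    int id = 0;
};
#endif
widget.cpp:
#include "widget.hpp"
#include <iostream>
void Widget::draw() const { std::cout << "widget " << id << '\n'; }   // ...defined once, here
Any number of .cpp files can include widget.hpp; only widget.cpp contributes the definition of draw, so the linker sees exactly one copy.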
Also, .h = .hpp. Doesn't matter which one you choose. Personal preference...
There's everything you need here: Header files, pros and cons of putting all your code in them. Hope it helps!
Using header files results in quicker compile time and smaller executable. It also looks considerably cleaner because you can get a quick overview of your class by looking at its .h declaration.
I learned that if I compile main.cpp the compiler simply replaces all includes with the actual content of the file i.e. #include "LongClassName.h" with the text in that file. This is done recursively in LongClassName.h. In the end the compiler sees a huge "virtual" file with the complete code of all .cpp and .h files.
But it seems to be much more complicated in real projects. I had a look at the Makefile Eclipse created for my Qt project, and it seems that there is an entry for every file, named file.o, whose dependencies are file.cpp and file.h. So that means that Eclipse compiles each .cpp separately(?)
Does that mean that class.cpp will know nothing about global stuff in main.cpp or a class higher in the include hierarchy?
I stumbled upon this problem while trying to create an alias for a long class name. It is my main class and I wanted to call static functions with a shorter name: Ln::globalFunction() instead of LongClassName::globalFunction()
I have a class LongClassName whose header I include in main.cpp. This is the main class. All other classes are included in it.
LongClassName.h
#define PI 3.14159265
#include <QDebug>
class LongClassName
{
...
public:
...
private:
...
};
typedef LongClassName Ln;
LongClassName.cpp
#include "Class1.h"
#include "Class2.h"
#include "Class3.h"
/*implementations of LongClassName's functions*/
So I assumed that when the code is included in one single "virtual" file by the compiler every class will be inserted after this source code and because of that every class should know that Ln is an alias for LongClassName
This didn't work
So what is the best way to propagate this alias to all classes?
I want to avoid including LongClassName.h in all classes because of reverse dependencies. LongClassName includes all other classes in its implementation. And almost all the other classes use some static functions of LongClassName.
(At the moment I have a separate class Ln but am trying to merge it with LongClassName because it seems more logical.)
The compiler knows how to compile a .cpp file (if it's a C++ compiler) into a .o file called an 'object file', which is your code translated (and probably manipulated, optimized, etc.) to machine code. Actually the compiler creates assembly code, which is translated to machine code by the assembler.
So each cpp file is compiled to a different object file, and knows nothing about variables declared in other cpp files, unless you include declarations you want the object file to know about, either in the cpp file or in an h file it includes.
Although the compilation is done separately for each cpp, the linker links all object files into a single executable (or a library), so a variable declared in the global namespace is indeed global, and every declaration not explicitly placed in a named namespace is placed in the global namespace.
You will probably benefit from reading about all stages of "compiling", for example here: http://www.network-theory.co.uk/docs/gccintro/gccintro_83.html
In the end the compiler sees a huge "virtual" file with the complete code of all .cpp and .h files.
This is wrong. In .cpps you should include just the .hs (or .hpps if you like), almost never the .cpps; the .hs in general contain just the declarations of the classes and of the methods, and not their actual body1 (i.e. their definition), so when you compile each .cpp the compiler still knows nothing about the definitions of the functions defined in other .cpps. It just knows their declarations, and with those it can perform syntactic checks, generate code for function calls, and so on; but it will still generate an "incomplete" object file (.o), which will contain several "placeholders" ("here goes the address of this function defined somewhere else", "here goes the address of this extern variable", and so on).
After all the object files have been generated, it's the linker that has to take care of these placeholders, by plumbing all the object files together and linking their references to the actual code (which can now be found, since we have all the object files).
For some more info about the classical compile+link model, see here.
Does that mean that class.cpp will know nothing about global stuff in main.cpp or a class higher in the include hierarchy?
Yes, it's exactly like that.
But why doesn't the Makefile created by Eclipse simply compile main.cpp? Why isn't this enough? main.cpp contains all the dependencies. Why compile every .cpp separately?
main.cpp doesn't contain all the code, but just the declarations. You don't include all the code in the same .cpp (e.g. by including the other .cpps) mainly to decrease compilation time.
I want to avoid including LongClassName.h in all classes because of reverse dependencies. LongClassName includes all other classes in its implementation. And almost all the other classes use some static functions of LongClassName.
If you use header guards, you shouldn't have problems.
1. Ok, they also contain inline and template functions, but they are the exception, not the rule.
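As a sketch of how that looks for the alias question (the guard name and member functions are invented for illustration): the typedef lives next to the class in a guarded LongClassName.h, which itself includes none of the other classes' headers, so any .cpp can include it without creating a cycle.
LongClassName.h:
#ifndef LONGCLASSNAME_H
#define LONGCLASSNAME_H
class LongClassName
{
public:
    static void globalFunction();
};
typedef LongClassName Ln;   // visible to everything that includes this header
#endif
Class1.cpp:
#include "Class1.h"
#include "LongClassName.h"   // safe: the guard prevents double inclusion, and this
                             // header does not include Class1.h back
void Class1::doWork() { Ln::globalFunction(); }
The reverse dependencies stay in LongClassName.cpp, which is the only place that includes Class1.h, Class2.h, and so on, so the headers never include each other in a circle.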
One of my "non-programmer" friends recently decided to make a C++ program to solve a complicated mechanical problem.
He wrote each function in a separate .cpp file, then included them all in the main source file, something like this:
main.cpp:
#include "function1.cpp"
#include "function2.cpp"
...
int main()
{
...
}
He then compiled the code, with a single gcc line:
g++ main.cpp // took about 2 seconds
Now, I know that this should work, but I'm not sure whether including .cpp files directly into the main program is a good idea. I have seen the following scheme several times, where all the function prototypes go into a header file with the extern keyword, like this:
funcs.h:
extern void function1(..);
extern void function2(..);
...
main.cpp:
...
#include "funcs.h"
...
& compiling with:
g++ -c function1.cpp
g++ -c function2.cpp
...
g++ -c main.cpp
g++ -o final main.o function1.o function2.o ...
I think that this scheme is better (with a makefile, of course). What reasons can I give my friend to convince him of this?
The main reason people compile object by object is to save time. High-level localised code changes often only require compilation of one object and a relink, which can be faster. (Compiling too many objects that draw in heaps of headers, or redundantly instantiate the same templates, may actually be slower when a change in common code triggers a fuller recompilation).
If the project is so small that it can be compiled in 2 seconds, then there's not much actual benefit to the traditional approach, though doing what's expected can save developer time - like yours and ours on here :-). Balancing that, maintaining a makefile takes time too, though you may well end up doing that anyway in order to conveniently capture include directories, libraries, compiler switches etc.
Actual implications to written/generated code:
cpp files normally first include their own headers, which provides a sanity check that the header content can be used independently by other client code: put everything together and the namespace is already "contaminated" with includes from earlier headers/implementation files
the compiler may optimise better when everything is in one translation unit (+1 for leppie's comment, which says the same...)
static non-member variables and anonymous namespaces are private to the translation unit, so including multiple cpps means sharing these around, for better or worse (+1 for Alexander :-))
say a cpp file defines a function or variable which is not mentioned in its header and might even be in an anonymous namespace or static: code later in the translation unit could call it freely without needing to hack up its own forward declaration (this is bad - if the function was intended to be called outside its own cpp then it should have been in the header and an externally exposed symbol in its translation unit's object)
BTW - in C++ your headers can declare functions without explicitly using the extern keyword, and it's normal to do so.
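For example, a funcs.h along these lines (parameter lists invented, since the question elides them) declares the same externally-linked functions without spelling out extern:
funcs.h:
#ifndef FUNCS_H
#define FUNCS_H
void function1(int value);      // implicitly extern: function declarations have
void function2(double value);   // external linkage by default
#endif
Adding the include guard costs nothing and keeps the header safe to include from several .cpp files.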
The reason for the second style is that each .cpp file can be treated separately, with its own classes, global variables, etc., without risk of conflict.
It is also easier in IDEs that automatically link all the .cpp files (like MSVC).