Where does the C++ compiler start? - c++

If you have a c++ project with several source files and you hit compile, which file does the compiler start with?
I am asking cause I am having some #include-dependency issues on a library.
Compiler would be: VC2003.

It should not be order-dependent. The only relevant steps are:
Each compilation unit includes what it depends on, and should be compilable individually. This means, first, that each CPP file includes all the headers it depends on; and second, that each header should in turn include what it needs so that it can compile even if it is the first one to be compiled.
A link step puts all the compiled object code together and builds the final binary.

It should not matter which file it starts with, the linker resolves external references after all the files have been compiled

Irrelevant. Post the exact issue. The compilation order is non-deterministic and arbitrary, and must have no effect on the compilability of your project.

This depends on the environment. In general a "compiler" only works on a single source file at a time; you use higher-level tools to direct it and compute the proper build order.
Examples of such tools can be make, ant, CMake, SCons, Eclipse, and Visual Studio. A basic check is generally the modification date of the source code files, coupled with built-in and custom rules that define how various output files depend on the inputs.

The order the compiler compiles in shouldn't make a difference, as others have noted.
From the compiler's point of view, when you compile a file with a #include, the included file is inserted into the file being compiled at the point where the #include is, recursing as necessary.

Others have already said that the order shouldn't make a difference.
What you might not have realized is that the compiler compiles every .cpp or .cc file. It does not compile header files. And typically, you only #include header files, and never .cpp files, so the order does not matter. Every .cpp file is processed in isolation. It includes a number of headers, but these are never compiled separately, and it does not typically include other .cpp files either.

The only "include-dependency" problem I can think of is a recursive inclusion. For which the fix normally is guarding it with #ifdef
#ifndef INCLUDED_THEFILENAME_H
#define INCLUDED_THEFILENAME_H
/* content goes here *
#endif
But you better elaborate on the issue you're having.

As others have pointed out, conceptually it is not important which file it starts with. However, it can be useful to start with the most recently edited file (assuming more than one file has been edited) of with the file with the most dependencies. Some environments, such as Code::Blocks, actually allow you to give weightings to source files to give you some control over the compilation order.

A make tool builds a directed acyclic graph of the dependencies specified in the make file. This will normally say the executable depends on a number of object files. The object files depends on source files, each source file depends on headers, and so on.
This produces essentially a multi-way tree. The tree will have the executable as its root, and typically have mostly headers as the leaf nodes (though if you're using some sort of code generator, it could also have the input file for that code generator as a leaf).
It then walks that tree working its way from the leaf nodes to the root node and building as it goes. The answers that have said "it doesn't matter" are basically pointing out that it can pick any branch of that tree to build first. It does matter, however, that when it picks a branch, it builds in the order specified for that branch.

Related

Single header file with all the necessary #include statements

I am currently working on program with a lot of source files. Sometimes it is difficult to keep track of what libraries I have already #included. Theoretically, I could make a single header file called Headers.h that just contains all the #include statements I need, then make all other header files #include "Headers.h".
Why is this a good/bad idea?
Pros:
Slightly less maintenance as you don't have to keep track of which of your files are including headers from which libraries or other compoenents.
Cons:
Definitions in included files might conflict with each other. Especially in C where you don't have namespaces (you tagged with C and C++)
Macros in particular can cause hard to debug problems, where a macro definition unexpectedly conflicts with some name in your file or one of the other included files
Depending on which compiler you use, compilation times might blow out. If using a compiler that pre-compiles headers it might actually reduce compilation time, but if not the opposite will happen
You will often unnecessarily trigger rebuilds of files. If you have your build system set up correctly, then each source file will get rebuilt if any of the included files gets modified. If you always include all headers in your project, then a change to any of your headers will force recompilation of all your source files. Not likely to be an issue for system headers but it will be if you include your own headers in the master file as well.
On the whole I would not recommend that approach. The last con listed above it particularly important.
Best practice would be to include only headers that are needed for the code in each file.
In complement of Harmic's answer, indeed the main issue is the build system (most builders work on file timestamp, not on file contents. omake is a notable exception).
Notice that if you only care about many dependencies, GNU make can be used with autodependencies, together with -M* options passed to GCC (i.e. to g++ and actually to the preprocessor).
However, many libraries are offering to their user a single header (e.g. <gtk/gtk.h>)
Also, a single header file is more friendly to precompiled headers technology. In particular, GCC wants a single header for precompilation.
See also ccache.
Tracking all the required includes would be more difficult as they are abstracted from their c source files and not really supporting modularisation pus all the cons from #harmic

How do compilers know when not to recompile?

How do compilers know when it is not necessary to recompile certain parts of code especially in larger projects?
For example, let's say in C++ we have two C++ files and two header files. The header files depend on one another. (They use the classes specified in each others files.)
Does a compiler always need to parse both header files, (and maybe C++ files for method implementation,) to obtain the class information in order to generate either of the two C++ files?
I always thought that when you run the compiler at the command prompt, it closes immediately after outputting the object files - so it would be impossible to cache the Abstract Syntax Trees or intermediate code. Do most C++ compilers know when a certain file doesn't need to output to an object file, and is therefore skipped?
All of the compilers I know compile every source file they're
told to. Always. And they generate a new version of the object
file for every source file they compile.
Only compiling what is necessary is a job generally left to the
build system (make or other). Knowing which objects need to be
regenerated depend on what each source file includes, directly
or indirectly; most compilers have options to output this
information in some format, either on the fly or as a separate
invocation, and the build systems (the usable ones, at least)
use this information to determine dependencies.
As said above, compilers will compiler every file that it is asked to compile. It is up to tools like make to decide what needs to be compiled.
In make one sets up rules. Each rule has a target, list of dependencies followed by the commands to run if those dependencies are not met. For example
target.o : target.c
gcc -c -o target.o target.c
On most file systems, each file has a timestamp. If target.o has a newer timestamp than target.c (the rule dependency) then make does not run the gcc command below. This is because one firsts edits a source file and then compiles the source file into an object file.
If however the dependent source file is newer than the target, then we know the source file was edited after the compile took place and another compile is in order. make will therefore execute the build command for the rule.
It gets a lot more complex when rules are dependent on other rules but the same principle applies.
I don't know how they (don't) implement it (because many don't... Don't ask me why) but I'm quite sure it would be VERY easy. You save in the intermediate (obj) file the name and the hash of the source file and of every dependent file you are compiling, together with the compilation options that are being used, the hash of the compiler (or its internal version) and the compilation result (ok/error). Next time the user tries to recompile the file, the compiler checks if there is already the intermediate file, checks if all the hashes are the same, if the compilation options are the same and if the compiler is the same... If everything is the same, it gives the pre-saved error message and exits without doing anything.
The intermediate files would be a little bigger (probably some kb each).

how to include certain header files by default so that i don't have to type them in every programs

how to include certain header files by default so that i don't have to type them in every programs:
In dev c++ and code::blocks
Make a global header file that in turn includes whatever files you need in every project, and then you only have to include that single file.
However I would recommend against it, unless all your different project are very similar. Different projects have different needs and also need different header files.
You could issue a compiler directive in your project file or make script to do "per project" includes, but in general I would avoid that.
Source code should be as clear as possible to any reader just by its content. Whenever I have source code that dramatically changes its semantics, eg. by headers that are unknown to me, this can be quite confusing.
On top of that, if you "inject" those headers for certain compilation units that don't need them, that will negatively impact compile time.
As a substitution, what about introducing a common.h/hpp header that includes those certain header files? You can then include your common header in all files that need them and change this common set of headers for all depending files at once. It also opens the door to use precompiled header files, which may be worth a look for you.
From GCC documentation (AFAIK GCC is default compiler used by the development environment you are citing)
-include file
Process file as if #include "file" appeared as the first line of the primary source file. However, the first directory searched for
file is the preprocessor's working directory instead of the directory
containing the main source file. If not found there, it is searched
for in the remainder of the #include "..." search chain as normal.
If multiple -include options are given, the files are included in the order they appear on the command line.
-imacros file
Exactly like -include, except that any output produced by scanning file is thrown away. Macros it defines remain defined. This allows you
to acquire all the macros from a header without also processing its
declarations.
All files specified by -imacros are processed before all files specified by -include.
But it is usually a bad idea to use these.
Dev c++ works with MingW compiler, which is gcc compiler for Windows. Gcc supports precompiled headers, so you can try that. Precompiled headers are header files that you want compiled and added to every object file in a project. Try searching for that in Google for some information.
Code::blocks supports them too, when used with gcc, even better, so there it may even be easier.
If your editor of choice supports macros, make one that adds your preferred set of include files. Once made, all you have to do is invoke your macro to save yourself the repetitive typing and you're golden.
Hope this helps.

Difference between C++ files

I just started a graphical C++ course and I have problem getting an overview how it is.
we got some starting code, two files; one of type "C++ Source" and another of "C/C++ Header".
its supposed to be a graphical program which fills the screen with color.
also, we are using some custom libraries such as SDL and GLM, in the same folder as those two files there is a folder named gml and loads of subfolders, which I wont get into.
I have downloaded mingw, cmake and Visual Studio 11 beta for c++.
I've tried making a normal Win32 program and also a forms-application for the graphical part, but its always something wrong when compiling.
My question: how are you supposed to handle C++ files? I just got used to java and there its so easy to just open the .java file and paste into your IDE, dealing with C++ makes me really confused.
Hmm... Where to begin...
Somethings that happen behind the scenes in other languages are much more visible in C++. The process of obtaining a binary (say, an executable) from C++ involves first compiling the source code (There are sub-steps of this but the compiler handles them) to obtain object files, then the object files are linked by the linker to generate a binary.
In theory, you could simply #include all the cpp files in a project, and compile them all together and "link" (although there's nothing to link) but that would take a very long time, and more importantly, in complex projects that could deplete the memory available to your compiler.
So, we split our projects into compilation units, and by convention a .cpp file represents a single compilation unit. A compilation unit is the part of your project that gets compiled to generate one object file. Even though compilation units are compiled separately, some code has to be common among them, so that the piece of code in each of them can use the functionalities implemented by the others. .h files conventionally serve this purpose. Things are basically declared (sort of announced) in them, so that each compilation unit knows what to expect when it's a part of a linking process to generate a binary.
There's also the issue with libraries. You can find mainly two kinds of things in libraries;
Already implemented functionality, shipped to you in the form of binary files including CPU instructions that can almost be run (but they've to be inserted in the right place). This form is accompanied by .h files to let your .cpp files know what to expect in the library.
The second type is functionality implemented directly in the .h
files. Yes, this is possible under special cases. There are cases,
where the implementation has to (a weak has to) accompany the
declaration (inlined functions, templated types etc.).
The first type comes in two flavors: A "static library" (.lib in windows, .a in linux), that enters your executable and becomes a part of it during linking, and a "dynamic library", that is exposed to your binary (so it knows about it) but that doesn't become a part of it. So, your executable will be looking for that dynamic library (.dll files in windows and .so files in linux f.x.) while it's run.
So, in order for your .cpp files to be able to receive services from libraries, they have to #include their .h files, to know about what there is in them. Later on, during linking, you have to show the linker where (what path in the file system) to find the binary components of those libraries. Finally, if the library is dynamic, the .dll's (or .so's etc.) must be accessible during run time (keep them in the same folder for instance).
While compiling your compilation units you have to tell the compiler where to find the .h files. Otherwise, all it will see will be #include <something.h> and it won't know where to find that file. with gcc, you tell the compiler with the -I option. Note that, you just tell the folder. Also of importance is that if the include directive looks like #include<somefolder/somefile.h> you shouldn't include somefolder in the path. So the invocation looks like:
g++ mycompilationunit.cpp -IPATH/TO/THE/INCLUDED/FILES -IPATH/TO/OTHER/INCLUDED/FILES -c
The -c option tells the compiler that it shouldn't attempt to make an executable just from this compilation unit, so it creates a .o file, to be linked with others later. Since we don't tell it the output file name, it spits out mycompilationunit.o.
Now we want to generate our binary (you probably want an executable, but you could also want to create a library of yours). So we have to tell the linker everything that goes into the binary. All the object files and all the static and dynamic libraries. So, we say: (Note g++ here also acts as the linker)
g++ objectfile1.o objectfile2.o objectfile3.o -LPATH/TO/LIBRARY/BINARIES -llibrary1 -llibrary2 -o myexecutable
Here, -L option is self explanatory in the example. -l option tells which binaries to look for. The linker will accept both static and dynamic libraries if it finds them on the path, and if it finds both, it'll choose one. Note that what goes after -l is not the full binary name. For instance in linux library names take the form liblibrary.so.0 but they're referred to as -llibrary in the linker command. finally -o tells the compiler what name to give to your executable. You need some other options to f.x. create a dynamic library, but you probably don't need to know about them now.
What is the difference between a .cpp file and a .h file?
Look at this answer. Also a quick google search explains a bit too.
Pretty much .h (header) files are declerations and .cpp (source) files are definitions. It is possible to combine both files into one .cpp file but as projects get bigger and bigger its becomes annoying and almost unreasonable.
Hope that helps.
In C++ there is a notion of a function declaration (the function signature) and a function definition (the actual code).
A header file (*.h) contains the declarations of functions and classes. A source file (*.cpp, *.c++, *.C) contains the definitions.
A header file can be included in a source file using #include directive.
When you define a class in C++, you typically only include the declarations of the member functions (methods in Java lingo), and you put the class definition into a header file. The member function definitions containing the body of each function are typically put outside the class definition and into the source file.
Generally the best thing to do here is to get a book on C++ or C, and to look at some sample code.
Header files (.h) are supposed to contain definitions of classes, methods, and variables. Source file (.cpp) will contain the code. So in your .cpp file you need to include the header file as #include "header-file-name.h".
Then use g++ to compile the .cpp file. Make sure that the path to .h file is correct.
If you are using CodeBlocks or Visual Studio, then just compiling the project and running will do everything for you. You can also add .h or .cpp file from there. You need not worry about anything.
Hope this helps.

Combining C++ header files

Is there an automated way to take a large amount of C++ header files and combine them in a single one?
This operation must, of course, concatenate the files in the right order so that no types, etc. are defined before they are used in upcoming classes and functions.
Basically, I'm looking for something that allows me to distribute my library in two files (libfoo.h, libfoo.a), instead of the current bunch of include files + the binary library.
As your comment says:
.. I want to make it easier for library users, so they can just do one single #include and have it all.
Then you could just spend some time, including all your headers in a "wrapper" header, in the right order. 50 headers are not that much. Just do something like:
// libfoo.h
#include "header1.h"
#include "header2.h"
// ..
#include "headerN.h"
This will not take that much time, if you do this manually.
Also, adding new headers later - a matter of seconds, to add them in this "wrapper header".
In my opinion, this is the most simple, clean and working solution.
A little bit late, but here it is. I just recently stumbled into this same problem myself and coded this solution: https://github.com/rpvelloso/oneheader
How does it works?
Your project's folder is scanned for C/C++ headers and a list of headers found is created;
For every header in the list it analyzes its #include directives and assemble a dependency graph in the following way:
If the included header is not located inside the project's folder then it is ignored (e.g., if it is a system header);
If the included header is located inside the project's folder then an edge is create in the dependency graph, linking the included header to the current header being analyzed;
The dependency graph is topologically sorted to determine the correct order to concatenate the headers into a single file. If a cycle is found in the graph, the process is interrupted (i.e., if it is not a DAG);
Limitations:
It currently only detects single line #include directives (e.g., #include );
It does not handles headers with the same name in different paths;
It only gives you a correct order to combine all the headers, you still need to concatenate them (maybe you want remove or modify some of them prior to merging).
Compiling:
g++ -Wall -ggdb -std=c++1y -lstdc++fs oneheader.cpp -o oneheader[.exe]
Usage:
./oneheader[.exe] project_folder/ > file_sequence.txt
(Adapting an answer to my dupe question:)
There are several other libraries which aim for a single-header form of distribution, but are developed using multiple files; and they too need such a mechanism. For some (most?) it is opaque and not part of the distributed code. Luckily, there is at least one exception: Lyra, a command-line argument parsing library; it uses a Python-based include file fuser/joiner script, which you can find here.
The script is not well-documented, but they way you use it is with 3 command-line arguments:
--src-include - The include file to convert, i.e. to merge its include directives into its body. In your case it's libfoo.h which includes the other files.
--dst-include - The output file to write - the result of the merging.
--src-include-dir - The directory relative to which include files are specified (i.e. an "include search path" of one directory; the script doesn't support the complex mechanism of multiple include paths and search priorities which the C++ compiler offers)
The script acts recursively, so if file1.h includes another file under the --src-include-dir, that should be merged in as well.
Now, I could nitpick at the code of that script, but - hey, it works and it's FOSS - distributed with the Boost license.
If your library is so big that you cannot build and maintain a single wrapping header file like Kiril suggested, this may mean that it is not architectured well enough.
So if your library is really huge (above a million lines of source code), you might consider automating that, with tools like
GCC make dependency generator preprocessor options like -M -MD -MF etc, with another hand made script sorting them
expensive commercial static analysis tools like coverity
customizing a compiler thru plugins or (for GCC 4.6) MELT extensions
But I don't understand why you want an automated way of doing this. If the library is of reasonable size, you should understand it and be able to write and maintain a wrapping header by hand. Automating that task will take you some efforts (probably weeks, not minutes) so is worthwhile only for very large libraries.
If you have a master include file that includes all others available, you could simply hack a C preprocessor re-implementation in Perl. Process only ""-style includes and recursively paste the contents of these files. Should be a twenty-liner.
If not, you have to write one up yourself or try at random. Automatic dependency tracking in C++ is hard. Like in "let's see if this template instantiation causes an implicit instantiation of the argument class" hard. The only automated way I see is to shuffle your include files into a random order, see if the whole bunch compiles, and re-shuffle them until it compiles. Which will take n! time, you might be better off writing that include file by hand.
While the first variant is easy enough to hack, I doubt the sensibility of this hack, because you want to distribute on a package level (source tarball, deb package, Windows installer) instead of a file level.
You really need a build script to generate this as you work, and a preprocessor flag to disable use of the amalgamate (that could be for your uses).
To simplify this script/program, it helps to have your header structures and include hygiene in top form.
Your program/script will need to know your discovery paths (hint: minimise the count of search paths to one if possible).
Run the script or program (which you create) to replace include directives with header file contents.
Assuming your headers are all guarded as is typical, you can keep track of what files you have already physically included and perform no action if there is another request to include them. If a header is not found, leave it as-is (as an include directive) -- this is required for system/third party headers -- unless you use a separate header for external includes (which is not at all a bad idea).
It's good to have a build phase/translation that includes header alone and produces zero warnings or errors (warnings as errors).
Alternatively, you can create a special distribution repository so they never need to do more than pull from it occasionally.
What you want to do sounds "javascriptish" to me :-) . But if you insist, there is always "cat" (or the equivalent in Windows):
$ cat file1.h file2.h file3.h > my_big_file.h
Or if you are using gcc, create a file my_decent_lib_header.h with the following contents:
#include "file1.h"
#include "file2.h"
#include "file3.h"
and then use
$ gcc -C -E my_decent_lib_header.h -o my_big_file.h
and this way you even get file/line directives that will refer to the original files (although that can be disabled, if you wish).
As for how automatic is this for your file order, well, it is not at all; you have to decide the order yourself. In fact, I would be surprised to hear that a tool that orders header dependencies correctly in all cases for C/C++ can be built.
usually you don't want to include every bit of information from all your headers into the special header that enables the potential user to actually use your library. The non-trivial removal of type definitions, further includes or defines, that are not necessary for the user of your interface to know can not be automatedly done. As far as I know.
Short answer to your main question:
No.
My suggestions:
manually make a new header, that contains all relevant information (nothing more, nothing less) for the user of your library interface. Add nice documentation comments for each component it contains.
use forward declarations where possible, instead of full-fledged included definitions. Put the actual includes in your implementation files. The less include statements you have in your headers, the better.
don't build a deeply nested hierarchy of includes. This makes it extremely hard to keep an overview on the contents of every bit you include. The user of your library will look into the header to learn how to use it. And he will probably not be able to distinguish relevant code from irrelevant on the first sight. You want to maximize the ratio of relevant code per total code in the main header for your library.
EDIT
If you really do have a toolkit library, and the order of inclusion really does not matter, and you have a bunch of independent headers, that you want to enumerate just for convenience into a single header, then you can use a simple script. Like the following Python (untested):
import glob
with open("convenience_header.h", 'w') as f:
for header in glob.glob("*.h"):
f.write("#include \"%s\"\n" % header)