C++ modules to speed up template functions

C++ modules to speed up template functions - c++

In general, using function templates makes the compilation significantly longer.
A friend suggested that I check the modules (C++20) for optimization.
I don't think it will affect compilation speed at all.
I have no idea how to test this, so I'm asking here.
Will the following code somehow magically optimize the build process?
The definition will still have to be created and compiled, so it won't make any difference?
math.ixx:
module;
#include <typeinfo>
export module math;
import <iostream>;
export
template<typename T>
T square(T x) {
std::cout << typeid(T).name() << std::endl;
return x * x;
}
main.cpp
import math;
void main() {
square(int());
square(double());
}

The code example is too trivial for modules to be of any real use. One file which includes a second file, and nothing includes anything else is not a compilation problem. It's like trying to benchmark how fast adding two integer literals is and then making a statement about the quality of C++'s addition operator.
From a performance perspective, modules solves the following problem: they keep the cost of recompiling a single file from being equal to the cost of recompiling every file that first file includes regardless of whether the included files changed.
If you #include <vector> in a simple program, your source file now contains thousands of lines of code. If you change that source file, the compiler will have to recompile thousands of lines of code which did not change. If you have 1000 files that each include <vector>, you now have 1000 identical copies of <vector> which the compiler must compile every time you compile all of those files.
This is the sort of thing that modules prevent. If you import a module for a library, you changing your source file will not necessitate recompiling those included headers. If you import dozens of modules across hundreds or thousands of files, this adds up pretty quickly.
Pre-modules, making a small change to a widely included header prompts a full recompilation of your entire project. In a fully-modularized codebase, there will be a lot of files that get recompiled. But what doesn't happen is that you recompile stuff that didn't rely on the change. You may have changed a widely used header, but you didn't change the C++ standard library. So if you included it via modules, then <vector> and such won't get recompiled.
This is where modules saves performance.

Related

Will modules in c++20 reduce compile time compared to traditional header-files?

Suppose we have the module interface source file foo.ixx in which the module foo is defined. We use
import foo;
in many different cpp-files. Will there be compile time reduction compared to the case where a traditional header-file foo.h is included in many different cpp-files? If the compile time is reduced, why is this the case?

Yes, one of the advantages of modules is that it can reduce compilation times. For comparison, here's how it's done today:
// foo.hpp
// some code
// a.cpp
#include "foo.hpp"
// b.cpp
#include "foo.hpp"
Now when the 2 translation units a.cpp and b.cpp are compiled, some code is textually included into these source files, and hence some code is compiled twice. While the linker will take care that only one definition is actually in the final executable, the compiler still has to compile some code twice, which is wasted effort.
With modules, we would have something like:
// foo.hpp
export module foo;
// some code
// a.cpp
import foo;
// b.cpp
import foo;
Now the compilation process is different; there is an intermediate stage where foo.hpp is compiled into a format that is consumable by a.cpp, and b.cpp, which means that the implementation files do not need to compile some code, they can just use the definitions in some code directly.
This means that the foo.hpp only needs to be compiled once, which can lead to potentially large reductions in compile times, especially as the number of implementation files that consume the module interface unit increases.

"The mechanism for accessing headers from implementation files is to use the include directive from the C preprocessor. In other words, your headers are implicitly copied many times.
There are many copies of all header files scattered across a project, and the compiler has to pass through and parse them over and over again. One of the most visible problems is code compilation times.
Modules effectively replace header files and the preprocessor include directive. The solution proposed by modules suggests that we get rid of textual inclusion with the C preprocessor and, therefore, all of its drawbacks." [Each module handled just once. See Table 2]
Reference

Header include mechanism relies on text preprocessing (essentially similar to interpreted scripting languages); which is both time-consuming and pron to hard to trace errors due to programmer mistakes.
The module import mechanism - on the other hand - is a much smarter mechanism for separating interface and implementation with better guarantee for consistency and correctness. It provides the means to define the interface and implementation in the same translation unit, leading to the possibility of matching the two and informing the lib developer of errors that would've been unnoticed in traditional include system. Thus, not only the build time, but also the entire development cycle is significantly shortened.

c++ class without header

Ok, so I don't have a problem, but a question:
When using c++, you can transfer class to another file and include it without creating header, like this:
foo.cpp :
#include <iostream>
using namespace std;
class foo
{
public:
string str;
foo(string inStr)
{
str = inStr;
}
void print()
{
cout<<str<<endl;
}
};
main.cpp :
#include "foo.cpp"
using namespace std;
int main()
{
foo Foo("That's a string");
Foo.print();
return 0;
}
So the question is: is this method any worse than using header files? It's much easier and much more clean, but is it any slower, any more bug-inducing etc?
I've searched for this topic for a long time now but I haven't seen a single topic on the internet considering this even an option...

So the question is: is this method any worse than using header files?
You might consider reviewing the central idea of what the "C++ translation unit" is.
In your example, what the preprocessor does is as if it inserts a copy of foo.cpp into an internal copy of main.cpp. The preprocessor does this, not the compiler.
So ... the compiler never sees your code when they were separate files. It is this single, concatenated, 'translation unit' that is submitted to the compiler. There is no magic in .hh nor .cc, except that they fulfill your peer's (or boss's) expectations.
Now think about your question ... the translation unit is neither of your source files, nor any of your system include files, but it is one stream of text, one thing, put together by the preprocessor. So how would it be better or worse?
It's much easier and much more clean,
It can be. I often take this 'different' approach in my 'private' coding efforts.
When I did a quick eval of using gmpxx.h (mpz_class) in factorial, I did indeed take just these kinds of shortcuts, and did not need a .hpp file to properly create my compilation unit. FYI - The factorial of 12345, is more than 45,000 bytes. It is pointless to read the chars, too.
A 'more formal' effort (job, cooperation, etc), I always use header's, and separate compilation, and the building a library of functions useful to the app as part of how things should be done. Especially if I might share this code or contribute to a companies archives. There are too many good reasons for me to describe why I recommend you learn these issues.
but is it any slower, any more bug-inducing etc?
I think not. I think not. There is one compilation unit, and concatenating the parts has to be right, but I think is no more difficult.
I've searched for this topic for a long time now but I haven't seen a single
topic on the internet considering this even an option...
I'm not sure I've ever seen it discussed either. I have acquired the information. The separate compilations and library development are generally perceived to save development time. (Time is money, right?)
Also, a library, and header files, are how you package your success for others to use, how you can improve your value to a team.

There's no semantic difference between naming your files .cpp or .hpp (or .c / .h).
People will be surprised by the #include "foo.cpp", the compiler doesn't care

You've still created a "header file", but you've given it the ".cpp" extension. File extensions are for the programmer, the compiler doesn't care.
From the compiler's point of view, there is no difference between your example and
foo.h :
#include <iostream>
using namespace std;
class foo
{
//...
};
main.cpp :
#include "foo.h"
using namespace std;
int main()
{
// ...
}

A "header file" is just a file that you include at the beginning i.e. the head of another file (technically, headers don't need to be at the beginning and sometimes are not but typically they are, hence the name).
You've simply created a header file named foo.cpp.
Naming header files with extension that is conventionally used for source files is not a good idea. Some IDE's and other tools may erroneously assume that your header is a source file, and therefore attempt to compile as if it were such, wasting resources if nothing else.
Not to mention the confusion it may cause in your colleagues. Source files may have definitions that the C++ standard allows to be defined exactly once (see one definition rule, odr) because source files are not included in other files. If you name your header as if it were a source file, someone might assume that they can have odr definitions there when they can't.

If you ever build some larger project, the two main differences will become clear to you:
If you deliver your code as a library to others, you have to give them all your code - all your IP - instead of only the headers of the exposed classes plus a compiled library.
If you change one letter in any file, you will need to recompile everything. Once compile times for a larger project hits minutes, you will lose a lot of productivity.
Otherwise, of course it works, and the result is the same.

Splitting program changes memory?

I want to split up all classes from my program into cpp and hpp files, each file containing few classes from the same topic. Like this:
main.cpp:
#include <cstdio>
using namespace std;
class TopicFoo_Class1 {
... (Functions, variables, public/privates, etc.)
}
class TopicFoo_Class2 {
... (Functions, variables, public/privates, etc.)
}
class TopicBar_Class1 {
... (Stuff)
}
class TopicBar_Class2 {
... (Stuff)
}
int main(int argc, const char** argv) { ... }
into:
foo.hpp:
class TopicFoo_Class1 {
... (Declarations)
}
class TopicFoo_Class2 {
... (Declarations)
}
foo.cpp:
#include <cstdio>
#include "foo.hpp"
void TopicFoo_Class1::function1 { ... }
void TopicFoo_Class2::function1 { ... }
bar.hpp:
class TopicBar_Class1 {
... (Declarations)
}
class TopicBar_Class2 {
... (Declarations)
}
bar.cpp:
#include <cstdio>
#include "bar.hpp"
void TopicBar_Class1::function1 { ... }
void TopicBar_Class2::function1 { ... }
main.cpp:
#include "foo.hpp"
#include "bar.hpp"
int main(int argc, const char** argv) { ... }
The plan is to compile foo.o and bar.o, then compile main.cpp along with the object files to form foo_bar_executable, instead of just compiling a big main.cpp into foo_bar_executable.
This is just an example, header guards and better names will be included.
I'm wondering, will this affect program speed? Some cpps will depend on other topics' hpps to compile, and multiple cpps will depend on one hpp.
Could the multiple includes of the same file by different cpp files cause lag?
Is there a better way to split up my code?
Which one is faster?
Is it possible to run g++ main.cpp foo.cpp bar.cpp -o foo_bar_executable?
How would the above command work?
Should I make foo.hpp contain most required includes and include it in most files? This might make it faster(?)

I'm wondering, will this affect program speed? Some cpps will depend on other topics' hpps to compile, and multiple cpps will depend on one hpp.
You are mixing things that affect the build speed with run-time speed of your executable. The run-time speed shouldn't change. For a small project the difference in build time may be negligible. For larger projects, initial build times may be long, but subsequent ones may become much shorter. The reason is that you only need to rebuild what changed, and re-link.
Could the multiple includes of the same file by different cpp files cause lag?
Including a file always adds some delta to the build time. But it's something you'd need to measure. Nowadays compilers are pretty good with doing that in a smart fashion. If you couple that with smart header specification (no superfluous includes in headers, forward declarations and such), and precompiled headers, you shouldn't see a significant slowdown.
Is there a better way to split up my code?
Depends on the code. It's highly subjective.
Which one is faster?
Measure for yourself, we can't predict it for you.
Is it possible to run g++ main.cpp foo.cpp bar.cpp -o foo_bar_executable?
Last I checked the GCC docs, it was.
How would the above command work?
It will take the above source files and produce a single executable
Should I make foo.hpp contain most required includes and include it in most files? This might make it faster(?)
I wouldn't recommend that. Include the bare minimum to make the single line program #include "foo.hpp" compile successfully. Headers should strive to be minimal and complete (kind of like a certain quality of posts on SO).

m wondering, will this affect program speed?
No.
Could the multiple includes of the same file by different cpp files cause lag?
No.
Which one is faster?
Speed is not really important to most programs, and how you arrange your files has no effect on run-time performance.
Is it possible to run g++ main.cpp foo.cpp bar.cpp -o foo_bar_executable
Yes
How would the above command work?
RTFM
Hey, I'm thirteen and a half!
We don't care.

I'm wondering, will this affect program speed?
It can, but it might not.
When functions are not defined in a single translation unit, the compiler can not optimize the function calls using inline expansion. However, if enabled, some linkers can perform inlining across translation units.
On the other hand, your program might not benefit from inlining optimization.
Some cpps will depend on other topics' hpps to compile, and multiple cpps will depend on one hpp.
This is irrelevant to the speed of the compiled program.
Could the multiple includes of the same file by different cpp files cause lag?
It may have a (possibly insignificant) effect on compilation time from scratch.
Is there a better way to split up my code?
This is subjective. The more you split your code, the less you need to recompile when you make changes. The less you split, the faster it is to compile the entire project from scratch.
Which one is faster?
Possibly neither.
Is it possible to run g++ main.cpp foo.cpp bar.cpp -o foo_bar_executable?
Yes.
How would the above command work?
Use the man g++ command.
Should I make foo.hpp contain most required includes and include it in most files? This might make it faster(?)
No. Including unneeded files slows compilation. Besides this severely reduces the biggest advantage of splitting translation units, which is the lack of needing to compile the entire project when small part changes.

No, it will not affect speed except if you're relying on heavy optimizations, but as a self described "newbie", you likely won't be worrying about this yet. Often in the trade-off between maintaining a code structure to improve optimization vs. improving maintainability, maintenance will usually be the higher priority.
It might make compilation longer, but won't affect the executable. With a proper makefile, you might see compilation actually improve.
It's all subjective. Some packages split up the source per function.
No affect on executable.
Yes, but would recommend learning about makefiles, then you're compiling only what needs to be compiled.
It will compile the files, link to some default libraries, and output the executable. If you're interested in what is happening behind the scenes, compile with verbosity turned on. You can also compile to assembler, which can be really interesting to look at.
Ideally, each source file should include only the headers it needs.

Are there techniques to greatly improve C++ building time for 3D applications?

There are many slim laptops who are just cheap and great to use. Programming has the advantage of being done in any place where there is silence and comfort, since concentrating for long hours is important factor to be able to do effective work.
I'm kinda old fashioned as I like my statically compiled C or C++, and those languages can be pretty long to compile on those power-constrainted laptops, especially C++11 and C++14.
I like to do 3D programming, and the libraries I use can be large and won't be forgiving: bullet physics, Ogre3D, SFML, not to mention the power hunger of modern IDEs.
There are several solutions to make building just faster:
Solution A: Don't use those large libraries, and come up with something lighter on your own to relieve the compiler. Write appropriate makefiles, don't use an IDE.
Solution B: Set up a building server elsewhere, have a makefile set up on an muscled machine, and automatically download the resulting exe. I don't think this is a casual solution, as you have to target your laptop's CPU.
Solution C: use the unofficial C++ module
???
Any other suggestion ?

Compilation speed is something, that can be really boosted, if you know how to. It is always wise to think carefully about project's design (especially in case of large projects, consisted of multiple modules) and modify it, so compiler can produce output efficiently.
1. Precompiled headers.
Precompiled header is a normal header (.h file), that contains the most common declarations, typedefs and includes. During compilation, it is parsed only once - before any other source is compiled. During this process, compiler generates data of some internal (most likely, binary) format, Then, it uses this data to speed up code generation.
This is a sample:
#pragma once
#ifndef __Asx_Core_Prerequisites_H__
#define __Asx_Core_Prerequisites_H__
//Include common headers
#include "BaseConfig.h"
#include "Atomic.h"
#include "Limits.h"
#include "DebugDefs.h"
#include "CommonApi.h"
#include "Algorithms.h"
#include "HashCode.h"
#include "MemoryOverride.h"
#include "Result.h"
#include "ThreadBase.h"
//Others...
namespace Asx
{
//Forward declare common types
class String;
class UnicodeString;
//Declare global constants
enum : Enum
{
ID_Auto = Limits<Enum>::Max_Value,
ID_None = 0
};
enum : Size_t
{
Max_Size = Limits<Size_t>::Max_Value,
Invalid_Position = Limits<Size_t>::Max_Value
};
enum : Uint
{
Timeout_Infinite = Limits<Uint>::Max_Value
};
//Other things...
}
#endif /* __Asx_Core_Prerequisites_H__ */
In project, when PCH is used, every source file usually contains #include to this file (I don't know about others, but in VC++ this actually a requirement - every source attached to project configured for using PCH, must start with: #include PrecompiledHedareName.h). Configuration of precompiled headers is very platform-dependent and beyond the scope of this answer.
Note one important matter: things, that are defined/included in PCH should be changed only when absolutely necessary - every chnge can cause recompilation of whole project (and other depended modules)!
More about PCH:
Wiki
GCC Doc
Microsoft Doc
2. Forward declarations.
When you don't need whole class definition, forward declare it to remove unnecessary dependencies in your code. This also implicates extensive use of pointers and references when possible. Example:
#include "BigDataType.h"
class Sample
{
protected:
BigDataType _data;
};
Do you really need to store _data as value? Why not this way:
class BigDataType; //That's enough, #include not required
class Sample
{
protected:
BigDataType* _data; //So much better now
};
This is especially profitable for large types.
3. Do not overuse templates.
Meta-programming is a very powerful tool in developer's toolbox. But don't try to use them, when they are not necessary.
They are great for things like traits, compile-time evaluation, static reflection and so on. But they introduce a lot of troubles:
Error messages - if you have ever seen errors caused by improper usage of std:: iterators or containers (especially the complex ones, like std::unordered_map), than you know what is this all about.
Readability - complex templates can be very hard to read/modify/maintain.
Quirks - many techniques, templates are used for, are not so well-known, so maintenance of such code can be even harder.
Compile time - the most important for us now:
Remember, if you define function as:
template <class Tx, class Ty>
void sample(const Tx& xv, const Ty& yv)
{
//body
}
it will be compiled for each exclusive combination of Tx and Ty. If such function is used often (and for many such combinations), it can really slow down compilation process. Now imagine, what will happen, if you start to overuse templating for whole classes...
4. Using PIMPL idiom.
This is a very useful technique, that allows us to:
hide implementation details
speed up code generation
easy updates, without breaking client code
How does it work? Consider class, that contain a lot of data (for example, representing person). It could look like this:
class Person
{
protected:
string name;
string surname;
Date birth_date;
Date registration_date;
string email_address;
//and so on...
};
Our application evolves and we need to extend/change Person definition. We add some new fields, remove others... and everything crashes: size of Person changes, names of fields change... cataclysm. In particular, every client code, that depends on Person's definition needs to be changed/updated/fixed. Not good.
But we can do it the smart way - hide the details of Person:
class Person
{
protected:
class Details;
Details* details;
};
Now, we do few nice things:
client cannot create code, that depends on how Person is defined
no recompilation needed as long as we don't modify public interface used by client code
we reduce the compilation time, because definitions of string and Date no longer need to be present (in previous version, we had to include appropriate headers for these types, that adds additional dependencies).
5. #pragma once directive.
Although it may give no speed boost, it is clearer and less error-prone. It is basically the same thing as using include guards:
#ifndef __Asx_Core_Prerequisites_H__
#define __Asx_Core_Prerequisites_H__
//Content
#endif /* __Asx_Core_Prerequisites_H__ */
It prevents from multiple parses of the same file. Although #pragma once is not standard (in fact, no pragma is - pragmas are reserved for compiler-specific directives), it is quite widely supported (examples: VC++, GCC, CLang, ICC) and can be used without worrying - compilers should ignore unknown pragmas (more or less silently).
6. Unnecessary dependencies elimination.
Very important point! When code is being refactored, dependencies often change. For example, if you decide to do some optimizations and use pointers/references instead of values (vide point 2 and 4 of this answer), some includes can become unnecessary. Consider:
#include "Time.h"
#include "Day.h"
#include "Month.h"
#include "Timezone.h"
class Date
{
protected:
Time time;
Day day;
Month month;
Uint16 year;
Timezone tz;
//...
};
This class has been changed to hide implementation details:
//These are no longer required!
//#include "Time.h"
//#include "Day.h"
//#include "Month.h"
//#include "Timezone.h"
class Date
{
protected:
class Details;
Details* details;
//...
};
It is good to track such redundant includes, either using brain, built-in tools (like VS Dependency Visualizer) or external utilities (for example, GraphViz).
Visual Studio has also a very nice option - if you click with RMB on any file, you will see an option 'Generate Graph of include files' - it will generated a nice, readable graph, that can be easily analyzed and used to track unnecessary dependencies.
Sample graph, generated inside my String.h file:

As Mr. Yellow indicated in a comment, one of the best ways to improve compile times is to pay careful attention to your use of header files. In particular:
Use precompiled headers for any header that you don't expect to change including operating system headers, third party library headers, etc.
Reduce the number of headers included from other headers to the minimum necessary.
Determine whether a include is needed in the header or whether it can be moved to cpp file. This sometimes causes a ripple effect because someone else was depending on you to include the header for it, but it is better in the long term to move the include to the place where it's actually needed.
Using forward declared classes, etc. can often eliminate the need to include the header in which that class is declared. Of course, you still need to include the header in the cpp file, but that only happens once, as opposed to happening every time the corresponding header file is included.
Use #pragma once (if it is supported by your compiler) rather than include guard symbols. This means the compiler does not even need to open the header file to discover the include guard. (Of course many modern compilers figure that out for you anyway.)
Once you have your header files under control, check your make files to be sure you no longer have unnecessary dependencies. The goal is to rebuild everything you need to, but no more. Sometimes people err on the side of building too much because that is safer than building too little.

If you've tried all of the above, there's a commercial product that does wonders, assuming you have some available PCs on your LAN. We used to use it at a previous job. It's called Incredibuild (www.incredibuild.com) and it shrunk our build time from over an hour (C++) to about 10 minutes. From their website:
IncrediBuild accelerates build time through efficient parallel computing. By harnessing idle CPU resources on the network, IncrediBuild transforms a network of PCs and servers into a private computing cloud that can best be described as a “virtual supercomputer.” Processes are distributed to remote CPU resources for parallel processing, dramatically shortening build time up by to 90% or more.

Another point that's not mentioned in the other answers: Templates. Templates can be a nice tool, but they have fundamental drawbacks:
The template, and all the templates it depends upon, must be included. Forward declarations don't work.
Template code is frequently compiled several times. In how many .cpp files do you use an std::vector<>? That is how many times your compiler will need to compile it!
(I'm not advocating against the use of std::vector<>, on the contrary you should use it frequently; it's simply an example of a really frequently used template here.)
When you change the implementation of a template, you must recompile everything that uses that template.
With template heavy code, you often have relatively few compilation units, but each of them is huge. Of course, you can go all-template and have only a single .cpp file that pulls in everything. This would avoid multiple compiling of template code, however it renders make useless: any compilation will take as long as a compilation after a clean.
I would recommend going the opposite direction: Avoid template-heavy or template-only libraries, and avoid creating complex templates. The more interdependent your templates become, the more repeated compilation is done, and the more .cpp files need to be rebuilt when you change a template. Ideally any template you have should not make use of any other template (unless that other template is std::vector<>, of course...).

Why is including a header file such an evil thing?

I have seen many explanations on when to use forward declarations over including header files, but few of them go into why it is important to do so. Some of the reasons I have seen include the following:
compilation speed
reducing complexity of header file management
removing cyclic dependencies
Coming from a .net background I find header management frustrating. I have this feeling I need to master forward declarations, but I have been scrapping by on includes so far.
Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
I can buy the argument for reduced complexity, but what would a practical example of this be?

"to master forward declarations" is not a requirement, it's a useful guideline where possible.
When a header is included, and it pulls in more headers, and yet more, the compiler has to do a lot of work processing a single translation module.
You can see how much, for example, with gcc -E:
A single #include <iostream> gives my g++ 4.5.2 additional 18,560 lines of code to process.
A #include <boost/asio.hpp> adds another 74,906 lines.
A #include <boost/spirit/include/qi.hpp> adds 154,024 lines, that's over 5 MB of code.
This adds up, especially if carelessly included in some file that's included in every file of your project.
Sometimes going over old code and pruning unnecessary includes improves the compilation dramatically just because of that. Replacing includes with forward declarations in the translation modules where only references or pointers to some class are used, improves this even further.

Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
It cannot because, unlike some other languages, C++ has an ambiguous grammar:
int f(X);
Is it a function declaration or a variable definition? To answer this question the compiler must know what does X mean, so X must be declared before that line.

Because when you're doing something like this :
bar.h :
class Bar {
int foo(Foo &);
}
Then the compiler does not need to know how the Foo struct / class is defined ; so importing the header that defines Foo is useless. Moreover, importing the header that defines Foo might also need importing the header that defines some other class that Foo uses ; and this might mean importing the header that defines some other class, etc.... turtles all the way.
In the end, the file that the compiler is working against is almost like the result of copy pasting all the headers ; so it will get big for no good reason, and when someone makes a typo in a header file that you don't need (or import , or something like that), then compiling your class starts to take waaay too much time (or fail for no obvious reason).
So it's a good thing to give as little info as needed to the compiler.

How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
1) reduced disk i/o (fewer files to open, fewer times)
2) reduced memory/cpu usage
most translations need only a name. if you use/allocate the object, you'll need its declaration.
this is probably where it will click for you: each file you compile compiles what is visible in its translation.
a poorly maintained system will end up including a ton of stuff it does not need - then this gets compiled for every file it sees. by using forwards where possible, you can bypass that, and significantly reduce the number of times a public interface (and all of its included dependencies) must be compiled.
that is to say: the content of the header won't be compiled once. it will be compiled over and over. everything in this translation must be parsed, checked that it's a valid program, checked for warnings, optimized, etc. many, many times.
including lazily only adds significant disk/cpu/memory increase, which turns into intolerable build times for you, while introducing significant dependencies (in non-trivial projects).
I can buy the argument for reduced complexity, but what would a practical example of this be?
unnecessary includes introduce dependencies as side effects. when you edit an include (necessary or not), then every file which includes it must be recompiled (not trivial when hundreds of thousands of files must be unnecessarily opened and compiled).
Lakos wrote a good book which covers this in detail:
http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1?ie=UTF8&s=books&qid=1304529571&sr=8-1

Header file inclusion rules specified in this article will help reduce the effort in managing header files.

I used forward declarations simply to reduce the amount of navigation between source files done. e.g. if module X calls some glue or interface function F in module Y, then using a forward declaration means the writing the function and the call can be done by only visiting 2 places, X.c and Y.c not so much of an issue when a good IDE helps you navigate, but I tend to prefer coding bottom-up creating working code then figuring out how to wrap it rather than through top down interface specification.. as the interfaces themselves evolve it's handy to not have to write them out in full.
In C (or c++ minus classes) it's possible to truly keep structure details Private by only defining them in the source files that use them, and only exposing forward declarations to the outside world - a level of black boxing that requires performance-destroying virtuals in the c++/classes way of doing things. It's also possible to avoid needing to prototype things (visiting the header) by listing 'bottom-up' within the source files (good old static keyword).
The pain of managing headers can sometimes expose how modular your program is or isn't - if its' truly modular, the number of headers you have to visit and the amount of code & datastructures declared within them should be minimized.
Working on a big project with 'everything included everywhere' through precompiled headers won't encourage this real modularity.
module dependancies can correlate with data-flow relating to performance issues, i.e. both i-cache & d-cache issues. If a program involves many modules that call each other & modify data at many random places, it's likely to have poor cache-coherency - the process of optimizing such a program will often involve breaking up passes and adding intermediate data.. often playing havoc with many'class diagrams'/'frameworks' (or at least requiring the creation of many intermediates datastructures). Heavy template use often means complex pointer-chasing cache-destroying data structures. In its optimized state, dependancies & pointer chasing will be reduced.

I believe forward declarations speed up compilation because the header file is ONLY included where it is actually used. This reduces the need to open and close the file once. You are correct that at some point the object referenced will need to be compiled, but if I am only using a pointer to that object in my other .h file, why actually include it? If I tell the compiler I am using a pointer to a class, that's all it needs (as long as I am not calling any methods on that class.)
This is not the end of it. Those .h files include other .h files... So, for a large project, opening, reading, and closing, all the .h files which are included repetitively can become a significant overhead. Even with #IF checks, you still have to open and close them a lot.
We practice this at my source of employment. My boss explained this in a similar way, but I'm sure his explanation was more clear.

How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
Because include is a preprocessor thing, which means it is done via brute force when parsing the file. Your object will be compiled once (compiler) then linked (linker) as appropriate later.
In C/C++, when you compile, you've got to remember there is a whole chain of tools involved (preprocessor, compiler, linker plus build management tools like make or Visual Studio, etc...)

Good and evil. The battle continues, but now on the battle field of header files. Header files are a necessity and a feature of the language, but they can create a lot of unnecessary overhead if used in a non optimal way, e.g. not using forward declarations etc.
How do forward declarations speed up
compilations since at some point the
object referenced will need to be
compiled?
I can buy the argument for reduced
complexity, but what would a practical
example of this be?
Forward declarations are bad ass. My experience is that a lot of c++ programmers are not aware of the fact that you don't have to include any header file, unless you actually want to use some type, e.g. you need to have the type defined so the compiler understands what you want to do. It's important to try and refrain from including header files in other header files.
Just passing around a pointer from one function to another, only requires a forward declaration:
// someFile.h
class CSomeClass;
void SomeFunctionUsingSomeClass(CSomeClass* foo);
Including someFile.h does not require you to include the header file of CSomeClass, since you are merely passing a pointer to it, not using the class. This means that the compiler only needs to parse one line (class CSomeClass;) instead of an entire header file (that might be chained to other header files etc etc).
This reduces both compile time and link time, and we are talking big optimizations here if you have many headers and many classes.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js