Will modules make template compilation faster? - c++

Will modules make template compilation faster? Templates (usually) have to be header only, and end up residing in the translation unit of the #includer.
Related: Do precompiled headers make template compilation faster?

According to the modules proposal, from the very paper you cited, faster builds are the first of the three primary goals for adding modules:
1 Introduction
Modules are a mechanism to package libraries and encapsulate their implementations.
They differ from the traditional approach of translation units and header files primarily in
that all entities are defined in just one place (even classes, templates, etc.). This paper
proposes a module mechanism (somewhat similar to that of Modula-2) with three
primary goals:
Significantly improve build times of large projects
Enable a better separation between interface and implementation
Provide a viable transition path for existing libraries
While these are the driving goals, the proposal also resolves a number of other longstanding practical C++ issues (initialization ordering, run-time performance, etc.).
So, how can they accomplish those goals? Well, from section 4.1:
Since header files are typically included in many other files, the
growth in build cycles is generally superlinear with respect to the total amount of source
code. If the issue is not addressed, it is likely to become worse as the use of templates
increases and more powerful declarative facilities (like concepts, contract programming,
etc.) are added to the language.
Modules address this issue by replacing the textual inclusion mechanism (whose
processing time is roughly proportional to the amount of code included) by a precompiled
module attachment mechanism (whose processing time—when properly implemented—
is roughly proportional to the number of imported declarations). The property that client
translation units need not be recompiled when private module definitions change can be
retained.
In other words, at the very least, these templates only have to be parsed once instead of N times, which is already a huge improvement.
Later sections describe improvements for things like explicit instantiation. The one thing this doesn't directly improve is automatic template instantiation, as section 5.8 acknowledges. Here all that can be guaranteed is exactly the same benefit you already get from precompiled headers: "Both modules Set and Reset must instantiate Lib::S and in fact both expose this instantiation in their interface file." But the proposal then gives some possible technical solutions to the ODR problems, at least some of which also solve the multiple-instantiation problem and may not be possible in today's world. For example, the kind of queried instantiation suggested has been tried repeatedly and it's generally considered too hard to get right with today's model, but modules might make it feasible. There's no proof that it's impossible to get right today, just experience that it's hard, and there's no proof that it would be easier with modules, just the plausible possibility that it might be.
And that fits with a general implication that's never quite stated in the proposal, but is there in the background: Making compilation simpler means we might get new optimizations in the process (directly, because it's easier to reason about what's happening, or indirectly, because more people work on the problem once it's not such a huge pain).
In summary, modules can and will definitely make template compilation faster, if for no other reason than that the template definitions themselves only have to be parsed once. They may allow for other benefits that are either impossible or more difficult to do without modules, but that may not be guaranteeable.

I don't know about modules, but I do know that gcc even now provides precompiled headers, as do many other compilers. A precompiled header can contain a very efficient machine-readable version of a template description, so when that is available upon inclusion of a header, many compiling steps can be skipped which would normally be required for a source-text-only uncompiled header.
The modules paper talks about precompiled interface files, so I assume that current precompiled headers and new precompiled interface files will provide comparable performance. Creating such a file from a plain text portable module description will probably be more efficient as it can save time due to restrictions of the language syntax. And it will be more standardized, so more headers will get the benefit of precompilation. Current projects seldom precompile more than one header, and cross-project precompiled headers are even rarer in my experience.

Do precompiled headers make template compilation faster?
No; they make the templates not need compiling. That is the entire point of both PCHs and modules: to stop compiling everything.
The idea is to turn "load C++ text and compile" into "load C++ symbols." Modules are a generalized form of PCHs.
Now, you still have the cost of instantiating templates (unless they were instantiated within a PCH/module). But the cost of compiling the C++ template code is removed.
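Independently of PCHs and modules, C++11 explicit instantiation is one existing way to pay that instantiation cost in a single place. The following is only a minimal sketch: the file names in the comments and the names Stack and sum_of_two are hypothetical, and everything is shown as one file although a real project would split it into a header and two source files.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- stack.h (hypothetical header) ---
template <typename T>
class Stack {
public:
    void push(const T& value);
    T pop();
    std::size_t size() const;
private:
    std::vector<T> items_;
};

template <typename T>
void Stack<T>::push(const T& value) { items_.push_back(value); }

template <typename T>
T Stack<T>::pop() {
    assert(!items_.empty());  // popping an empty stack is a caller bug
    T top = items_.back();
    items_.pop_back();
    return top;
}

template <typename T>
std::size_t Stack<T>::size() const { return items_.size(); }

// Explicit instantiation declaration (C++11): promises that the non-inline
// members of Stack<int> are instantiated in some other translation unit,
// so includers may skip instantiating them implicitly.
extern template class Stack<int>;

// --- stack.cpp (hypothetical source file) ---
// Explicit instantiation definition: the one place that does the work.
template class Stack<int>;

// --- client.cpp (hypothetical) ---
int sum_of_two() {
    Stack<int> s;
    s.push(2);
    s.push(40);
    return s.pop() + s.pop();
}
```

The template text is still parsed by every includer; only the instantiation of Stack<int> is centralized, which is the part that PCHs and modules do not remove by themselves.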

Related

C++ header and implementation, (why) is it not automatically handled by the IDE/compiler?

In C++, your classes are often divided into two parts, being the header-file and the actual implementation. In my (inexperienced) opinion, this is awful. It requires me to do all sorts of unnecessary book-keeping, clutters up my project directory and goes against everything I've learned about software development (double implementation). Languages where you only deal with the implementation, such as Java or Python, are much nicer to work with.
I've always learned that the reason to use them was to significantly decrease compilation time. However, wouldn't a modern IDE (CLion in my case) or even the compiler be smart enough to either:
Keep some sort of "shadow"-header file, which would automatically be updated whenever a definition is changed in the implementation?
Automatically split it into the header and implementation during compile time, allowing you to only have to deal with one file? (Something that Lazy C++ seems to do)
Or are there any plugins available that offer this kind of behaviour? C++ modules also seem to offer a solution to this problem, but their current status/support is unclear to me and to make matters worse there seem to be two competing standards (Clang's and Microsoft's).
Unfortunately it is not that simple. C++ inherited the header/source file separation from C, along with the preprocessor that both languages share. Automatic generation of a header file is not possible in general. First, the separation is not trivial. Second, header files often contain manually written preprocessor code that generates the code to be compiled. Third, almost all templated code goes into header files because of the compilation process and the visibility rules. Changing all of that would require breaking compatibility with existing code, of which there is a significant amount, and nobody wants to do that. It would be easier to create yet another language (like D), but many people would not want to migrate, for various reasons. We know that the committee is working on modules, and if they manage to make them work without breaking compatibility, that would be helpful for many of us. But again, this is not a trivial task at all: the approach you describe would only work in certain environments (when you limit yourself) and cannot be applied to everybody.

Does D allow separation of interface from implementation in templates?

I haven't tried D yet, but it seems like a very interesting language that has found some neat solutions to problems in C++. I'm curious, did it also make it possible to separate interface from implementation in templates? If yes, then how?
No. Any templates used are fully expanded at compile time.
This means that the compiler needs to know the full code of the template, making it impossible to keep it out of the .di files.
At some point in processing the use of a template, D needs all the information about the template. However, there is no reason that this information need be encoded as the original source code (OTOH, as an implementation detail, all current D compilers do require that). This is a fundamental issue of any language that has templates stronger than generics. The implications of this depend on what you are trying to do.
If your interest in separation of interface and implementation is to hide the implementation (like shipping binary libraries and header files in C), then this can't be done. The closest you can get is some kind of code obfuscation system.
If, on the other hand, you are interested in avoiding the cost of reprocessing templates for each recompilation, something more general like a binary pre-compiled header format could allow the reuse of the results of the lexical, syntactic and some of the passes while compiling several other modules. In fact, that would be simpler to do with D than in C.
A third option would be link time code generation, but that has little difference from conventional linking with aggressive use of an analog to pre-compiled headers.

Effective C++ "35. Minimize compilation dependencies between files". Is it still valid today?

In this chapter Scott Meyers mentions a few techniques to avoid header file dependencies. The main goal is to avoid recompiling a cpp file if changes are limited to other included header files.
My questions are:
In my past projects I never paid attention to this rule. The compilation time is not short but it is not intolerable. It could have more to do with the scale (or the lack of) of my projects. How practical is this tip today given the advances in compiler technology (e.g. clang)?
Where can I find more examples of the use of these techniques? (e.g. Gnome or other OSS projects)
P.S. I am using the 2nd edition.
I don't think compiler technology has advanced particularly. clang is not some piece of magic - if you have dependencies and you make changes, then dependent code will have to be recompiled. This can take a very, very long time - read hours, or even days for a big project, so people try to minimise such dependencies where possible.
Having said that, it is possible to overdo things - making all classes into PIMPLs, forward declaring everything, etc. Doing this just leads to obfuscated code, and should be avoided whenever possible.
Reducing compilation times is a red herring, and a form of premature optimization. Reorganizing your code to reduce compilation times (when this matters) can be done, but at a somewhat great cost.
As for Gnome, Gnome has a "private pointer" in every GObject. This implements the pimpl idiom. This reduces dependencies between source files, and allows for some form of encapsulation. There are fewer compile time problems for C projects.
Modern C++ designs make heavy use of templates, which inevitably makes your compilation times skyrocket. Using the pimpl idiom and forward declaring classes (instead of including a header, where possible) reduces the logical dependencies between translation units (this is a good thing), but in many situations does not really help with compilation times.
Using boost greatly increases compilation times (beware if you indirectly include boost headers in many source files), and many C++ projects use it.
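For readers who have not seen it, the pimpl idiom combined with a forward declaration might look roughly like the sketch below. The class Widget, its members and the file names in the comments are hypothetical and only serve as illustration; both parts are shown in one file for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// --- widget.h (hypothetical header) ---
// Clients see only a forward declaration of Impl, so a change to Impl's
// members does not force them to recompile.
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                              // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    const std::string& name() const;
    void rename(std::string new_name);

private:
    struct Impl;                            // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// --- widget.cpp (hypothetical source file) ---
struct Widget::Impl {
    std::string name;
};

Widget::Widget(std::string name) : impl_(new Impl{std::move(name)}) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

const std::string& Widget::name() const {
    assert(impl_ && "use of a moved-from Widget");
    return impl_->name;
}

void Widget::rename(std::string new_name) {
    assert(impl_ && "use of a moved-from Widget");
    impl_->name = std::move(new_name);
}
```

The price is one extra heap allocation and one pointer indirection per object, which is part of why applying this to every class is usually considered overkill.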
I should mention also the thin template idiom is often used to reduce code bloat with templates.
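As a rough illustration of the thin template idiom: the real work lives in a non-template base that is compiled once, and the template layer only adds type-safe casts. The names PtrStackBase and PtrStack are hypothetical, and this is a sketch rather than a production container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-template core: its code exists once, whatever element types are used.
class PtrStackBase {
public:
    std::size_t size() const { return items_.size(); }

protected:
    void push_raw(void* p) { items_.push_back(p); }

    void* pop_raw() {
        assert(!items_.empty());  // popping an empty stack is a caller bug
        void* p = items_.back();
        items_.pop_back();
        return p;
    }

private:
    std::vector<void*> items_;
};

// Thin typed wrapper: each instantiation only adds trivial forwarding
// functions, so per-type code growth stays small.
template <typename T>
class PtrStack : private PtrStackBase {
public:
    void push(T* p) { push_raw(p); }
    T* pop() { return static_cast<T*>(pop_raw()); }
    using PtrStackBase::size;
};
```

PtrStack<int> and PtrStack<double> then share the same underlying push and pop code instead of each generating its own copy.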

Condensing Declaration and Implementation into an HPP file

I've read a few of the articles about the need / applicability / practicality of keeping headers around in C++ but I can't seem to find anywhere a solid reason why / when the above should or should not be done. I'm aware that boost uses .hpp files to deliver template functions to end users without the need for an associated .cpp file, and this thought is partially sourced off browsing through that code. It seems like this would be a convenient way to deliver single file modules of say a new Wt or Qt widget (still sticking to the one class per .h convention).
However, are there any negative technical implications of giving somebody a single .hpp file with both the header declaration and the implementation, assuming you have no problem with them having access to the implementation (say, in the context of OSS)? Does it, for instance, have any negative implications from the compiler's / linker's perspective?
Any opinions or perspectives on this would be appreciated.
I'm aware that boost uses .hpp files to deliver template functions to end users without the need for an associated .cpp file
Wrong verb: it’s not “without the need”, it’s “without the ability”.
If Boost could, they would separate their libraries into headers and implementation files. In fact, they do so wherever possible.
The reason for a clean separation is simple: compilation time for header-only projects increases tremendously because associated header files have to be read, parsed and compiled every time you recompile the tiniest part of your application.
Implementation files only need to be compiled if you happen to recompile that particular object file.
Large C and/or C++ projects take hours to compile. And these use a clean separation into header and object files. If they used only header files, I’m betting the compilation time would be measured in days instead of hours.
But for many of Boost’s libraries, the fact is that template definitions may not reside in a separate compilation unit from their declarations, so this is simply not possible.
The major negative aspect of .hpp-only libraries is that they cannot refer to a precompiled module. All of the code present in the .hpp and hence all of the code in the library must be added to your application. This increases the size of the binary and makes for redundant binaries on such a system that uses the library more than once.
With templates you have no real choice. In theory, export allows you to separate the interface from the implementation, but only one compiler (Comeau) really supports this [1], and it's being dropped from C++0x.
In any case, trying to put the implementations of non-template functions into headers leads to one obvious problem: the One Definition Rule remains in effect, so if you define the same function in more than one translation unit, you have a problem. The linker will typically give an error saying the same symbol has been defined more than once.
[1] Though it's mostly the EDG compiler front-end that really supports it, so other EDG-based compilers, such as Intel's, also support export to some degree, though they don't document it, so you can't depend on much with them.
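To make the One Definition Rule point concrete, here is a small sketch of what a header-only file usually does to stay within the rule. The header name util.hpp and the names twice, thrice and Counter are hypothetical.

```cpp
// --- util.hpp (hypothetical header included from several .cpp files) ---

// A plain definition like the commented-out one below would be emitted by
// every translation unit that includes this header, and the linker would
// typically report a duplicate symbol:
//
//   int twice(int x) { return 2 * x; }

// Marking the function `inline` permits the same definition to appear in
// several translation units, as long as all copies are identical.
inline int twice(int x) { return 2 * x; }

// Function templates get the same allowance without the keyword.
template <typename T>
T thrice(T x) { return 3 * x; }

// Member functions defined inside the class body are implicitly inline.
struct Counter {
    int value = 0;
    void bump() { ++value; }
};
```

This keeps the linker quiet, but every includer still parses and compiles these bodies, so the build-time cost described above remains.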

What are the advantages and disadvantages of implementing classes in header files?

I love the concept of DRY (don't repeat yourself [oops]), yet C++'s concept of header files goes against this rule of programming. Is there any drawback to defining a class member entirely in the header? If it's right to do for templates, why not for normal classes? I have some ideas for drawbacks and benefits, but what are yours?
Possible advantages of putting everything in header files:
Less redundancy (which leads to easier changes, easier refactoring, etc.)
May give compiler/linker better opportunities for optimization
Often easier to incorporate into an existing project
Possible disadvantages of putting everything in header files:
Longer compile/link cycles
Loss of separation of interface and implementation
Could lead to hard-to-resolve circular dependencies
Lots of inlining could increase executable size
Prevents binary compatibility of shared libraries/DLLs
Upsets co-workers who prefer the traditional ways of using C++
Well - one problem is that typically implementations change much more often than class definitions - so for a large project you end up having to recompile the world for every small change.
The main reason not to implement a class in the header file is: do the consumers of your class need to know its implementation details? The answer is almost always no. They just want to know what interface they can use to interact with the class. Having the class implementation visible in the header makes it much more difficult to understand what this interface is.
Beyond considerations of compactness and separating interface from implementation, there are also commercial motivations. If you develop a library to sell, you (probably) do not want to give away the implementation details of the library you are selling.
You're not repeating yourself. You only write the code once in one header. It is repeated by the preprocessor, but that's not your problem, and it's not a violation of DRY.
If it's right to do for templates, why not for normal classes
It's not really that it's the right thing to do for templates. It's just the only one that really works in general.
Anyway, if you implement a class in a header, you get the following advantages and disadvantages:
The full implementation is visible anywhere it is used, which makes it easy for the compiler to inline as necessary.
The same code will be parsed and compiled multiple times, leading to higher compile-times.
On the other hand, if everything is in headers, that may lead to fewer translation units, and so the compiler has to run fewer times. Ultimately, you might end up with a single translation unit, which just includes everything once, which can result in very fast compilations.
And... that's it, really.
Most of my code tends to be in headers, but that's because most of my code is templates.
The main disadvantage (apart from the lengthy builds) is there is no clear separation of the interface and implementation.
Ideally, you should not need to see the implementation of an intuitive, and well documented interface.
Not mentioned yet: virtual functions are instantiated for each include, so you can bloat your executable (I'm not sure whether this is true for all compilers).
There is an alternative:
Do a lot of stuff in classes declared in your source-file. One example is the pimpl-idiom, but there are also people who are afraid to declare classes outside of a header-file. However, this makes sense for private classes.
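A sketch of what such a source-file-only class could look like, assuming a hypothetical header text.h that exposes a single function count_words; the helper class WordCounter is likewise made up for illustration.

```cpp
#include <cctype>
#include <string>

// --- text.h (hypothetical header): all that clients ever see ---
int count_words(const std::string& text);

// --- text.cpp (hypothetical source file) ---
namespace {  // unnamed namespace: names here are local to this file

// Private helper class that never appears in any header, so changing it
// only recompiles this one source file.
class WordCounter {
public:
    void feed(char c) {
        const bool is_space = std::isspace(static_cast<unsigned char>(c)) != 0;
        if (!is_space && !in_word_) ++count_;  // a new word starts here
        in_word_ = !is_space;
    }
    int count() const { return count_; }

private:
    int count_ = 0;
    bool in_word_ = false;
};

}  // namespace

int count_words(const std::string& text) {
    WordCounter counter;
    for (char c : text) counter.feed(c);
    return counter.count();
}
```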