C++ modules: do module implementation units avoid unnecessary recompilation?

I recently watched a video from CppCon 2017: Boris Kolpackov, "Building C++ Modules":
https://www.youtube.com/watch?v=E8EbDcLQAoc
At approximately 31:35 he starts explaining that we should still use header/source splitting, and gives three reasons. The first reason:
If you have both declarations and definitions in the same place, then when you touch this module, all other modules that depend on the module interface (BMI) will be recompiled.
I didn't like that at all. It sounds as if we are still in the 90s and compilers cannot be smart enough to tell BMI-related changes from implementation-related changes. As I see it, compilers should be able to quickly scan each module and generate only the BMI from it; if the BMI has not changed, there is no need to recompile the other modules that depend on it.
Or am I missing something?

The author of that talk later said the recompilation issue is a matter of implementation. Quoting the article Common C++ Modules TS Misconceptions by Boris Kolpackov:
It turns out a lot of people would like to get rid of the header/source split (or, in modules terms, interface/implementation split) and keep everything in a single file. You can do that in Modules TS: with modules (unlike headers) you can define non-inline functions and variables in module interface units. So if you want to keep everything in a single file, you can.
and
Now, keeping everything in a single file may have negative build performance implications but it appears a smart enough build system in cooperation with the compiler should be able to overcome this. See this discussion for details.
Quoting Gor Nishanov (the Project Editor of Coroutines TS) from the linked thread:
That is up to you how to structure your code. Module TS does not impose on you how you decompose your module into individual files. If you want, you can implement your entire module in the interface file (thus you will have C# like experience), or you can partition your module into an interface and one or more implementation files.
The Project Editor of Modules TS, Gabriel Dos Reis, commented on the MSVC implementation:
Ideally, only semantics-relevant changes should trigger recompilation keyed on the IFC.
(As a side note, the Modules TS has now been approved and sent to ISO for publication.)
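For illustration, here is a minimal sketch of that interface/implementation split using C++20 module syntax (the math module, file names, and extensions are hypothetical; extensions vary by compiler). A build system that compares BMIs could notice that editing math.cpp leaves the BMI untouched and skip recompiling importers:
// math.cppm -- module interface unit (.ixx on MSVC, .cppm on Clang)
export module math;
export int add(int a, int b);             // only this declaration feeds the BMI

// math.cpp -- module implementation unit
module math;
int add(int a, int b) { return a + b; }   // editing this body leaves the BMI unchanged

// consumer.cpp -- depends only on the BMI of module math
import math;
int twice(int x) { return add(x, x); }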

Related

Why put function declarations and definitions in separate files?

In Codecademy's lessons on functions, they teach you to use three files if you are going to call functions in your program:
the int main() file, which I've found through trial and error is an indispensable part of a C++ program (I guess...), with a .cpp file extension
a header file for DECLARING functions, with a .hpp file extension.
a separate file with function DEFINITIONS, with a .cpp file extension
Would it work to both declare and define functions within the header file by itself and simply include them above int main()? To me, having separate files for declarations and definitions just seems like it would confuse matters in a larger project.
In a large project you often need only the type and function declarations, not the definitions -- for example, in other header files. If all definitions were in the headers, then the combined result of including multiple headers and their transitive includes would lead to huge compilation units. This significantly hurts compile times, since the amount of code the compiler needs to process would explode to orders of magnitude more than needed. It would also hurt link times, since the linker would have more work to do discarding the duplicates included in many more object files.
You also easily run into ODR (One Definition Rule) issues unless everything is marked inline.
In large projects, function declarations may be needed by many files, but the function definition should only be compiled once. It is combined with all the places that need it at link time.
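As a concrete sketch of that split (file and function names hypothetical):
// shape.hpp -- the declaration; cheap for many files to include
double circle_area(double radius);

// shape.cpp -- the definition; compiled exactly once into one object file
#include "shape.hpp"
double circle_area(double radius) { return 3.141592653589793 * radius * radius; }

// main.cpp -- needs only the declaration; the linker wires up the call
#include "shape.hpp"
int main() { return circle_area(2.0) > 0.0 ? 0 : 1; }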
A small C++ program can be (and often is) made of a single translation unit of, e.g., a few thousand lines of C++ code. In that case, you could have a single myprog.cc C++ source file (with several #include-s inside).
But when you work on a larger program, in teams, it is convenient to have several C++ source files.
Some C++ files are generated by another program (this is called metaprogramming or source-to-source compilation) and could contain a million lines of C++. ANTLR, GNU Bison, and TypeScript2Cxx are all capable of generating C++ code.
But if you work in a team of people like Alice and Bob, it is convenient to decide that Alice is responsible for alice.cc and Bob is writing bob.cc, and that both cooperate on a common header file header.hh which is #include-d in both alice.cc and bob.cc. That header.hh would practically define the API of the software project.
Read more about version control systems (I prefer git) and build automation tools (such as ninja or make).
Look for inspiration inside the C++ code of existing open source projects on gitlab or github or elsewhere (in particular, inside the source code of Clang and of GCC, both being major C++ compilers).
FWIW, in GCC 10.1 (of May 2020) the gcc/go/gofrontend/expressions.cc file is handwritten and has 19,711 lines of C++ code, so nearly twenty thousand lines, and it is compiled daily. I know the people working on it; they are brilliant and nice professionals. The biggest file of FLTK 1.4 is its src/Fl_Text_Display.cxx, with 4,175 lines of C++.
From personal experience, you might have a single C++ function of several tens of thousands of lines (which makes practical sense only when that code is generated), but then the compilation time with an optimizing compiler becomes prohibitive. You could adapt my manydl.c program to generate C++ files of arbitrary size (it currently generates "random" C files with functions of "tunable" size). C++ code generated by Fluid or Qt Designer can also be quite large, and C++ code generated for GUIs is often made of long but conceptually simple functions.
Nothing in the C++11 standard (see n3337) requires several translation units. You might have a single C++ file foo.cc of a million lines (see SQLite's single-file amalgamation for an example of this approach). And you could generate some of the C++ source code, as the Qt project and the GCC compiler do. Jacques Pitrat's book Artificial Beings: The Conscience of a Conscious Machine (ISBN 978-1848211018) explains over many pages why such an approach is worthwhile.
There are really two answers to this question:
Why would YOU, the programmer, want to split things out into multiple .h/.hpp and .cpp files?
I believe the answer here is that it can help organization when your .cpp files become very large, with a lot of code that may not be relevant to someone who merely needs the functionality provided by the file. Here's an example:
Let's say you have some C++ code that displays images on a screen. You, as the person who wants to use that code, are likely interested in the functions/classes it exposes that let you control that functionality. Maybe the code exposes the following helpful functions:
WriteImageToScreen(int position_x, int position_y)
ClearScreen()
It can be much easier to look through a header file that tells you only what you are allowed to use, rather than how all of it is implemented. It's very possible that implementing those two functions so that you can call them requires thousands of lines of code and a bunch of variables and statements you don't care about. Not having to read all that helps you focus on the important part of the code: the pieces you want to interact with.
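The header for such code might be nothing more than this sketch (the file name is hypothetical), while the thousands of lines of implementation live elsewhere:
// screen.hpp -- the "what", not the "how"
#pragma once

// Draws an image at the given screen coordinates.
void WriteImageToScreen(int position_x, int position_y);

// Erases everything currently displayed.
void ClearScreen();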
I've presented this example as if you were calling someone else's code, but the same applies to your own code. As your projects get bigger, it is convenient to have summaries of the functionality each file exposes.
Now that all being said, not everyone agrees this is the correct way to do things or that it is helpful.
Why is it necessary for the compiler that things be split out into multiple .h/.hpp and .cpp files?
Just in case you aren't familiar with the term, a compiler is the program which turns your source code text into a program which your computer can execute.
So why does the compiler need separate .hpp/.cpp files? Others have pretty much already hit the nail on the head with this one: the build breaks if something is defined multiple times (C++'s One Definition Rule). If you put every definition in a header file, then when you include that header in multiple source files, those entities end up defined multiple times and the linker complains. So, in essence, this circles back to the organizational question.
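Here is a sketch of how the multiple-definition problem arises, and the usual fixes (names hypothetical):
// util.hpp -- BAD: a non-inline definition in a header
int answer() { return 42; }
// If a.cpp and b.cpp both #include "util.hpp", each object file carries a
// definition of answer(), and linking fails with a multiple-definition error.
// Fix: mark the function inline, or move its body into a single util.cpp.
inline int answer_fixed() { return 42; }  // OK: inline allows identical repeated definitions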
I have seen programmers who just have a single main file into which all other code is included directly during compilation:
#include "SomeFile.cpp"
#include "AnotherFile.cpp"
// ...
#include "SoManyFiles.cpp"
int main()
{
DoStuff();
}
I believe this is called a unity (or monolithic) build, and it's not recommended.
If you have a toy project, you can.
If you have 1,000,000 lines of code, your build times will be horrid.
C++20 introduces modules, which should make the whole issue go away.
Other languages have tools that can extract an interface from a "module".
Hopefully, when C++20 arrives, tools will become available.
The only good reason to split an interface from an implementation is when there are multiple implementations of one interface (as in VHDL; C++20 modules will support this too). The pragmatic reasons are compilation speed and legibility.

How should I write my C++ to be prepared for C++ modules?

There are already two compilers that support C++ modules:
Clang: http://clang.llvm.org/docs/Modules.html
MS VS 2015: http://blogs.msdn.com/b/vcblog/archive/2015/12/03/c-modules-in-vs-2015-update-1.aspx
When starting a new project now, what should I pay attention to in order to be able to adopt the modules feature when it is eventually released in my compiler?
Is it possible to use modules and still maintain compatibility with older compilers that do not support it?
There are already two compilers that support C++ modules
clang: http://clang.llvm.org/docs/Modules.html
MS VS 2015: http://blogs.msdn.com/b/vcblog/archive/2015/12/03/c-modules-in-vs-2015-update-1.aspx
The Microsoft approach appears to be the one gaining the most traction, mainly because Microsoft is throwing far more resources at its implementation than the Clang folks currently are. See https://llvm.org/bugs/buglist.cgi?list_id=100798&query_format=advanced&component=Modules&product=clang for what I mean: there are some big showstopper bugs in Modules for C++, whereas Modules for C, or especially for Objective-C, look much more usable in real-world code. Visual Studio's biggest and most important customer, Microsoft itself, is pushing hard for Modules because they solve a whole ton of internal build-scalability problems, and Microsoft's internal code is some of the hardest C++ to compile anywhere in existence, so you can't throw any compiler other than MSVC at it (e.g. good luck getting Clang or GCC to compile 40k-line functions). The Clang build tricks used by Google etc. therefore aren't available to Microsoft, and it has a huge, pressing need to get Modules working sooner rather than later.
This isn't to say there aren't some serious design flaws in the Microsoft proposal when it is applied in practice to large real-world code bases. However, Gaby is of the view that you should refactor your code for Modules, and whilst I disagree, I can see where he is coming from.
When starting a new project now, what should I pay attention to in order to be able to adopt the modules feature when it is eventually released in my compiler?
Insofar as Microsoft's compiler is currently expected to implement Modules, you ought to make sure your library is usable in all of these forms:
Dynamic library
Static library
Header only library
Something very surprising to many people is that C++ Modules, as currently expected to be implemented, keep those distinctions. So you get a C++ Module variant of all three of the above, with the first looking most like what people expect a C++ Module to be, and the last looking most like a more useful precompiled header. The reason you ought to support these variants is that you can reuse most of the same preprocessor machinery to also support C++ Modules with very little extra work.
A later Visual Studio will allow linking the module definition file (the .ifc file) into DLLs as a resource. This will finally eliminate the need for the .lib/.dll distinction on MSVC: you just supply a single DLL to the compiler and it all "just works" on module import, with no headers or anything else needed. This, of course, smells a bit like COM, but without most of COM's benefits.
Is it possible to use modules in a single codebase and still maintain compatibility with older compilers that do not support it?
I'm going to assume you meant the bold text inserted above.
The answer is generally yes with even more preprocessor macro fun. #include <someheader> can turn into an import someheader within the header because the preprocessor still works as usual. You can therefore mark up individual library headers with C++ Modules support along something like these lines:
// someheader.hpp
#if MODULES_ENABLED
# ifndef EXPORTING_MODULE
import someheader; // Bring in the precompiled module from the database
// Do NOT set NEED_DEFINE so this include exits out doing nothing more
# else
// We are at the module-generation stage, so mark up the namespace for export
# define SOMEHEADER_DECL export
# define NEED_DEFINE
# endif
#else
// Modules are not turned on, so declare everything inline as per the old way
# define SOMEHEADER_DECL
# define NEED_DEFINE
#endif
#ifdef NEED_DEFINE
SOMEHEADER_DECL namespace someheader
{
// usual classes and decls here
}
#endif
Now in your main.cpp or whatever, you simply do:
#include "someheader.hpp"
... and if the compiler was invoked with /experimental:modules /DMODULES_ENABLED, then your application automagically uses the C++ Modules edition of your library. If it wasn't, you get inline inclusion as we've always done.
I reckon this is the minimum possible set of changes to your source code to make it Modules-ready now. You will note I have said nothing about build systems; this is because I am still debugging the CMake tooling I've written to get all this stuff to "just work" seamlessly, and I expect to be debugging it for some months yet. Expect to see it maybe at a C++ conference next year or the year after :)
Is it possible to use modules and still maintain compatibility with older compilers that do not support it?
Not cleanly. It might be possible using some #ifdef magic like this:
#ifdef CXX17_MODULES
...
#else
// fall back to #pragma once, #include "...", etc.
#endif
but this means you still need to provide .h support and thus lose all the benefits, plus your codebase looks quite ugly now.
If you do want to follow this approach, the easiest way to detect CXX17_MODULES (a macro name I just made up) is to have your build system compile a small test program that uses modules, and define a global flag for everyone to see telling whether the compilation succeeded or not.
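A minimal probe along those lines might look like this; the file name is made up, the import shown is the C++20 header-unit form (the TS-era syntax differed), and the required compiler flags vary by toolchain:
// module_probe.cpp -- if this compiles, have the build system define CXX17_MODULES
import <iostream>;

int main()
{
std::cout << "modules supported\n";
}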
When starting a new project now, what should I pay attention to in order to be able to adopt the modules feature when it is eventually released in my compiler?
It depends. If your project is an enterprise one that puts food on the table, I'd wait a few years after modules land in stable releases so that they become widely adopted. On the other hand, if your project can afford to be bleeding-edge, by all means use modules.
Basically, it's the same story as with Python 3 and Python 2, or, less relevantly, PHP 7 and PHP 5. You need to find a balance between being a good, up-to-date programmer and not annoying people on Debian ;-)

Will modules make template compilation faster?

Will modules make template compilation faster? Templates (usually) have to be header-only, and end up residing in the translation unit of the #includer.
Related: Do precompiled headers make template compilation faster?
According to the modules proposal -- the very paper you cited -- it's the first of the three primary goals for adding modules:
1 Introduction
Modules are a mechanism to package libraries and encapsulate their implementations.
They differ from the traditional approach of translation units and header files primarily in
that all entities are defined in just one place (even classes, templates, etc.). This paper
proposes a module mechanism (somewhat similar to that of Modula-2) with three
primary goals:
Significantly improve build times of large projects
Enable a better separation between interface and implementation
Provide a viable transition path for existing libraries
While these are the driving goals, the proposal also resolves a number of other longstanding practical C++ issues (initialization ordering, run-time performance, etc.).
So, how can they accomplish those goals? Well, from section 4.1:
Since header files are typically included in many other files, the
growth in build cycles is generally superlinear with respect to the total amount of source
code. If the issue is not addressed, it is likely to become worse as the use of templates
increases and more powerful declarative facilities (like concepts, contract programming,
etc.) are added to the language.
Modules address this issue by replacing the textual inclusion mechanism (whose
processing time is roughly proportional to the amount of code included) by a precompiled
module attachment mechanism (whose processing time—when properly implemented—
is roughly proportional to the number of imported declarations). The property that client
translation units need not be recompiled when private module definitions change can be
retained.
In other words, at the very least, the time taken to parse these templates is only done once instead of N times, which is already a huge improvement.
Later sections describe improvements for things like explicit instantiation. The one thing this doesn't directly improve is automatic template instantiation, as section 5.8 acknowledges. Here all that can be guaranteed is exactly the same benefit you already get from precompiled headers: "Both modules Set and Reset must instantiate Lib::S and in fact both expose this instantiation in their interface file." But the proposal then gives some possible technical solutions to the ODR problems, at least some of which also solve the multiple-instantiation problem and may not be possible in today's world. For example, the kind of queried instantiation suggested has been tried repeatedly and it's generally considered too hard to get right with today's model, but modules might make it feasible. There's no proof that it's impossible to get right today, just experience that it's hard, and there's no proof that it would be easier with modules, just the plausible possibility that it might be.
And that fits with a general implication that's never quite stated in the proposal, but is there in the background: Making compilation simpler means we might get new optimizations in the process (directly, because it's easier to reason about what's happening, or indirectly, because more people work on the problem once it's not such a huge pain).
In summary, modules can and will definitely make template compilation faster, if for no other reason than that the template definitions themselves only have to be parsed once. They may allow for other benefits that are either impossible or more difficult to do without modules, but that may not be guaranteeable.
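In today's C++20 syntax (which differs in detail from the proposal quoted above), the parse-once benefit looks roughly like this; the stack module and its contents are hypothetical:
// stack.cppm -- the template text is parsed once, when this interface unit is compiled
export module stack;

export template <typename T>
struct Stack {
T items[64];
int top = 0;
void push(T v) { items[top++] = v; }
T pop() { return items[--top]; }
};

// client.cpp -- importing attaches the precompiled form with no textual re-parse,
// though instantiating Stack<int> still costs instantiation time here
import stack;
Stack<int> s;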
I don't know about modules, but I do know that gcc even now provides precompiled headers, as do many other compilers. A precompiled header can contain a very efficient machine-readable version of a template description, so when that is available upon inclusion of a header, many compiling steps can be skipped which would normally be required for a source-text-only uncompiled header.
The modules paper talks about precompiled interface files, so I assume that current precompiled headers and new precompiled interface files will provide comparable performance. Creating such a file from a plain-text, portable module description will probably be more efficient, as it can save time thanks to restrictions in the language syntax. And it will be more standardized, so more headers will get the benefit of precompilation. Current projects seldom precompile more than one header, and cross-project precompiled headers are even rarer in my experience.
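For reference, GCC's existing precompiled-header mechanism works like this (file names hypothetical):
// big.hpp -- the expensive includes you want precompiled
// Precompile it once with:  g++ -std=c++17 -x c++-header big.hpp
// That emits big.hpp.gch; a translation unit whose first include is
// #include "big.hpp" then loads the .gch instead of re-parsing the text.
#include <vector>
#include <string>
#include <algorithm>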
Do precompiled headers make template compilation faster?
No; they make templates not compile. That is the entire point of both PCHs and modules: to stop compiling everything over and over.
The idea is to turn "load C++ text and compile" into "load C++ symbols." Modules are a generalized form of PCHs.
Now, you still have the cost of instantiating templates (unless they were instantiated within a PCH/module). But the cost of compiling the C++ template code is removed.
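As an aside, plain C++11 already lets you pay an instantiation cost once per program with extern template (names hypothetical); modules aim to make this kind of sharing automatic rather than hand-maintained:
// box.hpp -- the template definition, visible to every includer
template <typename T> struct Box { T value; };
extern template struct Box<int>;   // tells includers: do not instantiate, it exists elsewhere

// box.cpp -- the one translation unit that pays for the instantiation
#include "box.hpp"
template struct Box<int>;          // the single explicit instantiation definition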

What are C++ modules and how do they differ from namespaces?

I was looking at the libstdc++ documentation at http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01618.html and found it arranged in "modules" such as Algorithm, Strings, etc.
I have multiple questions:
Since this is documentation auto-generated by Doxygen, which part of the libstdc++ source code or config file makes Doxygen "aware" of the different modules and their contents/dependencies?
What are modules and how do they differ from namespaces?
I did a Google search on C++ modules and found that modules are defined by "export modulename", but I could not find any export definition in the libstdc++ source code. Does the word "Modules" in the above documentation refer to a different construct than export?
Do developers typically divide their source code into modules for large projects?
Where can I learn about modules, so that I can organize my source code into modules and namespaces?
It looks to me like you're running into two entirely separate things that happen to use the same name.
The "modules" you're seeing in the documentation seem to be just a post-hoc classification of the algorithms and such. It may be open to argument that they should correspond closely to namespaces, but in the case of the standard library, essentially everything is in one giant namespace. If it were being designed using namespaces from the beginning it might not be that way, but that's not how things happened. In any case, the classification applies to the documentation, not to the code itself. Somebody else producing similar documentation might decide to divide it up into different modules, still without affecting the code.
During the C++11 standardization effort, there was a proposal to add something else (which also went by the name modules) to the C++ language itself. This proposal was removed, primarily in the interest of finishing the standard sooner. The latter differed from namespaces quite a bit, and is the one that used "export" with a module name. It's dead and gone (at least for now) though, so I won't go into a lot more detail about it here. If you're curious, you can read Daveed Vandevoorde's paper about it.
Update: The committee added modules to C++20. What was added is at least somewhat different from anything anybody would have known about in 2012, when this question was asked, but it is pretty much the same general idea as the modules that were proposed for C++11. A bit much to add on to a ten-year-old answer, but here's a link to at least some information about them:
https://en.cppreference.com/w/cpp/language/modules
The modules you see in the documentation are created by Doxygen and are not a part of C++. Certain classes in libstdc++ library are grouped together into modules using the \ingroup Doxygen command.
See: http://www.doxygen.nl/manual/grouping.html for more information on creating modules/groups in Doxygen.
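A simplified sketch of that markup (the group name and function are hypothetical; libstdc++ defines its real groups in its own sources):
/** @defgroup strings Strings
 *  Utilities for string handling.
 */

/** @ingroup strings
 *  @brief Skips leading whitespace.
 */
const char* skip_spaces(const char* s);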

What's the rationale behind headers?

I don't quite understand the point of having a header; it seems to violate the DRY principle! All the information in a header is (can be) contained in the implementation.
It simplifies the compilation process. When you want to compile units independently, you need something to describe the parts that will be linked to without having to import the entirety of all the other files.
It also allows for code hiding. One can distribute a header to allow others to use the functionality without having to distribute the implementation.
Finally, it can encourage the separation of interface from implementation.
They are not the only way to solve these problems, but 30 years ago they were a good one. We probably wouldn't use header files for a language today, but they weren't invented in 2009.
The architects of many modern languages such as Java, Eiffel and C# clearly agree with you -- those languages extract the metadata about a module from the implementation. However, per se, the concept of headers doesn't preclude that: it would obviously be a simple task for a compiler to extract a .h file while compiling a .c, for example, just as the compilers for those other languages do implicitly. The fact that typical current C compilers do not do it is not a language-design issue -- it's an implementation issue; apparently there's no demand from users for such a feature, so no compiler vendor bothers to implement it.
As a language design choice, having separate .h files (in a human-readable and editable text format) gives you the best of both worlds: you can start separately compiling client code based on a module implementation that doesn't yet exist, if you wish, by writing the .h file by hand; or you (assuming by absurd a compiler implementation that supplies it;-) can get the .h file automatically from the implementation as a side effect of compiling it.
If C, C++, &c, keep thriving (apparently they're still doing fine today;-), and demand like yours for not manually writing headers grows, eventually compiler writers will have to supply the "header generation" option, and the "best of both worlds" won't stay theoretical!-)
It helps to think a bit about the capabilities of the computers that were available when, say, C was written. Main memory was measured in kilowords, and not necessarily very many of them. Disks were bigger, but not much. Serious storage meant reel-to-reel tapes, mounted by hand by grumpy operators who really wanted you to go away so they could play Hunt the Wumpus. A 1 MIPS machine was screaming fast. And with all these limitations you had to share it, possibly with a score of other users.
Anything that reduced the space or time complexity of compilation was a big win. And headers do both.
Don't forget the documentation a header provides. There is usually everything in it you need to know to use the module. For my part, I don't want to scan through a looong source file to learn what there is that I need to use and how to call it... You would extract this information anyway, which effectively results in -- a header file. This is no longer an issue with modern IDEs, of course, but when working with some old C code I really love having hand-crafted header files that include comments about usage and about pre- and postconditions.
Keeping source, header, and additional documentation in sync is still another can of worms...
The whole idea of inspecting the binary output files of language processors would have been hard to comprehend when C invented .h files. There was a system called JOVIAL that did something like it, but it was exotic and confined more-or-less exclusively to military projects. (I've never seen a JOVIAL program, I've only heard about it.)
So when C came out the usual design pattern for modularity was "no checks whatsoever". There might be a restriction that .text symbols could only link to .text and .data to .data, but that was it. That is, the compilers of the day typically processed one source file at a time and then linkers put them together without the slightest level of error checking other than, if you were lucky, "I'm a function symbol" vs "I'm a data symbol".
So the idea of actually having the compiler understand the thing you were calling was somewhat new.
Even today, if you make a totally bogus header, nothing catches you in most AOT compilers. Clever things like CLR languages and Java actually do encode these things in their class files.
So yes, in the long run, we probably won't have header files.
No, you don't have headers in Java -- but you do have interfaces, and every serious Java guru recommends that you define anything used by other projects/systems as an interface plus an implementation.
Let's see: a Java interface definition contains call signatures, type definitions and constants.
MOST C header files contain call signatures, type definitions and constants.
So for all practical purposes, C/C++ header files are just interface definitions and should thus be considered a Good Thing. Now, I know it's possible to define a myriad of other things in header files as well (macros, constants, etc. etc.), but that's just part of the whole wonderful world of C:
int target() {
// Default for shoot()
return FOOT;
}
For more detail, read this:
A header file commonly contains forward declarations of classes, subroutines, variables, and other identifiers. Programmers who wish to declare standardized identifiers in more than one source file can place such identifiers in a single header file, which other code can then include whenever the header contents are required.
The C standard library and C++ standard library traditionally declare their standard functions in header files.
And what if you want to give somebody else the declarations to use your library without giving them the implementation?
As another answer points out, the original reason for headers was to make parsing/compiling easier on platforms with very simple and limited tools. It was a great step forward to have a machine with two floppies, so you could have the compiler on one and your code on the other -- it made things a lot easier.
When you divide code into header and source files, you separate declaration from definition. Looking at the header files shows you what you have; if you want to see implementation details, you go to the source file.