How are C++20 modules compiled? - c++

Some sources say that compilers parse modules and create an abstract syntax tree (AST), which is then used when parsing all code files that import the module. This would reduce the amount of parsing the compiler has to do as opposed to when #including headers, but everything would still have to be compiled once for every code file that imports a module.
Other sources say that modules are only compiled once.
How and when are modules compiled, and how does this affect inlining at compile time?

The products of module compilation are implementation dependent. But broadly speaking, they are whatever the compiler needs them to be to make module inclusion efficient. That is, after all, the whole point of modules. When building a module interface, the compiler has 100% of the information it needs to have to make including that module interface efficient.
Module compilation has only one special interaction with "inlining": within a named module, member functions defined inside the class definition are not implicitly inline. That's the only effect that modules have on "inlining".
And of course, the inline keyword is not strictly about "inlining". If you put definitions of things in a module's interface files, those definitions can be available for inlining by those who import those interfaces, whether the inline keyword is used (explicitly or implicitly) or not. This was true pre-modules, and it is still true in module builds.

Related

Motivating real world examples of the 'inline' specifier?

Background: The C++ inline keyword does not determine if a function should be inlined.
Instead, inline permits you to provide multiple definitions of a single function or variable, so long as each definition occurs in a different translation unit.
Basically, this allows definitions of global variables and functions in header files.
Are there some examples of why I might want to write a definition in a header file?
I've heard that there might be templating examples where it's impossible to write the definition in a separate cpp file.
I've heard other claims about performance. But is that really true? Since, to my knowledge, the use of the inline keyword doesn't guarantee that the function call is inlined (and vice versa).
I have a sense that this feature is probably primarily used by library writers trying to write wacky and highly optimized implementations. But are there some examples?
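It's actually simple: you need inline when you want to write a definition (of a function or, since C++17, a variable) in a header. Otherwise you would violate the ODR as soon as your header is included in more than one TU. That's it. That's all there is to it.
Of note is that some entities are implicitly inline, or may be defined in every TU in the same way:
member functions defined inside the body of the class
function and variable templates (not formally inline, but they may likewise be defined in every TU that uses them)
constexpr functions and, since C++17, static constexpr data members (a namespace-scope constexpr variable is not implicitly inline; it has internal linkage instead)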
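As a minimal sketch of what this looks like in practice, the following shows the contents of a hypothetical header. All names (`square`, `call_count`, `Point`, `cube`, `demo`) are made up for illustration; the point is only that each definition may safely appear in every TU that includes the header.

```cpp
#include <cassert>

// Explicitly inline function: the same definition may appear in many TUs.
inline int square(int x) { return x * x; }

// Inline variable (C++17): every TU that includes this refers to one object.
inline int call_count = 0;

struct Point {
    int x, y;
    // Defined inside the class body: implicitly inline in header-based code.
    int dot(const Point& o) const { return x * o.x + y * o.y; }
};

// constexpr functions are implicitly inline.
constexpr int cube(int x) { return x * x * x; }

// Small usage function exercising the definitions above.
inline void demo() {
    ++call_count;
    assert(square(3) == 9);
    assert(cube(2) == 8);
    assert((Point{1, 2}.dot(Point{3, 4}) == 11));
}
```

Removing `inline` from `square` or `call_count` would still compile in a single TU, but would typically produce a duplicate-symbol error at link time once two TUs include the header.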
Now the question becomes why and when would someone want to write definitions in the header instead of separating declarations in headers and definitions in source code files. There are advantages and disadvantages to this approach. Here are some to consider:
optimization
Having the definition in a source file means that the code of the function is baked into that TU's object file. Without link-time optimization (LTO), it cannot be inlined at call sites outside of the TU that defines it. Having it in a header means that the compiler can inline it everywhere it sees fit. Or it can generate different code for the function depending on the context where it is called. The same can be achieved with LTO within an executable or library, but across a library boundary the only option for enabling this optimization is having the definitions in the header.
library distribution
Besides enabling more optimizations in a library, having a header-only library (when it's possible) means an easier way to distribute that library. All the user has to do is download the headers folder and add it to the include path of their project. In the case of a non-header-only library, things become more complicated, because you can't mix and match binaries compiled by different compilers, or even by the same compiler with different flags. So you either have to distribute your library with the full source code along with a build tool, or have the library compiled in many formats (CPU architecture/OS/compiler/compiler flags combinations).
human preference
Having to write the code once is considered by some (me included) an advantage: both from code documentation perspective and from a maintenance perspective. Others consider separating declaration from definitions is better. One argument is that it achieves separation of interface vs implementation but that is just not the case: in a header you need to have private member declarations even if those aren't part of the interface.
compile time performance
Having all the code in headers means duplicating it in every TU. This is a real problem when it comes to compilation time. Header-heavy C++ projects are notorious for slow compilation times. It also means that a modification of a function definition triggers the recompilation of every TU that includes it, as opposed to just one TU when the definition is in a source file. Precompiled headers try to solve this problem, but the solutions are not portable and have problems of their own.
If the same function definition appears in multiple compilation units, then it needs to be inline; otherwise you get a linking error.
Function templates are a related case: to make them available through a header, their definition also has to be in the header. The templates themselves do not need the inline keyword for this, but explicit full specializations of function templates do.
The below statement might be a bit oversimplified because compilers and linkers are really complex nowadays, but to get a basic idea it is still valid.
A cpp file and the headers included by that cpp file form a compilation unit and each compilation unit is compiled individually. Within that compilation unit, the compiler can do many optimizations like potentially inlining any function call (no matter if it is a member or a free function) as long as the code still behaves according to the specification.
So if you place the function definition in the header you allow the compiler to know the code of that function and potentially do more optimizations.
If the definition is in another compilation unit, the compiler can't do much, and optimizations can then only be done at link time. Link-time optimizations are possible and are indeed done. But while link-time optimization has improved, it may not be able to do as much as the compiler can do within a single compilation unit.
Header-only libraries have the big advantage that you do not need to provide project files with them; whoever wants to use the library just copies the headers into their project and includes them.
In short:
You're writing a library and you want it to be header-only, to make its use more convenient.
Even if it's not a library, in some cases you may want to keep some of the definitions in a header to make it easier to maintain (whether or not this makes things easier is subjective).
to my knowledge, the use of the inline keyword doesn't guarantee that the function call is inlined
Yes, defining it in a header (as inline) doesn't guarantee inlining. But if you don't define it in a header, it will never be inlined (unless you're using link-time optimizations). So:
You want the compiler to be able to inline the functions, if it decides to.
Also, it may give the compiler more knowledge about a function:
maybe it never throws, but is not marked noexcept;
maybe several consecutive calls can be merged into one (there's no side effects, etc), but __attribute__((const)) is missing;
maybe it never returns, but [[noreturn]] is missing;
...
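A small sketch of the first point, using two hypothetical helpers (`add_one` and `add_one_marked` are made-up names). At the language level, only the declaration counts, which is what the `noexcept` operator reports; an optimizer that can see the body is free to work out for itself that nothing throws.

```cpp
#include <cassert>

// Never throws in practice, but the declaration does not say so.
inline int add_one(int x) { return x + 1; }

// Same logic, with the promise spelled out in the declaration.
inline int add_one_marked(int x) noexcept { return x + 1; }

// The noexcept operator looks only at declarations, not at function bodies.
static_assert(!noexcept(add_one(1)), "unmarked: treated as potentially throwing");
static_assert(noexcept(add_one_marked(1)), "marked: known not to throw");

// Small usage function: both versions compute the same result.
inline void demo_noexcept() {
    assert(add_one(1) == add_one_marked(1));
}
```

When only a declaration of `add_one` is visible (the definition living in another TU), the compiler has nothing but that declaration to go on and must assume the call can throw.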
there might be templating examples where it's impossible to write the definition in a separate cpp file.
That's true for most templates. They automatically behave as if they were inline, so you don't need to specify it explicitly. See Why can templates only be implemented in the header file? for details.
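To illustrate, here is a minimal sketch of what could sit in a header (the name `twice` is hypothetical). The primary template needs no `inline`, whereas a full specialization is an ordinary function and does need it if the header is included in more than one TU.

```cpp
#include <cassert>
#include <string>

// Primary template: may be defined in a header without `inline`.
template <class T>
T twice(T v) { return v + v; }

// Full specialization: no longer a template, so `inline` is required
// for the definition to appear in more than one translation unit.
template <>
inline std::string twice<std::string>(std::string v) { return v + " " + v; }

// Small usage function exercising both versions.
inline void demo_twice() {
    assert(twice(21) == 42);
    assert(twice(std::string("ab")) == "ab ab");
}
```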

Using old libraries with the new module system [duplicate]

I've been following up C++ standardization and came across C++ modules idea. I could not find a good article on it. What exactly is it about?
Motivation
The simplistic answer is that a C++ module is like a header that is also a translation unit. It is like a header in that you can use it (with import, which is a new contextual keyword) to gain access to declarations from a library. Because it is a translation unit (or several for a complicated module), it is compiled separately and only once. (Recall that #include literally copies the contents of a file into the translation unit that contains the directive.) This combination yields a number of advantages:
Isolation: because a module unit is a separate translation unit, it has its own set of macros and using declarations/directives that neither affect nor are affected by those in the importing translation unit or any other module. This prevents collisions between an identifier #defined in one header and used in another. While use of using still should be judicious, it is not intrinsically harmful to write even using namespace at namespace scope in a module interface.
Interface control: because a module unit can declare entities with internal linkage (with static or namespace {}), with export (the keyword reserved for purposes like these since C++98), or with neither, it can restrict how much of its contents are available to clients. This replaces the namespace detail idiom which can conflict between headers (that use it in the same containing namespace).
Deduplication: because in many cases it is no longer necessary to provide a declaration in a header file and a definition in a separate source file, redundancy and the associated opportunity for divergence are reduced.
One Definition Rule violation avoidance: the ODR exists solely because of the need to define certain entities (types, inline functions/variables, and templates) in every translation unit that uses them. A module can define an entity just once and nonetheless provide that definition to clients. Also, existing headers that already violate the ODR via internal-linkage declarations stop being ill-formed, no diagnostic required, when they are converted into modules.
Non-local variable initialization order: because import establishes a dependency order among translation units that contain (unique) variable definitions, there is an obvious order in which to initialize non-local variables with static storage duration. C++17 supplied inline variables with a controllable initialization order; modules extend that to normal variables (and do not need inline variables at all).
Module-private declarations: entities declared in a module that neither are exported nor have internal linkage are usable (by name) by any translation unit in the module, providing a useful middle ground between the preexisting choices of static or not. While it remains to be seen what exactly implementations will do with these, they correspond closely to the notion of “hidden” (or “not exported”) symbols in a dynamic object, providing a potential language recognition of this practical dynamic linking optimization.
ABI stability: the rules for inline (whose ODR-compatibility purpose is not relevant in a module) have been adjusted to support (but not require!) an implementation strategy where non-inline functions can serve as an ABI boundary for shared library upgrades.
Compilation speed: because the contents of a module do not need to be reparsed as part of every translation unit that uses them, in many cases compilation proceeds much faster. It's worth noting that the critical path of compilation (which governs the latency of infinitely parallel builds) can actually be longer, because modules must be processed separately in dependency order, but the total CPU time is significantly reduced, and rebuilds of only some modules/clients are much faster.
Tooling: the “structural declarations” involving import and module have restrictions on their use to make them readily and efficiently detectable by tools that need to understand the dependency graph of a project. The restrictions also allow most if not all existing uses of those common words as identifiers.
Approach
Because a name declared in a module must be found in a client, a significant new kind of name lookup is required that works across translation units; getting correct rules for argument-dependent lookup and template instantiation was a significant part of what made this proposal take over a decade to standardize. The simple rule is that (aside from being incompatible with internal linkage for obvious reasons) export affects only name lookup; any entity available via (e.g.) decltype or a template parameter has exactly the same behavior regardless of whether it is exported.
Because a module must be able to provide types, inline functions, and templates to its clients in a way that allows their contents to be used, typically a compiler generates an artifact when processing a module (sometimes called a Compiled Module Interface) that contains the detailed information needed by the clients. The CMI is similar to a pre-compiled header, but does not have the restrictions that the same headers must be included, in the same order, in every relevant translation unit. It is also similar to the behavior of Fortran modules, although there is no analog to their feature of importing only particular names from a module.
Because the compiler must be able to find the CMI based on import foo; (and find source files based on import :partition;), it must know some mapping from “foo” to the (CMI) file name. Clang has established the term “module map” for this concept; in general, it remains to be seen just how to handle situations like implicit directory structures or module (or partition) names that don’t match source file names.
Non-features
Like other “binary header” technologies, modules should not be taken to be a distribution mechanism (as much as those of a secretive bent might want to avoid providing headers and all the definitions of any contained templates). Nor are they “header-only” in the traditional sense, although a compiler could regenerate the CMI for each project using a module.
While in many other languages (e.g., Python), modules are units not only of compilation but also of naming, C++ modules are not namespaces. C++ already has namespaces, and modules change nothing about their usage and behavior (partly for backward compatibility). It is to be expected, however, that module names will often align with namespace names, especially for libraries with well-known namespace names that would be confusing as the name of any other module. (A nested::name may be rendered as a module name nested.name, since . and not :: is allowed there; a . has no significance in C++20 except as a convention.)
Modules also do not obsolete the pImpl idiom or prevent the fragile base class problem. If a class is complete for a client, then changing that class still requires recompiling the client in general.
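For reference, a minimal single-file sketch of the pImpl idiom follows. `Widget` and its members are hypothetical names; in a real project the part marked as "source file" would live in a separate .cpp file or module implementation unit, so that changing `Impl` does not change the layout that clients see.

```cpp
#include <cassert>
#include <memory>

// --- what clients see (header or module interface) ---
class Widget {
public:
    Widget();
    ~Widget();                      // defined where Impl is complete
    int value() const;
private:
    struct Impl;                    // incomplete for clients
    std::unique_ptr<Impl> impl_;    // layout of Widget is just one pointer
};

// --- what stays private (source file or implementation unit) ---
struct Widget::Impl {
    int v = 42;                     // can change without touching Widget's layout
};

Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
int Widget::value() const { return impl_->v; }
```

The cost is an extra heap allocation and an indirection on every access, which is the trade-off the idiom has always had, with or without modules.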
Finally, modules do not provide a mechanism to provide the macros that are an important part of the interface of some libraries; it is possible to provide a wrapper header that looks like
// wants_macros.hpp
import wants.macros;
#define INTERFACE_MACRO(x) (wants::f(x),wants::g(x))
(You don't even need #include guards unless there might be other definitions of the same macro.)
Multi-file modules
A module has a single primary interface unit that contains export module A;: this is the translation unit processed by the compiler to produce the data needed by clients. It may recruit additional interface partitions that contain export module A:sub1;; these are separate translation units but are included in the one CMI for the module. It is also possible to have implementation partitions (module A:impl1;) that can be imported by the interface without providing their contents to clients of the overall module. (Some implementations may leak those contents to clients anyway for technical reasons, but this never affects name lookup.)
Finally, (non-partition) module implementation units (with simply module A;) provide nothing at all to clients, but can define entities declared in the module interface (which they implicitly import). All translation units of a module can use anything declared in another part of the same module that they import so long as it does not have internal linkage (in other words, they ignore export).
As a special case, a single-file module can contain a module :private; declaration that effectively packages an implementation unit with the interface; this is called a private module fragment. In particular, it can be used to define a class while leaving it incomplete in a client (which provides binary compatibility but will not prevent recompilation with typical build tools).
Upgrading
Converting a header-based library to a module is neither a trivial nor a monumental task. The required boilerplate is very minor (two lines in many cases), and it is possible to put export {} around relatively large sections of a file (although there are unfortunate limitations: no static_assert declarations or deduction guides may be enclosed). Generally, a namespace detail {} can either be converted to namespace {} or simply left unexported; in the latter case, its contents may often be moved to the containing namespace. Class members need to be explicitly marked inline if it is desired that even ABI-conservative implementations inline calls to them from other translation units.
Of course, not all libraries can be upgraded instantaneously; backward compatibility has always been one of C++’s emphases, and there are two separate mechanisms to allow module-based libraries to depend on header-based libraries (based on those supplied by initial experimental implementations). (In the other direction, a header can simply use import like anything else even if it is used by a module in either fashion.)
As in the Modules Technical Specification, a global module fragment may appear at the beginning of a module unit (introduced by a bare module;) that contains only preprocessor directives: in particular, #includes for the headers on which a module depends. It is possible in most cases to instantiate a template defined in a module that uses declarations from a header it includes because those declarations are incorporated into the CMI.
There is also the option to import a “modular” (or importable) header (import "foo.hpp";): what is imported is a synthesized header unit that acts like a module except that it exports everything it declares—even things with internal linkage (which may (still!) produce ODR violations if used outside the header) and macros. (It is an error to use a macro given different values by different imported header units; command-line macros (-D) aren't considered for that.) Informally, a header is modular if including it once, with no special macros defined, is sufficient to use it (rather than it being, say, a C implementation of templates with token pasting). If the implementation knows that a header is importable, it can replace an #include of it with an import automatically.
In C++20, the standard library is still presented as headers; all the C++ headers (but not the C headers or <cmeow> wrappers) are specified to be importable. C++23 will presumably additionally provide named modules (though perhaps not one per header).
Example
A very simple module might be
export module simple;
import <string_view>;
import <memory>;
using std::unique_ptr; // not exported
int *parse(std::string_view s) {/*…*/} // cannot collide with other modules
export namespace simple {
auto get_ints(const char *text)
{return unique_ptr<int[]>(parse(text));}
}
which could be used as
import simple;
int main() {
return simple::get_ints("1 1 2 3 5 8")[0]-1;
}
Conclusion
Modules are expected to improve C++ programming in a number of ways, but the improvements are incremental and (in practice) gradual. The committee has strongly rejected the idea of making modules a “new language” (e.g., that changes the rules for comparisons between signed and unsigned integers) because it would make it more difficult to convert existing code and would make it hazardous to move code between modular and non-modular files.
MSVC has had an implementation of modules (closely following the TS) for some time. Clang has had an implementation of importable headers for several years as well. GCC has a functional but incomplete implementation of the standardized version.
C++ modules are a proposal that will allow compilers to use "semantic imports" instead of the old text inclusion model. Instead of performing a copy and paste when a #include preprocessor directive is found, they will read a binary file that contains a serialization of the abstract syntax tree that represents the code.
These semantic imports avoid repeatedly recompiling the code contained in headers, which speeds up compilation. E.g. if your project contains 100 #includes of <iostream> in different .cpp files, the header will only be parsed once per language configuration, rather than once per translation unit that uses the module.
Microsoft's proposal goes beyond that and introduces the internal keyword. A member of a class with internal visibility will not be seen outside of a module, thus allowing class implementers to hide implementation details from a class.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4465.pdf
I wrote a small example using <iostream> in my blog, using LLVM's module cache:
https://cppisland.wordpress.com/2015/09/13/6/
Please take a look at this simple example, which I love. Modules are explained really well there. The author uses simple terms and great examples to examine every aspect of the problem stated in the article.
https://www.modernescpp.com/index.php/c-20-modules
Here is one of the first proposals:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1778.pdf
And a very good explanation:
http://clang.llvm.org/docs/Modules.html

Does the module standard for C++ solve the problem of hiding private data from callers?

In C++, modules are being standardized to solve the problem of #include bloat among other things. Compilers in C++ have to parse too much.
But also, because C++ stores data inline which is efficient, even the caller has to know about the memory layout of objects.
Does the forthcoming module standard address this issue?
Example:
class GLWin {
private:
GLFWwindow* win;
glm::mat4 projection;
...
};
An object containing a pointer to an internal implementation can be decoupled by an empty declaration, ie:
class GLFWwindow;
but if, for performance we include the mat4 object inside the window, then we need to know the size, which currently means including a definition, bringing in a header file which is often huge because of cascading includes. Is there any mechanism in modules that hides the detail and allows reserving the correct amount of space for the object while leaving it opaque like a pointer?
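The difference between the two members can be sketched as follows. `Window` and `Mat4` are hypothetical stand-ins for `GLFWwindow` and `glm::mat4`, so that the example compiles on its own without either library.

```cpp
#include <cassert>

struct Window;                   // hypothetical opaque type, never defined here
struct Mat4 { float m[16]; };    // hypothetical stand-in for glm::mat4

class GLWin {
public:
    const Mat4& projection() const { return projection_; }
private:
    Window* win_ = nullptr;      // pointer member: an incomplete type is enough
    Mat4 projection_{};          // by-value member: the full definition must be visible
};

// The by-value member contributes its whole size to every GLWin object,
// which is why the compiler needs the complete definition of Mat4.
static_assert(sizeof(GLWin) >= sizeof(Window*) + sizeof(Mat4),
              "GLWin stores the matrix inline");
```

Replacing the definition of `Mat4` with a bare forward declaration makes the class fail to compile, while `Window` can stay incomplete indefinitely.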
Modules do not make it possible to implement the system such that code external to the module has no idea what the private members of a type are. That wouldn't work with static reflection proposals, which allow querying and iteration over the private members of a type.
What modules do is make it so that:
When you get these kinds of recursive "inclusions", they don't actually expose those internals to the external code. In your example, let's say that glm::mat4 comes from a module called GLM. Your module that declares GLWin will have import GLM, since it needs those definitions to work. However, that is an implementation detail, so you won't be doing export import GLM.
Now, someone else comes along and imports your module. To perform that import, the compiler will have to read the GLM module. But because your module does not export GLM, the code which imports your module cannot use it. That is, they themselves don't get to use glm::mat4 or anything else, unless they themselves import that module.
This doesn't seem like much of a difference, since the GLM module is still required, but it is a significant one. Users don't get an interface from a module just because that module is being used by a module they're using.
These imports aren't nearly as painful. The result of compiling a module is supposed to be a file (typically called a BMI, "binary module interface") which is something a compiler can quickly read and convert into its internal data structures. Furthermore, if you compile multiple translation units in the same compiler process, then they can share loaded modules. After all, GLM doesn't change depending on where you import it from, so there's no reason to even reload the module; you just use what's already in memory.
Lastly, there's recompilation. If you were using headers, and you changed the GLM headers, then every file that includes them would need to be recompiled. This is still true of modules, but in a far less painful way.
Let's assume that your GLWin-creating module and the module that is consuming it both use std::vector at some point. Now, let's say you change GLM, so you have to recompile both modules. In a header world, this also means that both files have to recompile the <vector> header, even though it has not changed and doesn't depend on GLM at all. That's how text inclusion works.
In a modular world, they don't have to recompile the vector module. It doesn't depend on the GLM module in any way, so it can just use the already existing vector module. And this is true for any included modules that are not dependent on GLM. So while you still need a cascade of recompiles, the recompiles themselves should be significantly faster, due to not having to recompile everything each translation unit itself uses. A 5000 line file recompiles like a 5000 line file, not 5000 + however many lines it includes.
The module concept changes how we think about dependencies. Instead of headers, there are binary module interfaces (BMIs), which are generated by the compiler and contain all the information about object sizes, object layouts and dependencies. The module of your class has to depend on the modules providing GLFWwindow and glm::mat4, since you can't compile it otherwise. So in one sense you still have to expose your internal data to other classes, but the compiler doesn't have to crawl over all the includes; it only reads the imported BMIs that are needed to understand the class and function interfaces, and if the same BMI appears as a dependency multiple times, it is only loaded once.
This also means, that you will no longer separate definitions and declaration in separate files since it doesn't make any sense. You will end up with something which looks more like a Java .class file.

Fortran like Modules in C++ [duplicate]

I've been following up C++ standardization and came across C++ modules idea. I could not find a good article on it. What exactly is it about?
Motivation
The simplistic answer is that a C++ module is like a header that is also a translation unit. It is like a header in that you can use it (with import, which is a new contextual keyword) to gain access to declarations from a library. Because it is a translation unit (or several for a complicated module), it is compiled separately and only once. (Recall that #include literally copies the contents of a file into the translation unit that contains the directive.) This combination yields a number of advantages:
Isolation: because a module unit is a separate translation unit, it has its own set of macros and using declarations/directives that neither affect nor are affected by those in the importing translation unit or any other module. This prevents collisions between an identifier #defined in one header and used in another. While use of using still should be judicious, it is not intrinsically harmful to write even using namespace at namespace scope in a module interface.
Interface control: because a module unit can declare entities with internal linkage (with static or namespace {}), with export (the keyword reserved for purposes like these since C++98), or with neither, it can restrict how much of its contents are available to clients. This replaces the namespace detail idiom which can conflict between headers (that use it in the same containing namespace).
Deduplication: because in many cases it is no longer necessary to provide a declaration in a header file and a definition in a separate source file, redundancy and the associated opportunity for divergence are reduced.
One Definition Rule violation avoidance: the ODR exists solely because of the need to define certain entities (types, inline functions/variables, and templates) in every translation unit that uses them. A module can define an entity just once and nonetheless provide that definition to clients. Also, existing headers that already violate the ODR via internal-linkage declarations stop being ill-formed, no diagnostic required, when they are converted into modules.
Non-local variable initialization order: because import establishes a dependency order among translation units that contain (unique) variable definitions, there is an obvious order in which to initialize non-local variables with static storage duration. C++17 supplied inline variables with a controllable initialization order; modules extend that to normal variables (and do not need inline variables at all).
Module-private declarations: entities declared in a module that neither are exported nor have internal linkage are usable (by name) by any translation unit in the module, providing a useful middle ground between the preexisting choices of static or not. While it remains to be seen what exactly implementations will do with these, they correspond closely to the notion of “hidden” (or “not exported”) symbols in a dynamic object, providing a potential language recognition of this practical dynamic linking optimization.
ABI stability: the rules for inline (whose ODR-compatibility purpose is not relevant in a module) have been adjusted to support (but not require!) an implementation strategy where non-inline functions can serve as an ABI boundary for shared library upgrades.
Compilation speed: because the contents of a module do not need to be reparsed as part of every translation unit that uses them, in many cases compilation proceeds much faster. It's worth noting that the critical path of compilation (which governs the latency of infinitely parallel builds) can actually be longer, because modules must be processed separately in dependency order, but the total CPU time is significantly reduced, and rebuilds of only some modules/clients are much faster.
Tooling: the “structural declarations” involving import and module have restrictions on their use to make them readily and efficiently detectable by tools that need to understand the dependency graph of a project. The restrictions also allow most if not all existing uses of those common words as identifiers.
Approach
Because a name declared in a module must be found in a client, a significant new kind of name lookup is required that works across translation units; getting correct rules for argument-dependent lookup and template instantiation was a significant part of what made this proposal take over a decade to standardize. The simple rule is that (aside from being incompatible with internal linkage for obvious reasons) export affects only name lookup; any entity available via (e.g.) decltype or a template parameter has exactly the same behavior regardless of whether it is exported.
Because a module must be able to provide types, inline functions, and templates to its clients in a way that allows their contents to be used, typically a compiler generates an artifact when processing a module (sometimes called a Compiled Module Interface) that contains the detailed information needed by the clients. The CMI is similar to a pre-compiled header, but does not have the restrictions that the same headers must be included, in the same order, in every relevant translation unit. It is also similar to the behavior of Fortran modules, although there is no analog to their feature of importing only particular names from a module.
Because the compiler must be able to find the CMI based on import foo; (and find source files based on import :partition;), it must know some mapping from “foo” to the (CMI) file name. Clang has established the term “module map” for this concept; in general, it remains to be seen just how to handle situations like implicit directory structures or module (or partition) names that don’t match source file names.
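As a rough illustration of such a mapping, a build tool could hand the compiler a table from module (or partition) names to CMI paths. Everything specific here is an assumption made for the sketch: the file format (one "name path" pair per line, with # starting a comment line), the .cmi extension, and the names parse_module_map and find_cmi. Actual compilers each have their own conventions and options.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Module or partition name -> path of its compiled module interface.
using ModuleMap = std::map<std::string, std::string>;

// Parses a hypothetical mapping file: "name path" per line, '#' starts a comment line.
ModuleMap parse_module_map(const std::string& text) {
    ModuleMap map;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name, path;
        // Lines without two fields (blank lines, for instance) are skipped.
        if (fields >> name >> path && name[0] != '#') map[name] = path;
    }
    return map;
}

// What the compiler would do on "import foo;": look up the CMI for "foo".
std::optional<std::string> find_cmi(const ModuleMap& map, const std::string& name) {
    auto it = map.find(name);
    if (it == map.end()) return std::nullopt;  // unknown module: the import fails
    return it->second;
}
```

An explicit table like this sidesteps the question of whether module names match file names, at the cost of requiring the build system to generate it.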
Non-features
Like other “binary header” technologies, modules should not be taken to be a distribution mechanism (as much as those of a secretive bent might want to avoid providing headers and all the definitions of any contained templates). Nor are they “header-only” in the traditional sense, although a compiler could regenerate the CMI for each project using a module.
While in many other languages (e.g., Python), modules are units not only of compilation but also of naming, C++ modules are not namespaces. C++ already has namespaces, and modules change nothing about their usage and behavior (partly for backward compatibility). It is to be expected, however, that module names will often align with namespace names, especially for libraries with well-known namespace names that would be confusing as the name of any other module. (A nested::name may be rendered as a module name nested.name, since . and not :: is allowed there; a . has no significance in C++20 except as a convention.)
Modules also do not obsolete the pImpl idiom or prevent the fragile base class problem. If a class is complete for a client, then changing that class still requires recompiling the client in general.
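For reference, the idiom looks the same with or without modules. In this sketch (Widget and its members are made-up names), clients see only the first part, so Impl can change without changing the layout of Widget; the second part would live in an ordinary source file or in a module implementation unit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Part 1: what clients see (a header, or an exported class in a module interface).
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                          // defined in part 2, where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    const std::string& name() const;
    int use();                          // returns how many times the widget has been used

private:
    struct Impl;                        // incomplete here: clients do not depend on its layout
    std::unique_ptr<Impl> impl_;
};

// Part 2: the implementation, which clients never need to see.
struct Widget::Impl {
    std::string name;
    int use_count = 0;
};

Widget::Widget(std::string name) : impl_(std::make_unique<Impl>()) {
    impl_->name = std::move(name);
}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

const std::string& Widget::name() const { return impl_->name; }
int Widget::use() { return ++impl_->use_count; }
```

Adding a member to Impl changes only part 2. Adding a data member directly to Widget would change part 1, and clients would have to be recompiled whether Widget comes from a header or from a module.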
Finally, modules offer no mechanism for supplying the macros that are an important part of the interface of some libraries; it is, however, possible to provide a wrapper header that looks like
// wants_macros.hpp
import wants.macros;
#define INTERFACE_MACRO(x) (wants::f(x),wants::g(x))
(You don't even need #include guards unless there might be other definitions of the same macro.)
Multi-file modules
A module has a single primary interface unit that contains export module A;: this is the translation unit processed by the compiler to produce the data needed by clients. It may recruit additional interface partitions that contain export module A:sub1;; these are separate translation units but are included in the one CMI for the module. It is also possible to have implementation partitions (module A:impl1;) that can be imported by the interface without providing their contents to clients of the overall module. (Some implementations may leak those contents to clients anyway for technical reasons, but this never affects name lookup.)
Finally, (non-partition) module implementation units (with simply module A;) provide nothing at all to clients, but can define entities declared in the module interface (which they implicitly import). All translation units of a module can use anything declared in another part of the same module that they import so long as it does not have internal linkage (in other words, they ignore export).
As a special case, a single-file module can contain a module :private; declaration that effectively packages an implementation unit with the interface; this is called a private module fragment. In particular, it can be used to define a class while leaving it incomplete in a client (which provides binary compatibility but will not prevent recompilation with typical build tools).
Upgrading
Converting a header-based library to a module is neither a trivial nor a monumental task. The required boilerplate is very minor (two lines in many cases), and it is possible to put export {} around relatively large sections of a file (although there are unfortunate limitations: no static_assert declarations or deduction guides may be enclosed). Generally, a namespace detail {} can either be converted to namespace {} or simply left unexported; in the latter case, its contents may often be moved to the containing namespace. Class members need to be explicitly marked inline if it is desired that even ABI-conservative implementations inline calls to them from other translation units.
Of course, not all libraries can be upgraded instantaneously; backward compatibility has always been one of C++’s emphases, and there are two separate mechanisms to allow module-based libraries to depend on header-based libraries (based on those supplied by initial experimental implementations). (In the other direction, a header can simply use import like anything else even if it is used by a module in either fashion.)
As in the Modules Technical Specification, a global module fragment may appear at the beginning of a module unit (introduced by a bare module;) that contains only preprocessor directives: in particular, #includes for the headers on which a module depends. It is possible in most cases to instantiate a template defined in a module that uses declarations from a header it includes because those declarations are incorporated into the CMI.
There is also the option to import a “modular” (or importable) header (import "foo.hpp";): what is imported is a synthesized header unit that acts like a module except that it exports everything it declares—even things with internal linkage (which may (still!) produce ODR violations if used outside the header) and macros. (It is an error to use a macro given different values by different imported header units; command-line macros (-D) aren't considered for that.) Informally, a header is modular if including it once, with no special macros defined, is sufficient to use it (rather than it being, say, a C implementation of templates with token pasting). If the implementation knows that a header is importable, it can replace an #include of it with an import automatically.
In C++20, the standard library is still presented as headers; all the C++ headers (but not the C headers or <cmeow> wrappers) are specified to be importable. C++23 will presumably additionally provide named modules (though perhaps not one per header).
Example
A very simple module might be
export module simple;
import <string_view>;
import <memory>;
using std::unique_ptr; // not exported
int *parse(std::string_view s) {/*…*/} // cannot collide with other modules
export namespace simple {
  auto get_ints(const char *text)
  {return unique_ptr<int[]>(parse(text));}
}
which could be used as
import simple;
int main() {
  return simple::get_ints("1 1 2 3 5 8")[0]-1;
}
Conclusion
Modules are expected to improve C++ programming in a number of ways, but the improvements are incremental and (in practice) gradual. The committee has strongly rejected the idea of making modules a “new language” (e.g., that changes the rules for comparisons between signed and unsigned integers) because it would make it more difficult to convert existing code and would make it hazardous to move code between modular and non-modular files.
MSVC has had an implementation of modules (closely following the TS) for some time. Clang has had an implementation of importable headers for several years as well. GCC has a functional but incomplete implementation of the standardized version.
C++ modules are a proposal that will allow compilers to use "semantic imports" instead of the old text inclusion model. Instead of performing a copy and paste when a #include preprocessor directive is found, they will read a binary file that contains a serialization of the abstract syntax tree that represents the code.
These semantic imports avoid the repeated recompilation of the code that is contained in headers, speeding up compilation. E.g. if your project contains 100 #includes of <iostream> in different .cpp files, the header will only be parsed once per language configuration, rather than once per translation unit that uses the module.
Microsoft's proposal goes beyond that and introduces the internal keyword. A member of a class with internal visibility will not be seen outside of a module, thus allowing class implementers to hide implementation details from a class.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4465.pdf
I wrote a small example using <iostream> in my blog, using LLVM's module cache:
https://cppisland.wordpress.com/2015/09/13/6/
Please take a look at this simple example, which I love. Modules are explained really well there. The author uses simple terms and great examples to examine every aspect of the problem stated in the article.
https://www.modernescpp.com/index.php/c-20-modules
Here is one of the first proposals:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1778.pdf
And a very good explanation:
http://clang.llvm.org/docs/Modules.html
