What are submodules and how are they used? - fortran

I don't quite understand the purpose of submodules. I know there's very little support for them in most compilers but the concept is interesting. I think I understand the basic concept but all the examples I've seen (Fortran Wiki, Modern Fortran Explained, the technical report) are simplistic, use exactly the same example (point type all within the same file) and don't show their actual use when calling the function. In what situations would you want to use submodules? When you want to use a submodule do you include a use statement? I'd really like it if someone could provide an example.

The simple answer is that you put code into submodules when you want to have less code in the parent module or submodule.
You might want to have less source code in the parent module or submodule because the source code for the parent is getting too long.
Before submodules, the only way this could be done was to move source code out of the module into a different module or external procedure. But this wasn't always possible if the source code to be moved referenced PRIVATE things or components that were to remain in the original module. Submodules can access the things declared in their parent module or submodule by host association - the source code can still access PRIVATE things in the same way that it could if it was still part of the physical source code of the parent.
You might also want to split source code out of the parent module in order to avoid compilation cascades, if relevant to your processor. Changes to a module typically require recompilation of that module (and its descendants), and then recompilation of all program units that use that module (and their descendants), repeatedly cascading to further recompilation where any recompiled program units are themselves modules. Changes to a submodule typically only require recompilation of that submodule and any of its descendant submodules.
The hierarchical nature of submodules may also suit a hierarchical code arrangement - where you do not want siblings at the same level of the hierarchy to be able to directly access the entities and procedures that they define.
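A rough sketch of such a hierarchy, assuming a Fortran 2008 compiler with submodule support. The names shapes, circles, circles_impl and squares are invented for illustration, and everything is in one listing only to keep it short:

```fortran
! Parent module: only the interfaces of the procedures are visible to users.
module shapes
  implicit none
  private
  public :: circle_area, square_area

  interface
    module function circle_area(r) result(a)
      real, intent(in) :: r
      real :: a
    end function circle_area
    module function square_area(s) result(a)
      real, intent(in) :: s
      real :: a
    end function square_area
  end interface
end module shapes

! Child submodule: what it declares is visible to its own descendants only.
submodule (shapes) circles
  implicit none
  real, parameter :: pi = 3.1415927
end submodule circles

! Grandchild: the parent is written as ancestor_module:parent_submodule.
submodule (shapes:circles) circles_impl
  implicit none
contains
  module function circle_area(r) result(a)
    real, intent(in) :: r
    real :: a
    a = pi * r**2   ! pi comes from submodule circles by host association
  end function circle_area
end submodule circles_impl

! Sibling of circles: pi is not accessible here.
submodule (shapes) squares
  implicit none
contains
  module function square_area(s) result(a)
    real, intent(in) :: s
    real :: a
    a = s**2
  end function square_area
end submodule squares

program demo_hierarchy
  use shapes, only: circle_area, square_area
  implicit none
  print *, circle_area(1.0), square_area(2.0)
end program demo_hierarchy
```

Here squares and circles are siblings, so neither can reach into the other, while circles_impl inherits everything from both circles and shapes.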
The USE statement is only used where you want to access things in a scope that are provided by a module. You cannot "use" a submodule in a USE statement (though a procedure with its INTERFACE defined in a module may have its body defined in a submodule).
A submodule of a module does not USE the parent module (it cannot - that is a bit like a module trying to USE itself) and it doesn't need to - it already has access to the things in the module by host association. The submodule statement that starts a submodule program unit identifies the module (and perhaps another submodule) that it extends. There is nothing in the source of a module proper that tells it how many submodules there may be extending it.
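To make that concrete, here is a minimal sketch, assuming a Fortran 2008 compiler with submodule support. The names geometry, geometry_impl, point, distance and unit_scale are made up for illustration. All three program units are shown in one listing for brevity; in practice the submodule would normally live in its own file:

```fortran
! Parent module: declares the type and only the *interface* of the procedure.
module geometry
  implicit none
  private
  public :: point, distance

  type :: point
    real :: x = 0.0
    real :: y = 0.0
  end type point

  ! Private to the module: code that USEs geometry cannot see it,
  ! but submodules can, via host association.
  real, parameter :: unit_scale = 1.0

  interface
    ! The "module" prefix marks a separate module procedure whose body
    ! is supplied by a submodule.
    module function distance(a, b) result(d)
      type(point), intent(in) :: a, b
      real :: d
    end function distance
  end interface
end module geometry

! The submodule names its parent in parentheses. It does not USE the parent.
submodule (geometry) geometry_impl
  implicit none
contains
  module function distance(a, b) result(d)
    type(point), intent(in) :: a, b
    real :: d
    ! unit_scale is PRIVATE in the parent, yet accessible here.
    d = unit_scale * sqrt((a%x - b%x)**2 + (a%y - b%y)**2)
  end function distance
end submodule geometry_impl

program demo
  use geometry, only: point, distance   ! USE the module, never the submodule
  implicit none
  type(point) :: p, q
  p = point(0.0, 0.0)
  q = point(3.0, 4.0)
  print *, distance(p, q)
end program demo
```

The calling code looks exactly as it would if distance were an ordinary module procedure: it has a USE statement for geometry and nothing else. If only the body of distance changes, the interface in the module is untouched, which is what lets a build system skip recompiling the module's users.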

Related

C++20 Modules: Submodule vs Module Partition difference

I'm trying to learn the concept of modules that come with the C++20 feature. The concept of SubModule and ModulePartition confuses me a lot. They basically both do the same job, but I couldn't decide which one to use when and under what conditions.
Can you explain the difference to me exactly?
https://www.modernescpp.com/index.php/c-20-divide-modules
The difference between a "submodule" and a module partition is simple:
Submodules do not exist as a part of the module system. Module partitions are part of the module system.
A module partition is a component of a module which can be imported by other files that are themselves components of a module. This allows them to be used for private declarations that are used by multiple module implementation units, or for splitting a large module's interface into different files so that your single primary module interface unit isn't gigantic.
Partitions are effectively a way of namespace scoping a module. The partition X:Y can only ever be accessed by module units that are part of module X. The primary module interface of X may indeed export import a partition, but the outside world has no idea that those declarations are in the purview of a partition of X.
Within a module unit for the module X, you can import a partition Y of that module by naming it with :Y. That works because the only partitions that a module unit for X has access to are the partitions of X, so there's no need to repeat the module name.
A "submodule" is not a part of the module system; it is a way of thinking about and treating a particular module. By convention, people use the name X.Y to say that Y is a conceptual "submodule" of X. But this is purely a convention; it has no syntactic legitimacy. That is, the language has no idea that the module X.Y has any inherent relationship to X (X could export import X.Y;, but there's no requirement to do so).
A module that is treated as a submodule is still, as far as the language is concerned, a full-fledged module with all of the powers and restrictions thereof. Its name has no special meaning, and you cannot make such a module "private" in any meaningful capacity.
Submodules (aka: just another module) do not "do the same job" as a proper partition at all. Submodules are exactly as visible to the outside world as the module that they are conceptually a "submodule" of.
The primary point of module partitions is that they are implementation details of a module. They're not something that leaks out into the module's interface. Submodules do. And maybe that's what you want.
But remember this: unlike including a header, the cost of importing a module is not based on how much stuff is in that module. So there isn't much point to breaking your library's interface into a bunch of tiny submodules (which is probably why the C++ committee is going to toss the entire standard library into a single std module instead of dozens of smaller modules). It's useful to have the interface defined in multiple files, but the outside world doesn't have to see them. How you choose to organize your code should be about your convenience, not how the user interacts with it.
Hence module partitions.
If what you want to do is provide multiple components of your library as different importable units, submodules are the tool for doing that. But if you're just organizing your code within a module and external code shouldn't need to know about that organization, that's what partitions are for.

What is the point of two types of module files (interface and implementation) in C++20?

When I want to export something I write export void foo(); I can implement it in the same module file or do it in a separate one. But what is the point of formally distinguishing these files (export module mymodule vs module mymodule) when, anyway, I can have any number of the latter type? Wouldn't it be enough to just put the export keyword before the thing I want to make public and not need to bother with special interface files?
Module implementation units can use things with module linkage (neither export nor static) in their module’s interface. If all module units were both interface and implementation units, some mechanism would be needed to deal with the circularity of this access.
The way this is implemented is that module-linkage entities are included in the compiled module interface file: it is helpful to the implementation to know immediately whether that sort of processing is needed for a translation unit. (This is similar to needing to know to what module a file belongs in order to mangle its symbol names properly.)
Moreover, requiring that all parts of the module interface be declared upfront (in the primary interface or in an interface partition that it recruits) avoids needing to “link” the interface results of multiple independent module units into one CMI to be consumed by importers. The fact that module implementation units can’t affect importers except via linking in different symbol definitions is also beneficial to the build system: importers do not need to be recompiled when module implementation units but not interface units change.
At some point, the build system sees that some file says import MyModule;. When it sees that, the build system needs to go find the module for MyModule.
If MyModule has not yet been built, the build system needs to build it. To do that, it has to (among other things) scan all of the known source files in your project to see which ones are used to build MyModule. But the most important thing is that it needs to figure out which file it specifically needs to build in order for import MyModule to work right now.
That process works best and fastest if the system only needs to look for a single file to build (this way, the system can pre-process everything with a quick scanner to find all of those files). So the module system provides that: for any particular module, there is the primary module interface which defines everything that is exported by a module. Building that module may provoke the compilation of other modules, but we know which file has to be finished building before import MyModule can work.
Now sticking everything a module can export into just one file is not the best idea. So in many cases, you'll have multiple files that export stuff, and you'll export import them in your main MyModule primary interface file. But since module names are global, we don't want dozens of tiny module names cluttering up the namespace.
Enter module partitions: These are module interface files whose names are namespaced within a specific module. Module interface files can import other partitions, but only those within the same module. And obviously, the graph of partition imports must be acyclic.
But that leaves us with a small problem. Let's say you have a partition that defines a class that gets exported to the primary module interface. But you don't want to put the implementation of those member functions in that partition file. So... where does it go?
I mean, you could put it in another partition that doesn't get imported by anybody. But if that partition doesn't get imported... why bother giving it a partition name? It'd be best if you could communicate immediately that this "partition" cannot be imported.
Enter module implementation units. They are part of a specific module, and therefore they can import partitions of that module. But they cannot themselves be imported by anyone.
That's what they're for.
But note that the build system knows that it doesn't need to build module implementation units in order to completely build the module's interface. It only needs to build the primary module interface file and any partitions imported by it (directly or indirectly). This allows module rebuilding to be as fast as possible if you put your implementations into implementation units.
Lastly, module implementation units (and interface units) can use names declared in a partition they import even when the partition does not export them. These module-local names are only accessible within the module.

What are the trade-offs between "separating module interface/implementation unit in a different source file" and using "private module fragment"

On the surface, using private module fragment to separate interface and implementation looks superior over separating module interface and implementation unit in a different source file since you only have to manage single source file when using private module fragment.
But is using private module fragment simply better than separating module interface/implementation unit in a different source file as it looks? What are the trade-offs?
The tradeoff is pretty obvious based on the limitations. A private module fragment (PMF) can only appear in a primary module interface unit, and there can be no other module units that contribute to a module that has a PMF. This means that the primary downside of using a PMF is that you're restricted to putting everything for a module in a single file.
The performance of module importing, broadly speaking, is not based on how much stuff is in a module. As such, putting a lot of stuff into a module is a pretty good idea. But putting all of those things in a single file can be long-winded and difficult to maintain. As libraries get bigger, splitting them into multiple files is usually better for organization.
The PMF construct is mainly for making it easy to distribute a library that provides module and non-module builds. The main code lives in (mostly) regular headers and source files, which are used to build the non-module version. For the module version, you have a single module unit. In the global fragment, you #include any headers that your interface headers use that are not part of your library (standard library headers, dependent libraries, etc). In that module unit's purview, you #include all of the interface headers within a big export{} block. The private module fragment can #include all of the .cpp files used to compile the library, so that the compiler can just build that one module file and get everything.

Tool/Way to identify dependencies between a C-File and an included Header-File

I am working on a project (with a lot of legacy code), which I have to analyze. I divided the source-files into modules and now I want to identify the dependencies. For some modules I want to know where exactly the interface between itself and another module is. So for example if I have
Module_A.c which includes Module_B.h I would like to know which variables or functions Module_A is using (from Module_B).
So is there a way (or a tool), which is capable of telling me which functions from Module_B are called in Module_A (and/or which variables are referenced).
Note: I do not want some kind of "over-all" callgraph or a list with all references - I want only explicitly the references between two specific modules!
You can use the tool "Source Insight" for this task. It can provide a call map, references and much more.

Linux C++ Project Source File Directory Structure

I'm working on a fairly large C++ project on Linux. We are trying to come up with criteria for organizing our source file directory structure.
One thought we have is to have the directory structure reflect our architecture choices. For instance, we would have one root level for our domain classes and another for our boundary classes, and one for our domain-agnostic infrastructure classes.
So in a banking application, we might have a directory called src/domain/accounts, src/domain/customerTransactions, src/boundary/customerInputViews, etc. We might then have another directory called src/infra/collections, src/infra/threading, etc.
Also, within that structure, we'd isolate interface classes from implementation classes. We'd do that so clients of interfaces would not be dependent on the directory structure of the implementation classes.
Any thoughts?
Breaking code into independent parts sounds like a good idea. That would allow you to potentially break stuff into separate units (for autotools: you could have convenience libs for organization, and later even separate them completely into shared libs).
Of course the submodules should contain everything needed to build: headers, sources and build infrastructure (maybe only missing a top-level build definition file which gets included). This will make sure that work can be done on small units (but test the whole thing).