Header only library as a module? - c++

I'm authoring a templated header only library. It has no state, no global variables, no .cpp that needs to be compiled.
Is it possible to export/consume this as a module? How? What are the benefits? What are the pitfalls?
There are some convenience macros that I probably want the user to have. What about those?
I have found an example using #ifdef ... to cater for both module and old-school cases. I think I want to avoid that.

I have found an example using #ifdef ... to cater for both module and old-school cases. I think I want to avoid that.
Broadly speaking, that is not the best way to do this. You can do it by just importing the header as a header unit with import <header_file.hpp>; but you'll be losing some important aspects of modules. You're going to need to do a limited amount of #defineing if you want your module version to work well.
Just about every header-only library will have some declarations that are considered part of the library, but it will also have some which are not. These are usually put into a detail namespace or other attempts are made to hide them from users.
Modules have a mechanism for doing that: export. Or more to the point, they have a way to not put something in your interface: don't export it. But this requires explicitly tagging your interfaces with the export keyword... which won't work in non-module builds. So you need a macro to say if the build is a module build or not, so that you can have a #define that resolves to export or to nothing.
You could do an export namespace my_namespace {}; to try to export everything, but that can have... unpleasant side effects.
You also may need to explicitly inline certain class members of exported classes, as modules do not implicitly inline defined members of non-template classes the way a non-module build does. Fortunately, adding inline will work fine on non-module builds.
Writing a header-only library and a module interface file for it, such that a user can include whichever they prefer, isn't too difficult. But it does require some care, and to best take advantage of module features, you should use some macros.
The primary module interface unit would look like this:
module; //Begin global module fragment.
<external header includes>
export module MyModuleName; //Begin the actual module purview
#define MY_MODULE_NAME_EXPORT export;
#include "my_header1.hpp"
#include "my_header2.hpp"
The <external header includes> is very important. You need to #include every file that your library explicitly includes. If you include parts of the C++ standard library, then include them here.
The point of this is to stop those headers from shoving the contents of those headers into the module's purview. You want your inclusion of your library to be the only code that gets shoved into the module. So you include those headers in the global module fragment, and use their include-guards to prevent later inclusion.
And yes, this does mean you have to keep two separate lists of these headers. You can write a tool to look through all of your headers and build that list for you, but one way or another it's still a thing you need to have.
At the top of your library headers, after any include guards, you need to have this:
#ifndef MY_MODULE_NAME_EXPORT
#define MY_MODULE_NAME_EXPORT
#endif
This allows you to decorate anything you want to export with MY_MODULE_NAME_EXPORT:
MY_MODULE_NAME_EXPORT void some_function()
{
...
}
If you're in a module build, that will export the function. If you aren't, then it won't.

Related

When using C++ modules, is the any reason to separate function declarations (.hpp files) from their definitions (.cpp files)?

I'm used to writing code without modules where the header files contain the function declarations like:
// foo.h
class Foo
{
void bar();
};
and the corresponding .cpp file contains the definition:
// foo.cpp
#include "foo.h"
void Foo::bar()
{
// ...
}
To my knowledge, this is done to decrease compile time and reduce dependencies. When modules will be used, will this still apply? Would it be just as fast to have the class in a single file with the definitions the way Java and C# does it? If this is the case, will there be any need for both .hpp and .cpp files when using modules?
The only reason I'm aware of, as the modules proposal currently stands, is to handle circular interface dependencies.
If a program is made up of modules and it does not separate function declarations from definitions, all module files will be module interfaces (as opposed to module implementations). If you want to compare them to header and code files, module interfaces could be seen as the header (.hpp) file, and module implementations could be seen as the code (.cpp) files.
Unfortunately, the modules proposal does not allow cyclic module interface dependencies. And since your program is now completely made up of module interfaces, you will therefore never be able to have two modules which depend on each other in any way (this may be improved by the proclaimed ownership declaration in the future, but this is currently not supported). The only way to resolve circular module interface dependencies is by separating declarations and definitions and placing the circular imports in the module implementation files, as opposed to circular module interface dependencies, circular module implementation dependencies are allowed.
The following code provides an example of a situation that is impossible to compile without separating declarations and definitions:
Foo module file
export module Foo;
import module Bar;
export namespace Test {
class Foo {
public:
Bar createBar() {
return Bar();
}
};
}
Bar module file
export module Bar;
import module Foo;
export namespace Test {
class Bar {
public:
Foo createFoo() {
return Foo();
}
};
}
This article shows an example on how this could be solved were the proclaimed ownership declaration available. In essence, it comes down to separating declarations and definitions.
In a perfect world the compiler would be able to handle this scenario, but alas, as far as I'm aware the current proposed implementation of modules does not support it.
There are still lots of reasons to use header files.
The ease of sharing and understanding an object api without seeing the underlying details is of enough use to keep them around. It's a good 20 ft view of an object, essentially being an outline.
If you are selling a library, you would include a header file, as well as an archive file or a shared library. This way you can keep information proprietary without compromising the IP of your product, and your customers can include a binary compiled for their target.
I don't believe this is possible without header files.
There is a nice discussion here that explains the idea of modules.
In short you are right, the separation between header files and implementation files will no longer be needed. The #include directive will be replaced by the import directive, and at compile time the module will provide the required information that would otherwise be in the included header file.
Another use of this idiom is inherited from C; it is a convenient means of forward-declaration even for translation units that have no dependencies on other translation units.
The use of precompiled headers didn't really become an important thing until C++'s expanded use of header files made it necessary for performance reasons (notwithstanding some very large old school headers like windows.h).
The idea does seem to be to bring something more like the C#/Java mechanism into play. The C++ mechanism is very Modula/ADA in spirit. It would be nice to have the machines do more of the work for us.

Why is it not advised to define macros in header files?

The Google C++ Style Guide guide advises that macros must not be defined in a .h (header) file. What are the cons of doing it?
The preprocessor concatenates all included source files together in order. If you don't undefine a macro, it can apply to any source following where it was first defined.
Since headers are often the public API of a library, any macros you define in your headers could end up in someone else's code, doing unexpected things.
Since unexpected things are the antithesis of good software, you should either:
Not use macros (idiomatic C++ really shouldn't)
Define them in a private scope (always prefer private) or
Undefine them immediately after use (although this makes them largely useless for your own code)
The best solution depends on your use case. Include guards and other simple, safe defines are typically excluded ( function-like macros are more likely to cause problems, but you can still do something dumb like define TRUE FALSE).
You may also look into conditionally defining macros so they are present in your code but don't become part of the public API. Checking for a variable set during your build or keeping macros in a separate header allows others to optionally, explicitly, and knowingly include them, which can be convenient if the macros help avoid a lot of boilerplate.
For the same reasons that using statements should not be in header files: namespace pollution. If you want to use macros in a header file, make sure that you undefine them at the end of the header, this way they will not be included erroneously. If you simply want to define them in a header and use them in cpp files make sure that the "macros.h" is never included in any header.
The who point of this is that a end user of what ever public API you are developing may not want or expect, for example, sum(a, b) to expand to (a) + (b). Finding the source of one's own macro error can be a nightmare, finding someone else can be almost impossible.

Different C++ Class Declarations

I'm trying to use a third party C++ library that isn't using namespaces and is causing symbol conflicts. The conflicting symbols are for classes my code isn't utilizing, so I was considering creating custom header files for the third party library where the class declarations only include the public members my code is using, leaving out any members that use the conflicting classes. Basically creating an interface.
I have three questions:
If the compilation to .obj files works, will this technique still cause symbol conflicts when I get to linking?
If that isn't a problem, will the varying class declarations cause problems when linking? For example, does the linker verify that the declaration of a class used by each .obj file has the same number of members?
If neither of those are a problem and I'm able to link the .obj files, will it cause problems when invoking methods? I don't know exactly how C++ works under the hood, but if it uses indexes to point to class methods, and those indexes were different from one .obj file to another, I'm guessing this approach would blow up at runtime.
In theory, you need identical declarations for this to work.
In practice, you will definitely need to make sure your declarations contain:
All the methods you use
All the virtual methods, used or not.
All the data members
You need all these in the right order of declaration too.
You might get away with faking the data members, but would need to make sure you put in stubs that had the same size.
If you do not do all this, you will not get the same object layout and even if a link works it will fail badly and quickly at run-time.
If you do this, it still seems risky to me and as a worst case may appear to work but have odd run time failures.
"if it uses indexes ": To some extent exactly how virtual functions work is implementation defined, but typically it does use an index into a virtual function table.
What you might be able to do is to:
Take the original headers
Keep the full declarations for the classes you use
Stub out the classes and declarations you do not use but are referenced by the ones you do.
Remove all the types not referenced at all.
For explanatory purposes a simplified explaination follows.
c++ allows you to use functions you declare. what you do is putting multiple definitions to a single declaration across multiple translation units. if you expose the class declaration in a header file your compiler sees this in each translation unit, that includes the header file.
Therefore your own class functions have to be defined exactly as they have been declared (same function names same arguments).
if the function is not called you are allowed not to define it, because the compiler doesn't know whether it might be defined in another translation unit.
Compilation causes label creation for each defined function(symbol) in the object code. On the other hand a unresolved label is created for each symbol that is referenced to (a call site, a variable use).
So if you follow this rules you should get to the point where your code compiles but fails to link. The linker is the tool that maps defined symbols from each translation-unit to symbol references.
If the object files that are linked together have multiple definitions to the same functions the linker is unable to create an exact match and therefore fails to link.
In practice you most likely want to provide a library and enjoy using your own classes without bothering what your user might define. In spite of the programmer taking extra care to put things into a namespace two users might still choose the same name for a namespace. This will lead to link failures, because the compiler exposed the symbols and is supposed to link them.
gcc has added an attribute to explicitly mark symbols, that should not be exposed to the linker. (called attribute hidden (see this SO question))
This makes it possible to have multiple definitions of a class with the same name.
In order for this to work across compilation units, you have to make sure class declarations are not exposed in an interface header as it could cause multiple unmatching declarations.
I recommend using a wrapper to encapsulate the third party library.
Wrapper.h
#ifndef WRAPPER_H_
#define WRAPPER_H_
#include <memory>
class third_party;
class Wrapper
{
public:
void wrappedFunction();
Wrapper();
private:
// A better choice would be a unique_ptr but g++ and clang++ failed to
// compile due to "incomplete type" which is the whole point
std::shared_ptr<third_party> wrapped;
};
#endif
Wrapper.cpp
#include "Wrapper.h"
#include <third_party.h>
void Wrapper::wrappedFunction()
{
wrapped->command();
}
Wrapper::Wrapper():wrapped{std::make_shared<third_party>()}
{
}
The reason why a unique_ptr doesn't work is explained here: std::unique_ptr with an incomplete type won't compile
You can move the entire library into a namespace by using a clever trick to do with imports. All the import directive does is copy the relevant code into the current "translation unit" (a fancy name for the current code). You can take advantage of this as so
I've borrowed heavily from another answer by user JohnB which was later deleted by him.
// my_thirdparty.h
namespace ThirdParty {
#include "thirdparty.h"
//... Include all the headers here that you need to use for thirdparty.
}
// my_thirdparty.cpp / .cc
namespace ThirdParty {
#include "thirdparty.cpp"
//... Put all .cpp files in here that are currently in your project
}
Finally, remove all the .cpp files in the third party library from your project. Only compile my_thirdparty.cpp.
Warning: If you include many library files from the single my_thirdparty.cpp this might introduce compiler issues due to interaction between the individual .cpp files. Things such as include namespace or bad define / include directives can cause this. Either resolve or create multiple my_thirdparty.cpp files, splitting the library between them.

Custom headers higher than standard?

Is it reasonable to put custom headers higher in include section than standard headers?
For example include section in someclass.hpp:
#include "someclass.h"
#include "global.h"
#include <iostream>
#include <string>
Is it best practice? What is the profit if it is?
The reason is that if you forget to include a dependent header in someclass.h, then whatever implementation file includes it as the first header, will get a warning/error of undefined or undeclared type, and whatnot. If you include other headers first, then you could be masking that fact - supposing the included headers define the required types, functions, etc. Example:
my_type.h:
// Supressed include guards, etc
typedef float my_type;
someclass.h:
// Supressed include guards, etc
class SomeClass {
public:
my_type value;
};
someclass.cpp:
#include "my_type.h" // Contains definition for my_type.
#include "someclass.h" // Will compile because my_type is defined.
...
This will compile fine. But imagine you want to use use SomeClass in your program. If you don't include my_type.h before including someclass.h, you'll get a compiler error saying my_type is undefined. Example:
#include "someclass.h"
int main() {
SomeClass obj;
obj.value = 1.0;
}
It is fairly common practice to #include "widget.h" as the first thing in widget.cpp. What this does is ensure that widget.h is self-contained, i.e. does not inadvertently depend on other header files.
Beyond that, I think it's essentially a matter of personal preference.
There are two important observations to be made before delving in the specifics:
When you develop a new header/source pair, it is important to check that the header is self-contained. To do so, the easiest way is to include first in a file.
It is best not to include extraneous things before including a header you do not own, as this could create strange issues in case of conflict of macros or overload of functions.
Therefore, the answer depend if you have unit test or not.
A general rule of thumb is to include headers starting with the Standard Library, then 3rd party headers (including Open Source projects), then your own middleware, utilities, etc... and finally the headers local to this library. It more or less follows the order of dependencies to comply with observation 2.
The only exception I have seen was the one header corresponding to the current source file, which would be included first to make sure it is self-contained (observation 1)... but this only holds if you don't have unit tests, for if you do then the unit test source file is a very good place to check this.
While it is just personal choice, I would prefer to include standard headers first. Few reasons:
Any set of #ifdef..#define would be correctly mapped, rather than standard headers misinterpreting them. This goes for conditional compilation as well as values of some macros, while standard headers are being compiled.
Any change/new function in standard header may conflict with your function, and compiler would emit error in header file, which would be be complicated to solve.
All required standard headers should be placed in one header (preferbly some pre-compiled-header), include that header, and then include your custom header. This would reduce compilation time.
Start with the system headers.
If there are no dependencies between the headers both ways work, but since programming is essentially communication, not with the computer but with other humans, it is important to make it logical and easy to understand. And my opinion is that it is better to start with the system headers.
I base this one of my very first programming courses (in 1984, I think), where we programmed in Lisp and were taught to think like this: you start with the normal Lisp language, and then you create a new language that is more useful for your application by adding some functions and data types. If you for example add dates and the ability to manipulate dates, this new language could be called Lisp-with-dates. Then you could use Lisp-with-dates to create a new language with calendar functionality, which could be called Lisp-with-calendars. Like layers in an onion.
Similarly, you can view C as having a "core" language, without any headers, and then you can for example expand this language into a new, bigger language with I/O functionality by #including stdio.h. You add more and more stuff to the core language by #including more headers. (I am aware that the term "C language" in other contexts refers to the entire standard, with all the standard headers, but bear with me here.) Each new #included header creates a new, bigger language, and an additional layer of the onion.
Now, to me it seems that the standard headers obviously should be the inner part of this onion, and therefore before the custom headers. You can create the language C-with-monsters by adding stuff to C-with-I/O, but the people who created C-with-I/O did not start with C-with-monsters.
any place you include c++ compiler treats it as the same

Should I use a single header to include all static library headers?

I have a static library that I am building in C++. I have separated it into many header and source files. I am wondering if it's better to include all of the headers that a client of the library might need in one header file that they in turn can include in their source code or just have them include only the headers they need? Will that cause the code to be unecessary bloated? I wasn't sure if the classes or functions that don't get used will still be compiled into their products.
Thanks for any help.
Keep in mind that each source file that you compile involves an independent invocation of the compiler. With each invocation, the compiler has to read in every included header file, parse through it, and build up a symbol table.
When you use one of these "include the world" header files in lots of your source files, it can significantly impact your build time.
There are ways to mitigate this; for example, Microsoft has a precompiled header feature that essentially saves out the symbol table for subsequent compiles to use.
There is another consideration though. If I'm going to use your WhizzoString class, I shouldn't have to have headers installed for SOAP, OpenGL, and what have you. In fact, I'd rather that WhizzoString.h only include headers for the types and symbols that are part of the public interface (i.e., the stuff that I'm going to need as a user of your class).
As much as possible, you should try to shift includes from WhizzoString.h to WhizzoString.cpp:
OK:
// Only include the stuff needed for this class
#include "foo.h" // Foo class
#include "bar.h" // Bar class
public class WhizzoString
{
private Foo m_Foo;
private Bar * m_pBar;
.
.
.
}
BETTER:
// Only include the stuff needed by the users of this class
#include "foo.h" // Foo class
class Bar; // Forward declaration
public class WhizzoString
{
private Foo m_Foo;
private Bar * m_pBar;
.
.
.
}
If users of your class never have to create or use a Bar type, and the class doesn't contain any instances of Bar, then it may be sufficient to provide only a forward declaration of Bar in the header file (WhizzoString.cpp will have #include "bar.h"). This means that anyone including WhizzoString.h could avoid including Bar.h and everything that it includes.
In general, when linking the final executable, only the symbols and functions that are actually used by the program will be incorporated. You pay only for what you use. At least that's how the GCC toolchain appears to work for me. I can't speak for all toolchains.
If the client will always have to include the same set of header files, then it's okay to provide a "convience" header file that includes others. This is common practice in open-source libraries. If you decide to provide a convenience header, make it so that the client can also choose to include specifically what is needed.
To reduce compile times in large projects, it's common practice to include the least amount of headers as possible to make a unit compile.
what about giving both choices:
#include <library.hpp> // include everything
#include <library/module.hpp> // only single module
this way you do not have one huge include file, and for your separate files, they are stacked neatly in one directory
It depends on the library, and how you've structured it. Remember that header files for a library, and which pieces are in which header file, are essentially part of the API of the library. So, if you lead your clients to carefully pick and choose among your headers, then you will need to support that layout for a long time. It is fairly common for libraries to export their whole interface via one file, or just a few files, if some part of the API is truly optional and large.
A consideration should be compilation time: If the client has to include two dozen files to use your library, and those includes have internal includes, it can significantly increase compilation time in a big project, if used often. If you go this route, be sure all your includes have proper include guards around not only the file contents, but the including line as well. Though note: Modern GCC does a very good job of this particular issue and only requires the guards around the header's contents.
As to bloating the final compiled program, it depends on your tool chain, and how you compiled the library, not how the client of the library included header files. (With the caveat that if you declare static data objects in the headers, some systems will end up linking in the objects that define that data, even if the client doesn't use it.)
In summary, unless it is a very big library, or a very old and cranky tool chain, I'd tend to go with the single include. To me, freezing your current implementation's division into headers into the library's API is bigger worry than the others.
The problem with single file headers is explained in detail by Dr. Dobbs, an expert compiler writer. NEVER USE A SINGLE FILE HEADER!!! Each time a header is included in a .cc/.cpp file it has to be recompiled because you can feed the file macros to alter the compiled header. For this reason, a single header file will dramatically increase compile time without providing any benifit. With C++ you should optimize for human time first, and compile time is human time. You should never, because it dramatically increases compile time, include more than you need to compile in any header, each translation unit(TU) should have it's own implementation (.cc/.cpp) file, and each TU named with unique filenames;.
In my decade of C++ SDK development experience, I religiously ALWAYS have three files in EVERY module. I have a config.h that gets included into almost every header file that contains prereqs for the entire module such as platform-config and stdint.h stuff. I also have a global.h file that includes all of the header files in the module; this one is mostly for debugging (hint enumerate your seams in the global.h file for better tested and easier to debug code). The key missing piece here is that ou should really have a public.h file that includes ONLY your public API.
In libraries that are poorly programmed, such as boost and their hideous lower_snake_case class names, they use this half-baked worst practice of using a detail (sometimes named 'impl') folder design pattern to "conceal" their private interface. There is a long background behind why this is a worst practice, but the short story is that it creates an INSANE amount of redundant typing that turns one-liners into multi-liners, and it's not UML compliant and it messes up the UML dependency diagram resulting in overly complicated code and inconsistent design patterns such as children actually being parents and vice versa. You don't want or need a detail folder, you need to use a public.h header with a bunch of sibling modules WITHOUT ADDITIONAL NAMESPACES where your detail is a sibling and not a child that is in reatliy a parent. Namespaces are supposed to be for one thing and one thing only: to interface your code with other people's code, but if it's your code you control it and you should use unique class and funciton names because it's bad practice to use a namesapce when you don't need to because it may cause hash table collision that slow downt he compilation process. UML is the best pratice, so if you can organize your headers so they are UML compliant then your code is by definition more robust and portable. A public.h file is all you need to expose only the public API; thanks.