Compile header-only template library into a shared library? - c++

We are in the process of designing a new C++ library and decided to go with a template-based approach along with some specific partial template specialisations for corner cases. In particular, this will be a header-only template library.
Now, there is some concern that this will lead to a lot of code duplication in the binaries, since this template 'library' will be compiled into any other shared library or executable that uses it (arguably only those parts that are used). I still think that this is not a problem (in particular, the compiler might even inline things which it could not across shared library boundaries).
However, since we know the finite set of types this is going to be used for, is there a way to compile this header into a library, and provide a different header with only the declarations and nothing else? Note that the library must contain not only the generic implementations but also the partial specialisations.

Yes. What you can do is explicitly instantiate the templates in CPP files using the compiler's explicit template instantiation syntax. Here is how to use explicit instantiation in VC++: http://msdn.microsoft.com/en-us/library/by56e477(v=VS.100).aspx. G++ has a similar feature: http://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html#Template-Instantiation.
Note that explicit instantiation itself has been standard since C++98; C++11 added the extern form (an explicit instantiation declaration). The grammar is given in [14.7.2] Explicit instantiation of the FDIS:
The syntax for explicit instantiation is:
explicit-instantiation:
    extern(opt) template declaration
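As a minimal sketch of this approach (the Accumulator class and the file split marked in the comments are hypothetical, not taken from the question), the declarations-only header and the library source might look roughly like this. Both parts are shown in one listing so that it compiles as a single translation unit; in a real project they would be separate files, and the extern template lines assume C++11 or later.

```cpp
// ---- accumulator.h (hypothetical): declarations only, shipped to clients ----
template <typename T>
class Accumulator {
public:
    void add(T value);
    T total() const;
private:
    T sum_{};
};

// Explicit instantiation declarations: client code must not instantiate
// these implicitly; the definitions come from the library.
extern template class Accumulator<int>;
extern template class Accumulator<double>;

// ---- accumulator.cpp (hypothetical): compiled into the shared library ----
template <typename T>
void Accumulator<T>::add(T value) { sum_ += value; }

template <typename T>
T Accumulator<T>::total() const { return sum_; }

// Explicit instantiation definitions: emit code for the known set of types.
template class Accumulator<int>;
template class Accumulator<double>;
```

A client that uses Accumulator<int> or Accumulator<double> then links against the library; any other type fails at link time because no definition exists for it. A partial specialisation should be covered the same way: explicitly instantiating the template with arguments that select the specialisation compiles that specialisation into the library.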

C++ Shared Library with Templates: Undefined symbols error
Some answers there cover this topic. In short: it is possible if you explicitly instantiate the templates in the shared library's code. This requires an explicit instantiation for every type used with every template on the shared library side, though.

If it is really templates-only, then there is no shared library. See various Boost projects for concrete examples. Only when you have non-template code will you have a library. A concrete example is eg Boost Date_Time and date formatting and parsing; you can use the library with or without that feature and hence with or without linking.
Not having a shared library is nice in the sense of having fewer dependencies. The downside is that your binaries may get a little bigger and that you have somewhat higher compile-time costs. But storage is fairly cheap (unless you work in embedded systems or other special circumstances) and compiling is usually a fixed one-time cost.

Although there isn't a standard way to do it, it is usually possible with implementation specific techniques. I did it a long time ago with Borland's C++ Builder. The idea is to declare your templates to be exported from the shared library where they need to reside and import them where they are used. The way I did it was along these lines:
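// A.h
// GENERATE is defined only while building the shared library itself.
#ifdef GENERATE
# define DECL __declspec(dllexport)
#else
# define DECL __declspec(dllimport)
#endif
template <typename T> class DECL A {
};
// A.cpp
#define GENERATE
#include "A.h"
// Explicit instantiation: emits A<int> in the library and exports it.
template class DECL A<int>;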
Beware that I don't have access to the original code, so it may contain mistakes. This blog entry describes a very similar approach.
From your wording I suspect you're not on Windows, so you'll have to find out if and how this approach can be adopted with your compiler. I hope this is enough to put you in the right direction.

Related

Header-only linking

Many C++ projects (e.g. many Boost libraries) are "header-only linked".
Is this possible also in plain C? How to put the source code into headers? Are there any sites about it?
Executive summary: You can, but you shouldn't.
C and C++ code is preprocessed before it's compiled: all headers are "pasted" into the source files that include them, recursively. If you define a non-inline function in a header and it is included by two C files, each object file ends up with its own copy, and linking them together fails with a duplicate-definition error (the C counterpart of a One Definition Rule violation).
You can create "header-only" C libraries if all your functions are marked as static, that is, not visible outside the translation unit. But it also means you will get a copy of all the static functions in each translation unit that includes the header file.
It is a bit different in C++: inline functions are not static, so the symbols emitted by the compiler are still visible to the linker, but they are emitted as "weak" symbols and the linker discards the duplicates rather than giving up.
It's not idiomatic to write C code in the headers, unless it's based on macros (e.g. queue(3)). In C++, the main reason to keep code in the headers is templates, which may result in code instantiation for different template parameters, which is not applicable to C.
You do not link headers.
In C++ it's slightly easier to write code that's already better-off in headers than in separately-compiled modules because templates require it. 1
But you can also use the inline keyword for functions, which exists in C as well as C++. 2
// Won't cause redefinition link errors, because of 6.7.4/5
inline void foo(void) {
// ...
}
[c99: 6.7.4/5:] A function declared with an inline function
specifier is an inline function. The function specifier may appear
more than once; the behavior is the same as if it appeared only once.
Making a function an inline function suggests that calls to the
function be as fast as possible. The extent to which such
suggestions are effective is implementation-defined.
You're a bit stuck when it comes to data objects, though.
1 - Sort of.
2 - C99 for sure. C89/C90 I'd have to check.
Boost makes heavy use of templates and template meta-programming which you cannot emulate (all that easily) in C.
But you can of course cheat by having declarations and code in C headers which you #include but that is not the same thing. I'd say "When in Rome..." and program C as per C conventions with libraries.
Yes, it is quite possible. Define all functions in headers and either mark them all as static or just use a single compilation unit (i.e. only a single .c file) in your projects.
As a personal anecdote, I know quite a number of physicists who insist that this technique is the only true way to program C. It is beneficial because it's the poor man's version of -fwhole-program, i.e. makes optimizations based on the knowledge of function behaviour possible. It is practical because you don't need to learn about using the linker flags. It is a bad idea because your whole program must be compiled as a whole and recompiled with each minor change.
Personally, I'd recommend to let it be or at least go with static for only a few functions.

Header-only libraries and multiple definition errors

I want to write a library that to use, you only need to include one header file. However, if you have multiple source files and include the header in both, you'll get multiple definition errors, because the library is both declared and defined in the header. I have seen header-only libraries, in Boost I think. How did they do that?
Declare your functions inline, and put them in a namespace so you don't collide:
namespace fancy_schmancy
{
inline void my_fn()
{
// magic happens
}
}
The main reason why Boost is largely header-only is because it's heavily template oriented. Templates generally get a pass from the one definition rule. In fact to effectively use templates, you must have the definition visible in any translation unit that uses the template.
Another way around the one definition rule (ODR) is to use inline functions. Actually, getting a free-pass from the ODR is what inline really does - the fact that it might inline the function is really more of an optional side-effect.
A final option (but probably not as good) is to make your functions static. This may lead to code bloat if the linker isn't able to figure out that all those function instances are really the same. But I mention it for completeness. Note that compilers will often inline static functions even if they aren't marked as inline.
Boost uses header-only libraries a lot because like the STL, it's mostly built using class and function templates, which are almost always header-only.
If you are not writing templates I would avoid including code in your header files - it's more trouble than it's worth. Make this a plain old static library.
There are many truly header-only Boost libraries, but they tend to be very simple (and/or only templates). The bigger libraries accomplish the same effect through some trickery: they have "automatic linking" (you'll see this term used here). They essentially have a bunch of preprocessor directives in the headers that figure out the appropriate lib file for your platform and use a #pragma to instruct the linker to link it in. So you don't have to explicitly link it, but it is still being linked.

Is it ever impossible to write a header-only library?

Is there ever such a pattern of dependencies that it is impossible to keep everything in header files only? What if we enforced a rule of one class per header only?
For the purposes of this question, let's ignore static things :)
I am aware of no features in standard C++, excepting statics which you have already mentioned, which require a library to define a full translation unit (instead of only headers). However, it's not recommended to do that, because when you do, you force all your clients to recompile their entire codebase whenever your library changes. If you're using source files or a static library or a dynamic library form of distribution, your library can be changed/updated/modified without forcing everyone to recompile.
It is possible, I would say, at the express condition of not using a number of language features: as you noticed, a few uses of the static keyword.
It may require a few tricks, but they can be reviewed.
You'll need to keep the header / source distinction whenever you need to break a dependency cycle, even though the two files will be header files in practice.
Free functions (non-template) have to be declared inline; the compiler may not actually inline them, but if they are declared so it won't complain that they have been redefined when the client builds its library / executable.
Globally shared data (global variables and class static attributes) should be emulated using local static variables in functions / class methods. In practice it matters little as far as the caller is concerned (it just adds ()). Note that in C++0x (C++11) this becomes the favored way because it's guaranteed to be thread-safe while still protecting from the initialization order fiasco; until then... it's not thread-safe ;)
Respecting those 3 points, I believe you would be able to write a fully-fledged header-only library (does anyone see something else I missed?)
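A minimal sketch of the last two points, assuming a made-up namespace hdr_only and a made-up registry of names (neither comes from any real library): the functions are declared inline, and the shared data lives in a function-local static instead of a namespace-scope global.

```cpp
#include <string>
#include <vector>

namespace hdr_only {  // hypothetical library namespace

// Replaces a global `std::vector<std::string> registry;`.
// The static is initialised on first call (thread-safe since C++11),
// and every translation unit sees the same object because the
// function is inline with external linkage.
inline std::vector<std::string>& registry() {
    static std::vector<std::string> instance;
    return instance;
}

// Non-template free function: inline so that including this header
// from several source files does not cause redefinition errors.
inline void register_name(const std::string& name) {
    registry().push_back(name);
}

}  // namespace hdr_only
```

Callers write hdr_only::registry() where they would have written hdr_only::registry, which is the "just adds ()" cost mentioned above.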
A number of Boost libraries have used similar tricks to be header-only even though their code was not completely template-based. For example, Asio does this very deliberately and offers both alternatives via flags (see release notes for Asio 1.4.6):
clients who only need a couple features need not worry about building / linking, they just grab what they need
clients who rely on it a bit more or want to cut down on compilation time are offered the ability to build their own Asio library (with their own sets of flags) and then include "lightweight" headers
This way (at the price of some more effort on the part of the library devs) the clients get their cake and eat it too. It's a pretty nice solution I think.
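The general shape of such a switch can be sketched as follows. The macro names (MYLIB_SEPARATE_COMPILATION, MYLIB_SOURCE, MYLIB_DECL), the file names and the checksum function are invented for illustration and are not Asio's actual ones; the listing shows both hypothetical files in one block.

```cpp
// ---- mylib.hpp (hypothetical) ----
// Leave MYLIB_SEPARATE_COMPILATION undefined to use the library header-only;
// define it to compile the definitions once in a source file of your own.
#if defined(MYLIB_SEPARATE_COMPILATION)
#  define MYLIB_DECL          // ordinary external function
#else
#  define MYLIB_DECL inline   // header-only: inline avoids redefinition errors
#endif

namespace mylib {
MYLIB_DECL int checksum(const char* text);  // "lightweight" declaration
}

// ---- mylib_impl.hpp (hypothetical) ----
// Header-only mode pulls the definitions in automatically; in separate
// compilation mode exactly one source file defines MYLIB_SOURCE first.
#if !defined(MYLIB_SEPARATE_COMPILATION) || defined(MYLIB_SOURCE)
namespace mylib {
MYLIB_DECL int checksum(const char* text) {
    int sum = 0;
    for (; *text != '\0'; ++text) {
        sum += static_cast<unsigned char>(*text);
    }
    return sum;
}
}  // namespace mylib
#endif
```

With nothing defined, a client simply includes the header and calls mylib::checksum. A client that wants shorter compile times defines MYLIB_SEPARATE_COMPILATION everywhere and MYLIB_SOURCE in the single source file that builds the definitions.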
Note: I am wondering whether static functions could be inlined, I prefer to use anonymous namespaces myself so never really looked into it...
The one class per header rule is meaningless. If this doesn't work:
#include <header1>
#include <header2>
then some variation of this will:
#include <header1a>
#include <header2>
#include <header1b>
This might result in less than one class per header, but you can always use (void*) and casts and inline functions (in which case the 'inline' will likely be duly ignored by the compiler). So the question, seems to me, can be reduced to:
class A
{
// ...
void *pimpl;
};
Is it possible that the private implementation, pimpl, depends on the declaration of A? If so then pimpl.cpp (as a header) must both precede and follow A.h. But since you can always, once again, use (void*) and casts and inline functions in preceding headers, it can be done.
Of course, I could be wrong. In either case: Ick.
In my long career, I haven't come across a dependency pattern that would disallow a header-only implementation.
Mind you that if you have circular dependencies between classes, you may need to resort to either the abstract interface / concrete implementation paradigm, or use templates (using templates allows you to forward-reference properties/methods of template parameters, which are resolved later during instantiation).
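The template variant can be sketched like this, with hypothetical Widget and Window classes: Widget calls a member of its owner, Window contains a Widget, and the cycle is resolved because Owner::name() is only looked up when Widget<Owner>::describe() is instantiated, by which point Window is complete.

```cpp
#include <string>

// Could live in widget.hpp: knows nothing about Window.
template <typename Owner>
class Widget {
public:
    explicit Widget(Owner& owner) : owner_(owner) {}

    // Owner::name() is resolved at instantiation time, not here.
    std::string describe() const { return "widget of " + owner_.name(); }

private:
    Owner& owner_;  // a reference to a still-incomplete type is fine
};

// Could live in window.hpp: includes widget.hpp, no cycle needed.
class Window {
public:
    Window() : widget_(*this) {}
    std::string name() const { return "window"; }
    const Widget<Window>& widget() const { return widget_; }

private:
    Widget<Window> widget_;
};
```

Here Window().widget().describe() yields "widget of window", and neither header has to include the other before its own class definition.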
This does not mean that you SHOULD always aim for header-only libraries. Good as they are, they should be reserved to template and inline code. They SHOULD NOT include substantial complex calculations.

write a C or C++ library with "template"

(1). When using C++ template, is it correct that the compiler (e.g. g++) will not compile the template definition (which can only be in header file not source file) directly, but generate the code based on template definition for each of its instantiations and then compile the generated code for its instantiations?
(2). If I want to write a C++ library which provide template classes and template functions, is it impossible to compile the library into shared file (.so, .a) because their instantiations will not be anywhere in the code of the library but only appear in the user's program? If yes, does it mean that template libraries are just source code files not precompiled files?
How is C++ standard template library (STL) implemented? Is its source code precompiled or compiled together with user's program?
(3). In C,
how to write a library that provide functions acting like template functions in C++? Is overloading a good solution?
If I have to write a procedure into a different function for different types of arguments, is there a good way for code reusing? Is this a good way to do it http://www.vlfeat.org/api/imop_8c_source.html? Any other ways?
Thanks and regards!
When using C++ template, is it correct that the compiler (e.g. g++)
will not compile the template
definition.
Yes. It's a correct assumption.
A template definition is incomplete code. You need to fill in the template parameters before compiling it.
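A small sketch of what "incomplete" means in practice, using a hypothetical Wrapper template: the body of a member function is only compiled for a given T when that member is actually used, so the same template works for a type that could never compile length().

```cpp
#include <cstddef>
#include <string>

template <typename T>
struct Wrapper {
    T value;

    // Valid for any copyable T.
    T get() const { return value; }

    // Only valid when T has a size() member. This body is not compiled
    // for Wrapper<int> unless somebody calls length() on one.
    std::size_t length() const { return value.size(); }
};
```

Wrapper<int> can be created and get() called on it, even though int has no size(); Wrapper<std::string> can use both members. Calling length() on a Wrapper<int> is what would trigger the compile error.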
If I want to write a C++ library which provide template classes and
template functions, is it impossible
to compile the library into shared
file (.so, .a)
No, you cannot compile the template itself into a library. You can only compile individual instantiations of a template.
How is C++ standard template library
(STL) implemented? Is its source code
precompiled or compiled together with
user's program?
A large part of the STL code resides in header files and gets compiled together with your application.
In C, how to write a library that
provide functions acting like template
functions in C++? Is this a good way
to do it
http://www.vlfeat.org/api/imop_8c_source.html?
Any other ways?
Including the same file multiple times after redefining a macro (as demonstrated in the link you provided) is a good way to do this.
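The linked file relies on repeated inclusion, which needs more than one file to show. As a related variation, swapped in here so that the listing is self-contained, a function-like macro can stamp out one function per type. DEFINE_SUM and the sum_int / sum_double names are hypothetical, and the code sticks to the common subset of C99 and C++ so it fits either kind of source file.

```cpp
/* Generates a function NAME that sums n elements of type T. */
#define DEFINE_SUM(T, NAME)                 \
    static T NAME(const T* data, int n) {   \
        T total = 0;                        \
        for (int i = 0; i < n; ++i) {       \
            total += data[i];               \
        }                                   \
        return total;                       \
    }

/* One "instantiation" per element type, each with its own name,
   since C has no overloading. */
DEFINE_SUM(int, sum_int)
DEFINE_SUM(double, sum_double)
```

The repeated-inclusion approach does the same job but keeps the function body in an ordinary file instead of a backslash-continued macro, which tends to be easier to read and debug.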
(3). In C, how to write a library that provide functions acting like template functions in C++? Is overloading a good solution?
If I have to write a procedure into a different function for different types of arguments, is there a good way for code reusing? Is this a good way to do it http://www.vlfeat.org/api/imop_8c_source.html? Any other ways?
When I need to write general purpose code I use void * as basic data type. This is good because it allows you to store both a generic pointer and a "primitive" value (like an int). Also recently I had to compile some code using this pattern in a 64 bit environment, and this taught me the importance of the stdint.h data types!
Speaking of acting like template in C, this is not a good idea. This is just my opinion, of course, but I think that the strong point of C is its simplicity, which is the reason why C is less error prone than C++.

Multiple definitions of a function template

Suppose a header file defines a function template. Now suppose two implementation files #include this header, and each of them has a call to the function template. In both implementation files the function template is instantiated with the same type.
// header.hh
template <typename T>
void f(const T& o)
{
// ...
}
// impl1.cc
#include "header.hh"
void fimpl1()
{
f(42);
}
// impl2.cc
#include "header.hh"
void fimpl2()
{
f(24);
}
One may expect the linker to complain about multiple definitions of f(). Specifically, if f() were not a template then that would indeed be the case.
How come the linker doesn't complain about multiple definitions of f()?
Is it specified in the standard that the linker must handle this situation gracefully? In other words, can I always count on programs similar to the above to compile and link?
If the linker can be clever enough to disambiguate a set of function template instantiations, why can't it do the same for regular functions, given they are identical as is the case for instantiated function templates?
The GNU C++ compiler's manual has a good discussion of this. An excerpt:
C++ templates are the first language
feature to require more intelligence
from the environment than one usually
finds on a UNIX system. Somehow the
compiler and linker have to make sure
that each template instance occurs
exactly once in the executable if it
is needed, and not at all otherwise.
There are two basic approaches to this
problem, which are referred to as the
Borland model and the Cfront model.
Borland model
Borland C++ solved the template
instantiation problem by adding the
code equivalent of common blocks to
their linker; the compiler emits
template instances in each translation
unit that uses them, and the linker
collapses them together. The advantage
of this model is that the linker only
has to consider the object files
themselves; there is no external
complexity to worry about. This
disadvantage is that compilation time
is increased because the template code
is being compiled repeatedly. Code
written for this model tends to
include definitions of all templates
in the header file, since they must be
seen to be instantiated.
Cfront model
The AT&T C++ translator, Cfront,
solved the template instantiation
problem by creating the notion of a
template repository, an automatically
maintained place where template
instances are stored. A more modern
version of the repository works as
follows: As individual object files
are built, the compiler places any
template definitions and
instantiations encountered in the
repository. At link time, the link
wrapper adds in the objects in the
repository and compiles any needed
instances that were not previously
emitted. The advantages of this model
are more optimal compilation speed and
the ability to use the system linker;
to implement the Borland model a
compiler vendor also needs to replace
the linker. The disadvantages are
vastly increased complexity, and thus
potential for error; for some code
this can be just as transparent, but
in practice it can be very difficult
to build multiple programs in one
directory and one program in multiple
directories. Code written for this
model tends to separate definitions of
non-inline member templates into a
separate file, which should be
compiled separately.
When used with GNU ld version 2.8 or
later on an ELF system such as
GNU/Linux or Solaris 2, or on
Microsoft Windows, G++ supports the
Borland model. On other systems, G++
implements neither automatic model.
In order to support C++, the linker is smart enough to recognize that they are all the same function and throws out all but one.
EDIT: clarification:
The linker doesn't compare function contents and determine that they are the same.
Templated functions are marked as such and the linker recognizes that they have the same signatures.
This is more or less a special case just for templates.
The compiler only generates the template instantiations that are actually used. Since it has no control over what code will be generated from other source files, it has to generate the template code once for each file, to make sure that the method gets generated at all.
Since it's difficult to solve this (C++98 had an export keyword for templates, but g++ never implemented it; C++11's extern template only lets you suppress instantiation by hand) the linker simply accepts the multiple definitions.
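For completeness, the manual route with C++11's explicit instantiation declaration, which current g++ supports, can be sketched as follows. It reuses the f from the question, except that f returns a value here purely so the result can be observed; instances.cc is a hypothetical file name, and both files are shown in one listing.

```cpp
// ---- header.hh ----
template <typename T>
T f(const T& o)
{
    return o + o;  // body added for demonstration only
}

// Explicit instantiation declaration: translation units that include this
// header do not emit their own copy of f<int>.
extern template int f<int>(const int&);

// ---- instances.cc (hypothetical) ----
// Explicit instantiation definition: the one place f<int> is emitted.
template int f<int>(const int&);
```

Calls such as f(42) in impl1.cc and impl2.cc then resolve to the single f<int> emitted from instances.cc, while a call like f(1.5) is still instantiated implicitly in each file and merged by the linker as described above.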