I am trying to find ways to control the 'exposure' of functions/classes/variables to third-party users while I still have full access within a C++ project/library.
In JavaScript you have modules, which do exactly this.
In Java/C# you can get pretty far with access modifiers.
But in C/C++ there doesn't seem to be any control beyond the file itself (i.e. once something is in the .h/.hpp file, it's accessible from anywhere).
So the question: is there a way to access functions/classes/variables from files within the project without exposing them to third-party users?
Well, don't put them in the API, and if these symbols aren't needed internally, keep them in header files only used for building your project, not installed as development headers for your consumers.
Say, you have a class declaration
class Whatpeopleuse {
private:
    int _base;
public:
    Whatpeopleuse(int base);
    int nth_power(unsigned int exponent);
};
in your n247s/toolbox.hpp, which you install / ship to customers.
And the implementation in your mycode.cc file
#include "n247s/toolbox.hpp"
#include "math_functions.hpp" // contains declaration of power_function
Whatpeopleuse::Whatpeopleuse(int base) : _base(base)
{
}
int
Whatpeopleuse::nth_power(unsigned int exponent)
{
    return power_function(_base, exponent);
}
with power_function defined in another file, math_functions.cc:
#include "math_functions.hpp"
int power_function(int base, unsigned int exponent)
{
    int result = 1;
    for (; exponent; --exponent)
        result *= base;
    return result;
}
Then you compile your mycode.cc and your math_functions.cc, link them together into an n247s.so (or .dll, or whatever your shared library extension is), and ship that to the customer together with toolbox.hpp.
The customer never sees the function declarations in math_functions.hpp, or the code in math_functions.cc or mycode.cc. That's internal to the binary you produced.
What the customer sees/gets is
the header toolbox.hpp, which tells them what symbols / types exist in your library for them to access (otherwise, their compiler wouldn't know what there is to call in your library), and
the binary n247s library, containing the symbols as declared in toolbox.hpp.
Of these symbols, only those that have visibility are actually given a name associated with an address within the shared library file. You'll find it's common to tell the linker that none of the functions in a header should be visible by default, and to explicitly mark the classes and functions you do want visible, using the compiler attribute __attribute__((visibility("default"))) (at least, that's what's in my macros for this; for MSVC, the attribute specification looks different).
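For illustration, a minimal sketch of such an export macro (the name N247S_API is made up; a real one would also switch to __declspec(dllimport) on the consumer side, and you'd build with -fvisibility=hidden on GCC/Clang so everything unmarked stays hidden):

// n247s_export.hpp -- a minimal sketch, names illustrative
#if defined(_MSC_VER)
  #define N247S_API __declspec(dllexport)                    // MSVC spelling
#else
  #define N247S_API __attribute__((visibility("default")))   // GCC/Clang
#endif

// usage in the shipped header:
class N247S_API Whatpeopleuse { /* ... as above ... */ };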
The user of the class Whatpeopleuse can only access its public: members (there are ways around that, though, within limits), but they can see that you have private members (like _base). If that's too much disclosure, you can make your customer-facing classes contain only a single member pointer to something called a detail, with the customer-facing member functions' implementations just calling detail->name_of_member.
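A minimal sketch of that detail-pointer approach, reusing the class above:

// toolbox.hpp -- the customer no longer sees any data members
class Whatpeopleuse {
public:
    Whatpeopleuse(int base);
    ~Whatpeopleuse();
    int nth_power(unsigned int exponent);
private:
    struct detail;   // defined only inside mycode.cc
    detail *d;
};

// mycode.cc
#include "math_functions.hpp"   // power_function, as before
struct Whatpeopleuse::detail {
    int base;
};
Whatpeopleuse::Whatpeopleuse(int base) : d(new detail{base}) {}
Whatpeopleuse::~Whatpeopleuse() { delete d; }
int Whatpeopleuse::nth_power(unsigned int exponent)
{
    return power_function(d->base, exponent);
}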
I'd like to add that you don't want to make it hard for your customers to know what your class is doing. If they're motivated enough, they can reverse-engineer quite a lot. Making something harder to read and understand because its headers go to great lengths to obfuscate what's happening behind the scenes is frustrating, and people won't like it.
On the other hand, the above methodology is something you typically find in large code bases – not to "hide" things from the user, but to keep the API clean: only the things in the public-facing header are part of the API; the rest is never seen by the user's compiler. That's great, because it means
you make clear what is a good idea to use and what is internal "plumbing". This matters because it's easy to lose track of the functionality you actually want to offer, and to end up writing confusing, hard-to-use libraries.
it minimizes the ABI of your library: as long as the ABI of the symbols in your public-facing header doesn't change, your user can drop-in replace your library v1.1.2 with v1.1.3 by replacing the library binary, without recompilation.
it makes clear what needs user-friendly documentation, and what doesn't. If you ship someone a library without documentation, they will go through hell to avoid using your library. I've been in that position myself. I know companies that rewrote whole driver suites from scratch because the documentation they got from the hardware vendor did not explain behavior.
I have trouble understanding this sentence with respect to inline functions and customers' binary compatibility. Can someone please explain?
C++ FAQs (Cline, Lomow):
When the compiler synthesizes the copy constructor, it makes it inline. If your classes are exposed to your customers (for example, if your customers #include your header files rather than merely using an executable built from your classes), your inline code is copied into your customers' executables. If your customers want to maintain binary compatibility between releases of your header files, you must not change any inline functions that are visible to the customers. Because of this, you will want an explicit, non-inline version of the copy constructor, which will be used directly by the customer.
Binary compatibility for dynamic libraries (.dll, .so) is often an important thing.
e.g. you don't want to have to recompile half the software on the OS because you updated some low-level library everything uses in an incompatible way (and consider how frequent security updates can be). Often you may not even have all the source code required to do so, even if you wanted to.
For updates to your dynamic library to be compatible, and actually have an effect, you essentially cannot change anything in a public header file, because everything there was compiled into those other binaries directly (even in C, this often includes struct sizes and member layouts, and obviously you can't remove or change any function declarations either).
In addition to the C issues, C++ introduces many more (the order of virtual functions, how inheritance works, etc.), so it is conceivable that you might do something that changes the auto-generated C++ constructor, copy, destructor, etc. while otherwise maintaining compatibility. If they are defined "inline" along with the class/struct, rather than explicitly in your source, then they will have been compiled directly into other applications/libraries that linked against your dynamic library and used those auto-generated functions, and those binaries won't get your changed version (which you maybe didn't even realise had changed!).
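A sketch of the fix the FAQ suggests – declare the copy operations yourself and define them out of line, so the code stays inside your library binary (class name illustrative):

// widget.hpp -- shipped header
class Widget {
public:
    Widget();
    Widget(const Widget& other);             // declared, NOT defined here,
    Widget& operator=(const Widget& other);  // so no inline copy is baked
private:                                     // into customer executables
    int a;
    int b;
};

// widget.cpp -- compiled into the library binary
Widget::Widget() : a(0), b(0) {}
Widget::Widget(const Widget& other) : a(other.a), b(other.b) {}
Widget& Widget::operator=(const Widget& other)
{
    a = other.a;
    b = other.b;
    return *this;
}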
It is referring to problems that can occur between binary releases of a library and header changes in that library. There are certain changes which are binary compatible and certain changes which are not. Changes to inline functions, such as an inlined copy-constructor, are not binary compatible and require that the consumer code be recompiled.
You see this within a single project all the time. If you change a.cpp then you don't have to recompile all of the files which include a.hpp. But if you change the interface in the header, then any consumer of that header typically needs to be recompiled. This is similar to the case when using shared libraries.
Maintaining binary compatibility is useful for when one wants to change the implementation of a binary library without changing its interface. This is useful for things like bug fixes.
For example, say a program uses liba as a shared library. If liba contains a bug in a method of a class it exposes, it can change the internal implementation, recompile the shared library, and the program can use the new binary release of liba without itself being recompiled. If, however, liba changes the public contract, such as the implementation of an inlined method, or moves an inlined method to being externally declared, then it breaks the application binary interface (ABI) and the consuming program must be recompiled to use the new binary version of liba.
Consider the following code compiled into a static library:
// lib.hpp
#include <string>

class t_Something
{
private:
    ::std::string foo;
public:
    void Do_SomethingUseful(void);
};
// lib.cpp
#include "lib.hpp"

void t_Something::Do_SomethingUseful(void)
{
    // ...
}
// user_project.cpp
#include "lib.hpp"

int main()
{
    t_Something something;
    something.Do_SomethingUseful();
    t_Something something_else = something;
}
Now when the t_Something class's fields change somehow, for example a new one is added, we end up in a situation where all the user code has to be recompiled. Basically, the constructors implicitly generated by the compiler "leaked" from our static library into user code.
I think I understand what this passage means. By no means am I endorsing this, though.
I believe they describe the scenario where you are developing a library and providing it to your customers in the form of header files and a pre-compiled binary part. After the customer has done an initial build, they are expected to be able to substitute the binary part with a newer one without recompiling their application – only a relink would be required. The only way to achieve that is to guarantee that the header files are immutable, i.e. not changed between versions.
I guess the notion comes from the fact that in '98, build systems were not smart enough to detect a change in a header file and trigger recompilation of the affected source files.
Any and all of that is completely moot nowadays and, in fact, goes against the grain – a significant number of libraries actually try hard to be header-only, for multiple reasons.
As I read the problem statement of Item 31 ("Minimize compilation dependencies between files") of Effective C++, the following statement puzzles me:
class Person {
public:
Person(const std::string& name, const Date& birthday,
const Address& addr);
std::string name() const;
std::string birthDate() const;
std::string address() const;
...
private:
std::string theName; // implementation detail
Date theBirthDate; // implementation detail
Address theAddress; // implementation detail
};
in the file defining the Person class, you are likely to find something like this:
#include <string>
#include "date.h"
#include "address.h"
Unfortunately, this sets up a compilation dependency between the file defining Person and these header files. If any of these header files (comment mine: the headers listed above, namely <string>, "date.h", "address.h") is changed, or if any of the header files they depend on changes, the file containing the Person class must be recompiled, as must any files that use Person.
What I don't quite understand is the last part, highlighted. Why do clients that use Person need recompilation? They just need to relink to the newly compiled Person object code, right? (I am assuming the Person interface remains the same to its clients.)
If what clients really need – assuming the Person interface doesn't change – is just a relink, does it still warrant the Pimpl idiom? The Pimpl class still needs recompilation if any of the headers changes. The idiom only saves the client a relink.
EDIT: It seems there is a lot of confusion about which headers have changed. In this case, Scott Meyers was talking about the header files included by Person.h changing. But Person.h itself does not change, so clients using (#including) Person.h don't see a change (no timestamp change on Person.h). The makefile dependency would list Person.o as a prerequisite, so the client would simply link with the new Person.o. I am learning the Pimpl idiom; maybe I missed some obvious points in everyone's argument. Please elucidate.
EDIT2: When a client needs to use Person, it includes Person.h, which also includes all the other headers such as date.h and address.h. I missed this part and thought only Person.cpp needed to deal with these headers.
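(For reference, a sketch of where Item 31's advice ends up – not Meyers' exact code:)

// person.hpp -- no longer includes date.h or address.h
#include <memory>
#include <string>

class PersonImpl;   // forward declarations replace the #includes
class Date;
class Address;

class Person {
public:
    Person(const std::string& name, const Date& birthday, const Address& addr);
    std::string name() const;
    // ...
private:
    std::shared_ptr<PersonImpl> pImpl;   // all data members live in person.cpp
};

Now date.h and address.h are included only by person.cpp; when they change, the library is recompiled, but files that merely #include person.hpp are untouched.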
there is an intermediate step in compiling, i.e. if you compile foo.cpp and it includes a.h and b.h, then an intermediate source file
a.h content
b.h content
foo.cpp content
which is the input for compilation, is created. Note that if headers include other headers, those are also inserted recursively.
Since a change in a header changes that intermediate file – your actual compilation input – foo.cpp should be recompiled. (You can see this intermediate form yourself; e.g., GCC's -E option stops after preprocessing and prints it.)
Yes, but that re-link will fail if the datatype sizes are wrong, or if the old code is trying to link to code that no longer exists. It's not magic: the code, at link time, has already been compiled.
There is a subset of interface changes you can make without breaking binary compatibility; adding members to a type is not part of that subset.
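A sketch of why adding a member is outside that subset (names and members illustrative):

#include <cstdio>

struct Person_v1 { int id; };            // the layout old clients compiled in
struct Person_v2 { int id; int age; };   // the layout after a "minor" addition

int main()
{
    // A client built against v1 reserved sizeof(Person_v1) bytes per object;
    // v2 library code reading/writing the age member would run past that.
    std::printf("v1: %zu bytes, v2: %zu bytes\n",
                sizeof(Person_v1), sizeof(Person_v2));
}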
(I am assuming the Person interface remains the same to its clients)
This is the key. Your assumption has removed the constraints, so the answer to "why do the other files need to be recompiled" becomes "they don't".
Obviously, the quote in its original context does not mention that assumption, which is why it gives broader guidelines. Though, personally, I'd have liked to see a more in-depth explanation from Meyers of binary compatibility.
In a very practical sense: suppose that person.h includes other files, or defines some preprocessor symbols. If you change its includes or change its preprocessor symbols, then any file that also includes person.h potentially has its meaning changed.
In practice, the compiler will fully recompile any compilation unit affected by the change, if I understand correctly. And even if there are some optimizations to avoid doing lots of work when only "minor" or "insignificant" changes take place, like added whitespace, the compiler needs to at least look at any compilation unit whose text was potentially changed in order to be sure.
Generally speaking, most tool-chains don't cache the intermediate results of each compilation unit after preprocessor expansion, and even if you are using something like ccache, it's not going to try to do anything intelligent with the cached results to avoid work when only small changes happen; it's only going to check whether they're stale or not.
So, changing things in a header file that may seem even smaller than changing the layout or interface of a class still generally needs to trigger recompilation. What if some of the compilation units contain queries like sizeof on your class? Or use SFINAE tricks to detect whether it has certain methods?
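For instance, a sketch of such queries (the Widget type stands in for a class coming from the library's header):

#include <type_traits>
#include <utility>

struct Widget { int a; };   // imagine this comes from the library header

// A sizeof query: this assertion is compiled into the client and silently
// changes meaning if the header's member list changes.
static_assert(sizeof(Widget) == sizeof(int), "client baked in the layout");

// An SFINAE probe: does T have a member function name()?
template <typename T>
auto has_name(int) -> decltype(std::declval<T>().name(), std::true_type{});
template <typename T>
std::false_type has_name(...);

static_assert(!decltype(has_name<Widget>(0))::value, "Widget has no name()");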
The core information in a header file describes interfaces.
An interface to a function describes its arguments (how many, what type, etc) and return type. The actual function implementation (definition) requires the function be called in an expected way - and the interface describes that. If code that is calling the function provides a different set of arguments, or if it acts as if the function returns something different than it actually does, then there will be a malfunction somewhere (either in the function, since it is not given information it expects, or in the caller, since the function doesn't give information the caller expects).
This means, if the interface to a function changes, then both code for the called function and for the callers need to be recompiled, in order to ensure consistency.
The same goes for type definitions. struct and class types may include member functions, and the compiler needs to ensure consistency between behaviour of those functions and their callers (or the programmer has to deal with inconsistency, which may manifest in tricky ways). Also, when creating an instance of a type (i.e. an object or a variable) the compiler needs to know the size of the type (how much memory it needs, how far the second element of an array is from the first, etc) in order to work with objects correctly.
All of this information is specified in the interface, which is typically placed in headers. Yes, the compiler might get away with making assumptions if it is not given information (e.g. in C, a function is assumed to return int and accept an arbitrary set of arguments if it is called without being previously declared) but there is still the problem of mismatches (e.g. if the function is assumed to return int, but actually returns a pointer of some type, what happens?).
More prosaically, build management processes (makefiles, build scripts, etc.) typically check the modification times of files. For example, a source file may be recompiled if the corresponding object is older than that source file, or older than any of the header files that source file #includes. The logic of doing that is that the content of a source file and its included headers affects how code in the compiled object behaves and, if the object file is older than one of those files, then there may well have been a change. The only way to make things line up is to recompile.
It would be possible to recompile only if a "substantive" change of file content has occurred (e.g. not recompile if only a comment has changed in a header). However, doing that would require reliably detecting that the change in a file doesn't actually matter to the working of the program. The analysis to do that is certainly possible, but will often be more complicated – and time-consuming, which is a problem since programmers tend to whine about long build times – than simply checking file dates.
I'm currently writing a C/C++ shared library which is planned to be an extension for another project. In the library, I need to call some functions and access some of the data structures of the original code. Clearly, the most obvious option would be to include the headers from the original code and let the user of the extension pass the paths to the header files and build the library. In order to ease the build process, I thought about rewriting the required function declarations in a separate header file. Can this be considered good practice? What about libraries where the source code is not distributed? I would assume they use the same approach. Any help is appreciated!
Can this be considered good practice?
No. Shipping your own headers means you get no warning when the headers no longer match up with the library.[1] Structure types may get additional members, functions may change, e.g. taking long instead of int like they used to – little things like that, which shouldn't affect users that use the provided headers, but will badly affect users that write their own.
The only time it makes sense is if the library promises ABI stability, that already-compiled third-party projects linked against an older version of the library will continue working. That's the exception, though, not the norm.
What about libraries where the source code is not distributed? I would assume they use the same approach.
If A links against B, and A is closed source, then A may still be recompiled by A's author against all versions of B.
If A links against B, and B is closed source, B still typically ships headers to allow users to make use of it.
If A links against B, and B is closed source, and doesn't ship headers, then typically, it is not designed to be linked against, and doing so anyway is a very bad idea. In a few rare scenarios, however, it does make sense, and shipping custom-written headers for B along with A may be a good idea.
[1] When I write "library", I'm referring to the product associated with the headers. In the case of a plug-in, it's possible that the product associated with the headers would typically not be called a library, but the code using those headers would be.
You could use callbacks to separate the main program from the library.
For example, consider a library which can calculate something. The data could come from any source, but here it is read from a file:
library.h
struct FooCalc_S;
typedef struct FooCalc_S FooCalc_T;
typedef int (*Callback_T)(void * extra);
FooCalc_T * FooCalc_Create(Callback_T callback, void * extra);
int FooCalc_Run(FooCalc_T * fc); // Calls callback multiple times
main.c
#include "library.h"
int ReadFromFile(void * extra) {
FILE * fp = extra;
// Reads next entry from file
return result;
}
int main(void) {
FILE * fp = // Open file here
FooCalc_T * fc = FooCalc_Create(ReadFromFile, fp);
int foo = FooCalc_Run(fc);
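To make the example self-contained, here is a minimal sketch of what the library side might look like (assumed; the original answer only showed the header):

/* library.c -- the struct stays opaque to main.c */
#include <stdlib.h>
#include "library.h"

struct FooCalc_S {
    Callback_T callback;
    void * extra;
};

FooCalc_T * FooCalc_Create(Callback_T callback, void * extra) {
    FooCalc_T * fc = malloc(sizeof *fc);
    fc->callback = callback;
    fc->extra = extra;
    return fc;
}

int FooCalc_Run(FooCalc_T * fc) {
    int sum = 0;
    for (int i = 0; i < 10; ++i)        /* calls the callback multiple times */
        sum += fc->callback(fc->extra);
    return sum;
}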
Assume we are using gcc/g++ and a C API specified by a random committee. This specification defines the function
void foo(void);
Now, there are several implementations according to this specification. Let's pick two as a sample and call them nfoo and xfoo (provided by libnfoo and libxfoo as static and dynamic libraries respectively).
Now, we want to create a C++ framework for the foo-API. Thus, we specify an abstract class
class Foo
{
public:
virtual void foo(void) = 0;
};
and corresponding implementations
#include <nfoo.h>
#include "Foo.h"
class NFoo : public Foo
{
public:
virtual void foo(void)
{
::foo(); // calling foo from the nfoo C-API
}
};
as well as
#include <xfoo.h>
#include "Foo.h"
class XFoo : public Foo
{
public:
virtual void foo(void)
{
::foo(); // calling foo from the xfoo C-API
}
};
Now, we are facing a problem: How do we create (i.e. link) everything into one library?
I see that there will be a symbol clash with the foo function symbols of the C API implementations.
I already tried to split the C++ wrapper implementations into separate static libraries, but then I realized (again) that a static library is just a collection of unlinked object files. So this will not work at all, unless there is a way to fully link the C libraries into the wrapper and remove/hide their symbols.
Suggestions are highly appreciated.
Update: Optimal solutions should support both implementations at the same time.
Note: The code is not meant to be functional. Perceive it as pseudo code.
Could you use dlopen/dlsym at runtime to resolve your foo calls?
something like example code from link ( may not compile):
#include <dlfcn.h>

void *handle, *handle2;
void (*fnfoo)(void) = NULL;
void (*fxfoo)(void) = NULL;

/* open the needed objects */
handle  = dlopen("/usr/home/me/libnfoo.so", RTLD_LOCAL | RTLD_LAZY);
handle2 = dlopen("/usr/home/me/libxfoo.so", RTLD_LOCAL | RTLD_LAZY);

fnfoo = dlsym(handle,  "foo");
fxfoo = dlsym(handle2, "foo");  /* note: handle2, not handle */

/* invoke functions */
(*fnfoo)();
(*fxfoo)();

/* don't forget dlclose()'s */
Otherwise, the symbols in the libraries would need to be modified.
Note this is not portable to Windows (there you'd use LoadLibrary / GetProcAddress instead).
First things first: if you are going to wrap a C API in C++ code, you should hide that dependency behind a compilation firewall. This is to (1) avoid polluting the global namespace with the names from the C API, and (2) free the user code from the dependency on the third-party headers. In this example, a rather trivial modification can be done to isolate the dependency on the C APIs. You should do this:
// In NFoo.h:
#include "Foo.h"
class NFoo : public Foo
{
public:
virtual void foo(void);
};
// In NFoo.cpp:
#include "NFoo.h"
#include <nfoo.h>
void NFoo::foo(void) {
::foo(); // calling foo from the nfoo C-API
}
The point of the above is that the C API header, <nfoo.h>, is only included in the cpp file, not in the header file. This means that user code will not need the C API headers in order to compile code that uses your library, nor will the global-namespace names from the C API risk clashing with anything else being compiled. Also, if your C API (or any other external dependency, for that matter) requires creating a number of things (e.g., handles, objects, etc.) when using the API, then you can also wrap them in a PImpl (a pointer to a forward-declared implementation class that is only declared and defined in the cpp file) to achieve the same isolation of the external dependency (i.e., a "compilation firewall").
Now that the basic stuff is out of the way, we can move to the issue at hand: simultaneously linking to two C APIs with name-clashing symbols. This is a problem, and there is no easy way out. The compilation firewall technique above is really about isolating and minimizing dependencies during compilation, and with it you could easily compile code that depends on two APIs with conflicting names (which isn't possible in your current version); however, you will still be hit hard with ODR (One Definition Rule) errors when reaching the linking phase.
This thread has a few useful tricks to resolving C API name conflicts. In summary, you have the following choices:
If you have access to static libraries (or object files) for at least one of the two C APIs, then you can use a utility like objcopy (on Unix/Linux) to add a prefix to all the symbols in that static library (object files), e.g., with the command objcopy --prefix-symbols=libn_ libn.o to prefix all the symbols in libn.o with libn_. Of course, this implies that you will need to add the same prefix to the declarations in the API's header file(s) (or make a reduced version with only what you need – see the sketch after this list), but this is not a problem from a maintenance perspective as long as you have a proper compilation firewall in place for that external dependency.
If you don't have access to static libraries (or object files) or don't want to use the above (somewhat troublesome) approach, you will have to go with a dynamic library. However, this isn't as trivial as it sounds (and I'm not even gonna go into the topic of DLL Hell). You must use dynamic loading of the dynamic-link library (or shared-object file), as opposed to the more usual static loading. That is, you must use the LoadLibrary / GetProcAddress / FreeLibrary functions (on Windows) or dlopen / dlsym / dlclose (on all Unix-like OSes). This means that you have to individually load and set the function-pointer address for each function that you wish to use. Again, if the dependencies are properly isolated in the code, this is going to be just a matter of writing all this repetitive code, but not much danger involved here.
If your use of the C APIs is much simpler than the C APIs themselves (i.e., you use only a few functions out of hundreds), it might be a lot easier for you to create two dynamic libraries, one for each C API, each exporting only the limited subset of functions you need, under unique names, that wrap calls to the C API. Then your main application or library can link to those two dynamic libraries directly (statically loaded). Of course, if you need to do that for all the functions in the C API, then there is no point in going through all this trouble.
So, choose what seems more reasonable or feasible for you; there is no doubt that it will require quite a bit of manual work to fix this up.
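For the first option above, the reduced header you keep for that dependency has to match the renamed symbols; a sketch (prefix illustrative):

/* reduced nfoo header kept inside your project, matching the output of
   objcopy --prefix-symbols=libn_ */
extern "C" {
    void libn_foo(void);   /* was: void foo(void); */
}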
If you only want to access one library implementation at a time, a natural way to go about it is as a dynamic library.
On Windows that also works for accessing two or more library implementations at a time, because Windows dynamic libraries provide total encapsulation of whatever's inside.
IIUC, ifdef is what you need:
put #define _NFOO in the nfoo lib and #define XFOO in the xfoo lib.
Also remember that if the nfoo lib and the xfoo lib both have a function called Foo, there will be an error during compilation. To avoid this, GCC/G++ uses function overloading through name mangling.
You can then check if xfoo is linked using ifdefs:
#ifdef XFOO
//call xfoo's foo()
#endif
A linker cannot distinguish between two different definitions of the same symbol name, so if you're trying to use two functions with the same name you'll have to separate them somehow.
The way to separate them is to put them in dynamic libraries. You can choose which things to export from a dynamic library, so you can export the wrappers while leaving the underlying API functions hidden. You can also load the dynamic library at runtime and bind to symbols one at a time, so even if the same name is define in more than one they won't interfere with each other.
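A rough sketch of that on GCC/Clang (assuming the shared library is built with -fvisibility=hidden, and the C API is linked in statically with its symbols kept internal, e.g. via -Wl,--exclude-libs,ALL):

extern "C" void foo(void);   // from the nfoo C API, resolved at link time

// The only symbol this shared library makes visible (name illustrative):
__attribute__((visibility("default")))
void nfoo_wrapped_foo()
{
    foo();   // the underlying C-API symbol itself stays hidden
}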
Is there ever a pattern of dependencies such that it is impossible to keep everything in header files only? What if we enforced a rule of one class per header only?
For the purposes of this question, let's ignore static things :)
I am aware of no features in standard C++, excepting statics (which you have already mentioned), that require a library to define a full translation unit (instead of only headers). However, it's not recommended to do that, because when you do, you force all your clients to recompile their entire codebase whenever your library changes. If you're using source files or a static library or a dynamic library as a form of distribution, your library can be changed/updated/modified without forcing everyone to recompile.
It is possible, I would say, on the express condition of not using a number of language features: as you noticed, a few uses of the static keyword.
It may require a few tricks, but they can be reviewed:
You'll need to keep the header / source distinction whenever you need to break a dependency cycle, even though the two files will be header files in practice.
Free functions (non-template) have to be declared inline; the compiler may not actually inline them, but if they are declared so, it won't complain that they have been redefined when the client builds its library / executable.
Globally shared data (global variables and class static attributes) should be emulated using a local static attribute in functions / class methods, as in the sketch after this list. In practice it matters little as far as the caller is concerned (it just adds ()). Note that in C++0x this becomes the favored way because it's guaranteed to be thread-safe while still protecting from the initialization-order fiasco; until then... it's not thread-safe ;)
Respecting those 3 points, I believe you would be able to write a fully fledged header-only library (anyone see something else I missed?)
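A minimal sketch of point 3 (names illustrative):

// A header-only stand-in for a global variable: an inline function
// returning a reference to a function-local static.
inline int& shared_counter()
{
    static int value = 0;   // one instance across all translation units
    return value;
}

// Callers write shared_counter() wherever they would have used the global:
//     shared_counter() += 1;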
A number of Boost libraries have used similar tricks to be header-only even though their code was not completely templates. For example, Asio does so very consciously and proposes the alternative of using flags (see the release notes for Asio 1.4.6):
clients who only need a couple features need not worry about building / linking, they just grab what they need
clients who rely on it a bit more or want to cut down on compilation time are offered the ability to build their own Asio library (with their own sets of flags) and then include "lightweight" headers
This way (at the price of some more effort on the part of the library devs) the clients get their cake and eat it too. It's a pretty nice solution I think.
Note: I am wondering whether static functions could be declared inline; I prefer to use anonymous namespaces myself, so I never really looked into it...
The one-class-per-header rule is meaningless. If this doesn't work:
#include <header1>
#include <header2>
then some variation of this will:
#include <header1a>
#include <header2>
#include <header1b>
This might result in less than one class per header, but you can always use (void*) and casts and inline functions (in which case the 'inline' will likely be duly ignored by the compiler). So the question, it seems to me, can be reduced to:
class A
{
    // ...
    void *pimpl;
};
Is it possible that the private implementation, pimpl, depends on the declaration of A? If so, then pimpl.cpp (as a header) must both precede and follow A.h. But since you can always, once again, use (void*) and casts and inline functions in the preceding headers, it can be done.
Of course, I could be wrong. In either case: Ick.
In my long career, I haven't come across a dependency pattern that would disallow a header-only implementation.
Mind you, if you have circular dependencies between classes, you may need to resort to either the abstract-interface / concrete-implementation paradigm, or use templates (templates allow you to forward-reference properties/methods of template parameters, which are resolved later, during instantiation).
This does not mean that you SHOULD always aim for header-only libraries. Good as they are, they should be reserved for template and inline code. They SHOULD NOT include substantial complex calculations.