How best to switch from a template mess to a clean class architecture (C++)?

Assume a largish template library: around 100 files containing around 100 templates, with more than 200,000 lines of code in total. Some of the templates use multiple inheritance to make the usage of the library itself rather simple (i.e. you inherit from some base templates and only have to implement certain business rules).
All of that exists (grown over several years), "works" and is used for projects.
However, compilation of projects using that library consumes a growing amount of time, and it takes quite some time to locate the source of certain bugs. Fixing often causes unexpected side effects or is quite difficult, because several interdependent templates need changing. Testing is nearly impossible due to the sheer number of functions.
Now, I would really like to simplify the architecture to use less templates and more specialized smaller classes.
Is there any proven way to go about that task? What would be a good place to start?

I'm not sure I see how/why templates are the problem, and why plain non-templated classes would be an improvement. Wouldn't that just mean even more classes, less type safety and so larger potential for bugs?
I can understand simplifying the architecture, refactoring and removing dependencies between the various classes and templates, but automatically assuming that "fewer templates will make the architecture better" is flawed imo.
I'd say that templates potentially allow you to build a much cleaner architecture than you'd get without them, simply because you can make separate classes totally independent. Without templates, a class's functions which call into another class must know about that class, or an interface it inherits, in advance. With templates, this coupling isn't necessary.
Removing templates would only lead to more dependencies, not fewer.
The added type-safety of templates can be used to detect a lot of bugs at compile time (sprinkle your code liberally with static_asserts for this purpose).
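For instance, a minimal sketch of that idea (the trait check and the average() function are made up for illustration):
#include <type_traits>

template <typename T>
T average(const T* values, int n) {
    // Reject integer instantiations at compile time instead of silently truncating.
    static_assert(std::is_floating_point<T>::value,
                  "average() is only meant for floating-point types");
    T sum = T();
    for (int i = 0; i < n; ++i)
        sum += values[i];
    return sum / n;
}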
Of course, the added compile-time may be a valid reason to avoid templates in some cases, and if you only have a bunch of Java programmers, who are used to thinking in "traditional" OOP terms, templates might confuse them, which can be another valid reason to avoid templates.
But from an architecture point of view, I think avoiding templates is a step in the wrong direction.
Refactor the application, sure, it sounds like that's needed. But don't throw away one of the most useful tools for producing extensible and robust code just because the original version of the app misused it. Especially if you're already concerned with the amount of code, removing templates will most likely lead to more lines of code.

You need automated tests. That way, in ten years' time when your successor has the same problem, he can refactor the code (probably to add more templates because he thinks it will simplify usage of the library) and know it still meets all test cases. Similarly, the side effects of any minor bug fixes will be immediately visible (assuming your test cases are good).
Other than that: "divide and conquer".

Write unit tests where the new code must do the same as the old code. That's one tip at least.
Edit:
If you deprecate old code that you have replaced with new functionality, you can phase over to the new code little by little.

Well, the problem is that the template way of thinking is very different from the object-oriented, inheritance-based way. It's hard to answer anything other than "redesign the whole thing and start from scratch".
Of course, there may be a simple way for a particular case. We can't tell without knowing more about what you have.
The fact that the template solution is so difficult to maintain is an indication of a poor design anyway.

Some points to look for (note: none of these techniques is evil in itself; but if you want to change to non-template code, this is where it can help):
Look for your static interfaces. Where do templates depend on what functions exist? Where do they need typedefs?
Put the common parts in an abstract base class. A good example is when you happen to stumble over the CRTP idiom. You can just replace it with an abstract base class having virtual functions (see the sketch after this list).
Look for integer lists. If you find your code uses integral lists like list<1, 3, 3, 1, 3>, you can replace them with std::vector, if all the code using them can live with working with runtime values instead of constant expressions.
Look for type traits. Typical templated code contains a lot of checks for whether some typedef exists, or whether some method exists. Abstract base classes solve these two issues by using pure virtual methods, and by inheriting typedefs to the base. Often, typedefs are only needed to trigger hideous features like SFINAE, which would then be superfluous too.
Look for expression templates. If your code uses expression templates to avoid creating temporaries, you will have to eliminate them and use the traditional way of returning / passing temporaries to the operators involved.
Look for function objects. If you find your code uses function objects, you can change them to use abstract base classes too, and have something like void run(); to call them (or, if you want to keep using operator(), even better: it can be virtual too).
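A hedged sketch of the CRTP point above (class names are invented): a statically dispatched base can often be replaced by a plain abstract base with a virtual function.
// Before: static polymorphism via the Curiously Recurring Template Pattern.
template <class Derived>
struct ShapeCRTP {
    double area() const { return static_cast<const Derived*>(this)->areaImpl(); }
};

// After: a classic abstract base class with a virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    explicit Circle(double r) : r_(r) {}
    double area() const override { return 3.14159265358979 * r_ * r_; }
private:
    double r_;
};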

As I understand, you are most concerned with build times, and the maintainability of your library?
First, don't try to "fix" all at once.
Second, understand what you fix. Template complexity is often there for a reason, e.g. to enforce certain use and to make the compiler help you not make a mistake. That reason might sometimes be taken too far, but throwing out 100 lines because "no one really knows what they do" shouldn't be taken lightly. Everything I suggest here can introduce really nasty bugs; you have been warned.
Third, consider cheaper fixes first: e.g. faster machines or distributed build tools. At the very least, throw in all the RAM the boards will take, and throw out old disks. It does make a difference. One drive for the OS, one drive for the build is a cheap man's RAID.
Is the library well documented? That's your best chance of making it maintainable. Look into tools such as doxygen that help you create such documentation.
All considered? OK, now some suggestions for the build times ;)
Understand the C++ build model: every .cpp is compiled individually. That means many .cpp files with many headers = huge build. This is NOT advice to put everything into one .cpp file, though! However, one trick (!) that can speed up a build immensely is to create a single .cpp file that includes a bunch of .cpp files, and only feed that "master" file to the compiler. You can't do that blindly, though - you need to understand the types of errors this could introduce.
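A rough sketch of that trick (file names invented); note that anonymous-namespace symbols, file statics and using-directives from the individual .cpp files can now collide, which is exactly the kind of error mentioned above:
// master_build.cpp - the only file handed to the compiler for this module
#include "module_a.cpp"
#include "module_b.cpp"
#include "module_c.cpp"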
If you don't have one yet, get a separate build machine that you can remote into. You'll have to do a lot of almost-full builds to check if you broke some include. You will want to run these on another machine that doesn't block you from working on something else. Long term, you'll need it for daily integration builds anyway ;)
Use precompiled headers. (scales better with fast machines, see above)
Check your header inclusion policy. While every file should be "independent" (i.e. include everything it needs to be included by someone else), don't include liberally. Unfortunately, I haven't yet found a tool to find unnecessary #include statements, but it might help to spend some time removing unused headers in "hotspot" files.
Create and use forward declarations for the templates you use. Often, you can include a header with forward declarations in many places, and use the full header only in a few specific ones. This can greatly help compile time. Check the <iosfwd> header to see how the standard library does that for I/O streams.
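A small sketch of such a forward-declaration header, assuming a hypothetical class template Grid:
// GridFwd.h - cheap to include everywhere
template <typename T> class Grid;   // forward declaration of the template
typedef Grid<float>  GridF;         // the aliases most clients need
typedef Grid<double> GridD;

// Grid.h (the full, expensive definition) is only included by the few
// .cpp files that actually instantiate Grid.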
Overloads for templates used with few types: if you have a complex function template that is useful only for a very few types, like this:
// .h
template <typename FLOAT> // float or double only
FLOAT CalcIt(int len, FLOAT * values) { ... }
You can declare the overloads in the header, and move the template to the body:
// .h
float CalcIt(int len, float * values);
double CalcIt(int len, double * values);
// .cpp
template <typename FLOAT> // float or double only
FLOAT CalcItT(int len, FLOAT * values) { ... }
float CalcIt(int len, float * values) { return CalcItT(len, values); }
double CalcIt(int len, double * values) { return CalcItT(len, values); }
this moves the lengthy template to a single compilation unit.
Unfortunately, this is only of limited use for classes.
Check if the PIMPL idiom can move code from the headers into .cpp files.
The general rule that hides behind that is: separate the interface of your library from the implementation. Use comments, detail namespaces and separate .impl.h headers to mentally and physically isolate what should be known to the outside from how it is accomplished. This exposes the real value of your library (does it actually encapsulate complexity?), and gives you a chance to replace "easy targets" first.
More specific advice - and how useful the advice given here is - depends largely on the actual library.
Good luck!

As mentioned, unit tests are a good idea. Indeed, rather than breaking your code by introducing "simple" changes that are likely to ripple out, just focus on creating a suite of tests, and fixing non-compliance with the tests. Have an activity to update the tests when bugs come to light.
Beyond that, I would suggest upgrading your tools, if possible, to help with debugging template-related problems.

I've often come across legacy templates that were huge and required a lot of time and memory to instantiate, but didn't need to be. In those cases, the easiest way to cut out the fat was to take all of the code that didn't rely on any of the template arguments and hide it in separate functions defined in a normal translation unit. This also had the positive side-effect of triggering fewer recompiles when this code had to be slightly modified or documentation changed. It sounds rather obvious, but it's really surprising how often people write a class template and think that EVERYTHING it does has to be defined in the header, rather than just the code that needs the templated information.
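A hedged sketch of that hoisting (all names invented): only the line that really depends on T stays in the header; the rest moves to a normal translation unit.
// buffer.h
#include <cstddef>
void logAllocation(std::size_t bytes);      // declared here, defined in a .cpp

template <typename T>
class Buffer {
public:
    void reserve(std::size_t n) {
        logAllocation(n * sizeof(T));       // only this line needs T
        // ... actual allocation ...
    }
};

// logging.cpp
#include <cstdio>
#include "buffer.h"
void logAllocation(std::size_t bytes) {
    std::printf("allocating %zu bytes\n", bytes);
}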
Another thing you might want to consider is how often you can clean up the inheritance hierarchies by making the templates "mixin" style instead of aggregations of multiple inheritance. See how many places you can get away with making one of the template arguments the name of the base class that it should derive from (the way boost::enable_shared_from_this works). Of course this typically only works well if the constructors take no arguments, as you don't have to worry about initializing anything correctly.
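A minimal sketch of the mixin style (hypothetical names), where the base class is supplied as a template argument instead of being one of several bases:
// A mixin that adds logging on top of whatever base it is given.
template <class Base>
class WithLogging : public Base {
public:
    using Base::Base;                 // works best when Base's constructors suffice
    void log(const char* msg);        // extra behaviour layered on top
};

// Instead of "class Widget : public CoreWidget, public LoggingSupport { ... }":
// class Widget : public WithLogging<CoreWidget> { ... };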

Impact of extra unused code from overloaded or templated function?

I am writing an interface class for point cloud registration using the PCL library. This means that I need to use its classes which are for the most part templated. However, I will not know the type of data the user wants to use before run-time. I'm ok with having to store a couple possibly null pointers to data and objects that I might not need to use because they are too few to have a meaningful impact on memory usage and there will only be one object of my class.
However, I will also have to duplicate some of my code one way or another, because it is going to use the underlying templated code of PCL. For example I might need the following
template<typename PointT>
void process_cloud(pcl::PointCloud<PointT> &input_cloud);
I'm going to need 3-4 instantiations of this function (and a couple of others) to be able to handle types unknown until run-time. However, I'm going to end up only using one of them. If these functions are non-trivial in size, what sort of impact can I expect on performance?
If it is non-negligible, how can I alleviate it? I tried to figure out ways that don't need duplicate code, but I can't find a way to handle templated code polymorphically without writing templated code of my own.
If I have to make do with this design, is there any way to optimize the memory layout as to minimize the performance hit of cache misses? For example can I guarantee that my universally-needed functions will be close together and not watered-down by the potentially never called instantiations?
I thought about templating the whole class. This will make code more local, because each instantiation will group together the functions that will be called in tandem (same data type). It will also introduce more code bloat by creating copies of code that didn't need to be templated. To avoid this extra bloat, the best I can come up with is conceptually this:
template<typename PointT>
class Processor {
public:
    void process_cloud(pcl::PointCloud<PointT> &input_cloud);
    ...
};
class Interface {
public:
    // ...
    // bunch of common functions
    // ...
    // Instantiations I'm going to need. Pointers to save space.
    // Could also be std::optional if pointers turn out to be unneeded
    std::unique_ptr<Processor<pcl::PointXYZ>> p1;
    ...
};
This should produce a memory layout where the common functions are grouped together because they are defined in Interface. Every point type also has the functions used on it also grouped together because they are defined in separate classes. It's a little less readable, though. Any cleaner ways to help the compiler understand that template instantiations with the same argument are going to be used in tandem and should be local? Will it maybe realize and do it automatically?
You can experiment with it by putting an explicit instantiation in separate compilation units vs putting them all in the same compilation unit.
My guess is the difference won't be large, as it would mostly show up as ITLB misses.
There would be fewer in the separate case, due to the code being more local.
You will get slightly more total code, but if you only use one instance the rest should not influence the runtime, as it never pollutes the caches and might be swapped out at some point.
That is, unless the compiler and linker decide that they want to sort the functions by the 3rd letter of their mangled names, or for some slightly better reason.
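A sketch of the experiment (assuming the Processor template above is fully defined in a hypothetical processor.h):
// Variant A: one explicit instantiation per translation unit.
// processor_xyz.cpp
#include "processor.h"
template class Processor<pcl::PointXYZ>;

// processor_xyzrgb.cpp
#include "processor.h"
template class Processor<pcl::PointXYZRGB>;

// Variant B: all instantiations in a single translation unit.
// processors_all.cpp
#include "processor.h"
template class Processor<pcl::PointXYZ>;
template class Processor<pcl::PointXYZRGB>;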

How to deal with the idea of "many small functions" for classes, without passing lots of parameters?

Over time I have come to appreciate the mindset of many small functions, and I really do like it a lot, but I'm having a hard time losing my shyness to apply it to classes, especially ones with more than a handful of non-public member variables.
Every additional helper function clutters up the interface, since often the code is class specific and I can't just use some generic piece of code.
(To my limited knowledge, anyway, still a beginner, don't know every library out there, etc.)
So in extreme cases, I usually create a helper class which becomes the friend of the class that needs to be operated on, so it has access to all the nonpublic guts.
An alternative is free functions that need parameters, but even though premature optimization is evil, and I haven't actually profiled or disassembled it...
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference, even though that should be a simple address per argument.
Is all this a matter of preference, or is there a widely used way of dealing with that kind of stuff?
I know that trying to force stuff into patterns is a kind of anti-pattern, but I am concerned about code sharing and standards, and I want to make stuff at least fairly painless for other people to read.
So, how do you guys deal with that?
Edit:
Some examples that motivated me to ask this question:
About the free functions:
DeadMG was confused about making free functions work...without arguments.
My issue with those functions is that, unlike member functions, free functions only know about data if you give it to them, unless global variables and the like are used.
Sometimes, however, I have a huge, complicated procedure I want to break down for readability and understanding's sake, but there are so many different variables which get used all over the place that passing all the data to free functions, which are agnostic to every bit of member data, looks simply nightmarish.
Click for an example
That is a snippet of a function that converts data into a format that my mesh class accepts.
It would take all of those parameters to refactor this into a "finalizeMesh" function, for example.
At this point it's part of a huge compute-mesh-data function, and bits of dimension info and sizes and scaling info are used all over the place, interwoven.
That's what I mean with "free functions need too many parameters sometimes".
I think it shows bad style, and not necessarily a symptom of being irrational per se, I hope :P.
I'll try to clear things up more along the way, if necessary.
Every additional helper function clutters up the interface
A private helper function doesn't.
I usually create a helper class which becomes the friend of the class that needs to be operated on
Don't do this unless it's absolutely unavoidable. You might want to break up your class's data into smaller nested classes (or plain old structs), then pass those around between methods.
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference
That's not premature optimization, that's a perfectly acceptable way of preventing/reducing cognitive load. You don't want functions taking more than three parameters. If there are more than three, consider packaging your data in a struct or class.
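For instance, a hedged sketch based on the finalizeMesh example from the question (the field names are made up):
struct MeshBuildParams {
    int width = 0;
    int height = 0;
    double scale = 1.0;
    bool flipNormals = false;
};

// Instead of finalizeMesh(int width, int height, double scale, bool flipNormals, ...):
void finalizeMesh(const MeshBuildParams& params);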
I sometimes have the same problems as you have described: increasingly large classes that need too many helper functions to be accessed in a civilized manner.
When this occurs, I try to separate the class into multiple smaller classes, if that is possible and convenient.
Scott Meyers states in Effective C++ that friend classes or functions are mostly not the best option, since the client code might do anything with the object.
Maybe you can try nested classes that deal with the internals of your object. Another option is helper functions that use the public interface of your class; put them into a namespace related to your class.
Another way to keep your classes free of cruft is to use the pimpl idiom. Hide your private implementation behind a pointer to a class that actually implements whatever it is that you're doing, and then expose a limited subset of features to whoever is the consumer of your class.
// Your public API in foo.h (note: only foo.cpp should #include foo_impl.h)
class FooImpl;                  // a forward declaration is enough here
class Foo {
public:
    Foo();
    ~Foo();                     // defined in foo.cpp, where FooImpl is complete
    bool func(int i);           // forwards to impl_->func(i) in foo.cpp
private:
    FooImpl* impl_;
};
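The matching source file would then be the only place that sees the implementation class (a sketch, under the same assumptions about FooImpl):
// foo.cpp - the only file that includes foo_impl.h
#include "foo.h"
#include "foo_impl.h"

Foo::Foo() : impl_(new FooImpl) {}
Foo::~Foo() { delete impl_; }
bool Foo::func(int i) { return impl_->func(i); }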
There are many ways to implement this. The Boost pimpl template in the Vault is pretty good. Using smart pointers is another useful way of handling this, too.
http://www.boost.org/doc/libs/1_46_1/libs/smart_ptr/sp_techniques.html#pimpl
An alternative are free functions that need parameters, but even though premature optimization is evil, and I haven't actually profiled or disassembled it... I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference, even though that should be a simple address per argument.
So, let me get this entirely straight. You haven't profiled or disassembled. But somehow, you intend on ... making functions work ... without arguments? How, exactly, do you propose to program without using function arguments? Member functions are no more or less efficient than free functions.
More importantly, you come up with lots of logical reasons why you know you're wrong. I think the problem here is in your head, which possibly stems from you being completely irrational, and nothing that any answer from any of us can help you with.
Generic algorithms that take parameters are the basis of modern object-oriented programming - that's the entire point of both templates and inheritance.

Header file best practices for typedefs

I'm using shared_ptr and STL extensively in a project, and this is leading to over-long, error-prone types like shared_ptr< vector< shared_ptr<const Foo> > > (I'm an ObjC programmer by preference, where long names are the norm, and even so this is way too much.) It would be much clearer, I believe, to consistently call this FooListPtr and to document the naming convention that "Ptr" means shared_ptr and "List" means vector of shared_ptr.
This is easy to typedef, but it's causing headaches with the headers. I seem to have several options of where to define FooListPtr:
Foo.h. That entwines all the headers and creates serious build problems, so it's a non-starter.
FooFwd.h ("forward header"). This is what Effective C++ suggests, based on iosfwd.h. It's very consistent, but the overhead of maintaining twice the number of headers seems annoying at best.
Common.h (put all of them together into one file). This kills reusability by entwining a lot of unrelated types. You now can't just pick up one object and move it to another project. That's a non-starter.
Some kind of fancy #define magic that typedef's if it hasn't already been typedefed. I have an abiding dislike for the preprocessor because I think it makes it hard for new people to grok the code, but maybe....
Use a vector subclass rather than a typedef. This seems dangerous...
Are there best practices here? How do they turn out in real code, when reusability, readability and consistency are paramount?
I've marked this community wiki if others want to add additional options for discussion.
I'm programming on a project which sounds like it uses the common.h method. It works very well for that project.
There is a file called ForwardsDecl.h which is in the pre-compiled header and simply forward-declares all the important classes and necessary typedefs. In this case unique_ptr is used instead of shared_ptr, but the usage should be similar. It looks like this:
// Forward declarations
class ObjectA;
class ObjectB;
class ObjectC;
// List typedefs
typedef std::vector<std::unique_ptr<ObjectA>> ObjectAList;
typedef std::vector<std::unique_ptr<ObjectB>> ObjectBList;
typedef std::vector<std::unique_ptr<ObjectC>> ObjectCList;
This code is accepted by Visual C++ 2010 even though the classes are only forward-declared (the full class definitions are not necessary, so there's no need to include each class's header file). I don't know if that's standard, and whether other compilers will require the full class definitions, but it's useful that this one doesn't: another class (ObjectD) can have an ObjectAList as a member without needing to include ObjectA.h - this can really help reduce header file dependencies!
Maintenance is not particularly an issue, because the forward declarations only need to be written once, and any subsequent changes only need to happen in the full declaration in the class's header file (and this will trigger fewer source files to be recompiled due to reduced dependencies).
Finally, it appears this can be shared between projects (I haven't tried it myself) because even if a project does not actually declare an ObjectA, it doesn't matter: it was only forward-declared, and if you don't use it the compiler doesn't care. Therefore the file can contain the names of classes across all projects it's used in, and it doesn't matter if some are missing for a particular project. All that is required is that the necessary full declaration header (e.g. ObjectA.h) is included in any source (.cpp) files that actually use those classes.
I would go with a combined approach of forward headers and a kind of common.h header that is specific to your project and just includes all the forward declaration headers and any other stuff that is common and lightweight.
You complain about the overhead of maintaining twice the number of headers but I don’t think this should be too much of a problem: the forward headers usually only need to know a very limited number of types (one?), and sometimes not even the full type.
You could even try auto-generating the headers using a script (this is done e.g. in SeqAn) if there are really that many headers.
+1 for documenting the typedef conventions.
Foo.h - can you detail the problems you have with that?
FooFwd.h - I'd not use them generally, only on "obvious hotspots". (Yes, "hotspots" are hard to determine).
It doesn't change the rules IMO because when you do introduce a fwd header, the associated typedefs from foo.h move there.
Common.h - cool for small projects, but doesn't scale, I do agree.
Some kind of fancy #define... PLEASE NO!...
Use a vector subclass - doesn't make it better.
You might use containment, though.
So here are the preliminary suggestions (revised from that other question):
Standard type headers <boost/shared_ptr.hpp>, <vector> etc. can go into a precompiled header / shared include file for the project. This is not bad. (I personally still include them where needed, but that works in addition to putting them into the PCH.)
If the container is an implementation detail, the typedefs go where the container is declared (e.g. private class members if the container is a private class member)
Associated types (like FooListPtr) go where Foo is declared, if the associated type is the primary use of the type. That's almost always true for some types - e.g. shared_ptr.
If Foo gets a separate forward declaration header, and the associated type is ok with that, it moves to the FooFwd.h, too.
If the type is only associated with a particular interface (e.g. parameter for a public method), it goes there.
If the type is shared (and does not meet any of the previous criteria), it gets its own header. Note that this also means to pull in all dependencies.
It feels "obvious" for me, but I agree it's not good as a coding standard.
I'm using shared_ptr and STL extensively in a project, and this is leading to over-long, error-prone types like shared_ptr< vector< shared_ptr<const Foo> > > (I'm an ObjC programmer by preference, where long names are the norm, and still this is way too much.) It would be much clearer, I believe, to consistently call this FooListPtr and documenting the naming convention that "Ptr" means shared_ptr and "List" means vector of shared_ptr.
for starters, i recommend using good design structures for scoping (e.g., namespaces) as well as descriptive, non-abbreviated names for typedefs. FooListPtr is terribly short, imo. nobody wants to guess what an abbreviation means (or be surprised to find Foo is const, shared, etc.), and nobody wants to alter their code simply because of scope collisions.
it may also help to choose a prefix for typedefs in your libraries (as well as other common categories).
it's also a bad idea to drag types out of their declared scope:
namespace MON {
namespace Diddy {
class Foo;
} /* << Diddy */
/*...*/
typedef Diddy::Foo Diddy_Foo;
} /* << MON */
there are exceptions to this:
an entirely encapsulated private type
a contained type within a new scope
while we're at it, using-directives in namespace scope and namespace aliases should be avoided - qualify the scope if you want to minimize future maintenance.
This is easy to typedef, but it's causing headaches with the headers. I seem to have several options of where to define FooListPtr:
Foo.h. That entwines all the headers and creates serious build problems, so it's a non-starter.
it may be an option for declarations which really depend on other declarations. implying that you need to divide packages, or there is a common, localized interface for subsystems.
FooFwd.h ("forward header"). This is what Effective C++ suggests, based on iosfwd.h. It's very consistent, but the overhead of maintaining twice the number of headers seems annoying at best.
don't worry about the maintenance of this, really. it is a good practice. the compiler uses forward declarations and typedefs with very little effort. it's not annoying because it helps reduce your dependencies, and helps ensure that they are all correct and visible. there really isn't more to maintain since the other files refer to the 'package types' header.
Common.h (put all of them together into one file). This kills reusability by entwining a lot of unrelated types. You now can't just pick up one object and move it to another project. That's a non-starter.
package based dependencies and inclusions are excellent (ideal, really) - do not rule this out. you'll obviously have to create package interfaces (or libraries) which are designed and structured well, and represent related classes of components. you're making an unnecessary issue out of object/component reuse. minimize the static data of a library, and let the link and strip phases do their jobs. again, keep your packages small and reusable and this will not be an issue (assuming your libraries/packages are well designed).
Some kind of fancy #define magic that typedef's if it hasn't already been typedefed. I have an abiding dislike for the preprocessor because I think it makes it hard for new people to grok the code, but maybe....
actually, you may declare a typedef in the same scope multiple times (e.g., in two separate headers) - that is not an error.
declaring a typedef in the same scope with different underlying types is an error. obviously. you must avoid this, and fortunately the compiler enforces that.
to avoid this, create a 'translation build' which includes the world - the compiler will flag declarations of typedeffed types which don't match.
trying to sneak by with minimal typedefs and/or forwards (which are close enough to free at compilation) is not worth the effort. sometimes you'll need a bunch of conditional support for forward declarations - once that is defined, it is easy (stl libraries are a good example of this -- in the event you are also forward declaring template<typename,typename>class vector;).
it's best to just have all these declarations visible to catch any errors immediately, and you can avoid the preprocessor in this case as a bonus.
Use a vector subclass rather than a typedef. This seems dangerous...
a subclass of std::vector is often flagged as a "beginner's mistake". this container was not meant to be subclassed. don't resort to bad practices simply to reduce your compile times/dependencies. if the dependency really is that significant, you should probably be using PIMPL, anyways:
// <package>.types.hpp
namespace MON {
class FooListPtr;
}
// FooListPtr.hpp
namespace MON {
class FooListPtr {
/* ... */
private:
shared_ptr< vector< shared_ptr<const Foo> > > d_data;
};
}
Are there best practices here? How do they turn out in real code, when reusability, readability and consistency are paramount?
ultimately, i've found a small concise package based approach the best for reuse, for reducing compile times, and minimizing dependence.
Unfortunately with typedefs you have to choose between not ideal options for your header files. There are special cases where option one (right in the class header) works well, but it sounds like it won't work for you. There are also cases where the last option works well, but it's usually where you are using the subclass to replace a pattern involving a class with a single member of type std::vector. For your situation, I'd use the forward declaring header solution. There's extra typing and overhead, but it wouldn't be C++ otherwise, right? It keeps things separate, clean and fast.

Is there a good way to avoid duplication of method prototypes in C++?

In the code I have read, most C++ class method signatures are duplicated between the declaration, normally in a header file, and the definition in a source file. I find this repetition undesirable, and code written this way suffers from poor locality of reference. For instance, the methods in source files often reference instance variables declared in the header file; you end up having to constantly switch between header files and source files when reading code.
Would anyone recommend a way to avoid doing so? Or, am I mainly going to confuse experienced C++ programmers by not doing things in the usual way?
See also Question 538255 C++ code in header files where someone is told that everything should go in the header.
There is an alternative, but the cure is worse than the illness — define all the function bodies in the header, or even inline in the class, like C#. The downsides are that this will bloat compile times significantly, and it'll annoy veteran C++ programmers. It can also get you into some annoying situations of circular dependency that, while solvable, are a nuisance to deal with.
Personally, I just set my IDE to have a vertical split, and put the header file on the right side and the source file on the left.
I assume you're talking about member function declarations in a header file and definitions in source files?
If you're used to the Java/Python/etc. model, it may well seem redundant. In fact, if you were so inclined, you could define all functions inline in the class definition (in the header file). But, you'd definitely be breaking with convention and paying the price of additional coupling and compilation time every time you changed anything minor in the implementation.
C++, Ada, and other languages originally designed for large scale systems kept definitions hidden for a reason--there's no good reason that the users of a class should have to be concerned with its implementation, nor any reason they should have to repeatedly pay to compile it. Less of an issue nowadays with faster systems, but still relevant for really large systems. Additionally, TDD, stubbing and other testing strategies are facilitated by the isolation and quicker compilation.
Don't break with convention. In the end, you will make a ball of worms that doesn't work very well. Plus, compilers will hate you. C/C++ are set up that way for a reason.
The C++ language supports function overloading, which means that the entire function signature is basically a way to identify a specific function. For this reason, as long as you declare and define a function separately, there's really no redundancy in having to list the parameters again. More precisely, having to list the parameter types is not redundant. Parameter names, on the other hand, play no role in this process and you are free to omit them in the declaration (i.e. in the header file), although I believe this limits readability.
You "can" get around the problem. You define an abstract interface class that only contains the pure virtual functions that an outside application will call. Then in the CPP file you provide the actual class that derives from the interface and contains all the class variables. You implement as normal now. The only thing this requires is a way to instantiate the derived implementation class from the interface class. You could do that by providing a static "Create" function that has its implementation in the CPP file.
i.e.:
InterfaceClass* InterfaceClass::Create()
{
return new ImplementationClass;
}
This way you effectively hide the implementation from any outside user. You can't, however, create the class on the stack, only on the heap... but it does solve your problem AND provides a better layer of abstraction. In the end, though, if you aren't prepared to do this, you need to stick with what you are doing.
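A fuller, hedged sketch of the pattern described in this answer (class and member names are illustrative):
// widget.h - clients only ever see the abstract interface
class InterfaceClass {
public:
    virtual ~InterfaceClass() = default;
    virtual void doWork() = 0;
    static InterfaceClass* Create();      // factory, defined in the .cpp
};

// widget.cpp - the real class, with all its member variables, stays hidden
class ImplementationClass : public InterfaceClass {
public:
    void doWork() override { /* ... */ }
private:
    int state_ = 0;
};

InterfaceClass* InterfaceClass::Create()
{
    return new ImplementationClass;
}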

The Pimpl Idiom in practice

There have been a few questions on SO about the pimpl idiom, but I'm more curious about how often it is leveraged in practice.
I understand there are some trade-offs between performance and encapsulation, plus some debugging annoyances due to the extra redirection.
With that, is this something that should be adopted on a per-class, or an all-or-nothing basis? Is this a best-practice or personal preference?
I realize that's somewhat subjective, so let me list my top priorities:
Code clarity
Code maintainability
Performance
I always assume that I will need to expose my code as a library at some point, so that's also a consideration.
EDIT: Any other options to accomplish the same thing would be welcome suggestions.
I'd say that whether you do it per-class or on an all-or-nothing basis depends on why you go for the pimpl idiom in the first place. My reasons, when building a library, have been one of the following:
Wanted to hide implementation in order to avoid disclosing information (yes, it was not a FOSS project :)
Wanted to hide implementation in order to make client code less dependent. If you build a shared library (DLL), you can change your pimpl class without even recompiling the application.
Wanted to reduce the time it takes to compile the classes using the library.
Wanted to fix a namespace clash (or similar).
None of these reasons prompts for the all-or-nothing approach. In the first one, you only pimplize what you want to hide, whereas in the second case it's probably enough to do so for classes which you expect to change. Also for the third and fourth reason there's only benefit from hiding non-trivial members that in turn require extra headers (e.g., of a third-party library, or even STL).
In any case, my point is that I wouldn't typically find something like this too useful:
class Point {
public:
Point(double x, double y);
Point(const Point& src);
~Point();
Point& operator= (const Point& rhs);
void setX(double x);
void setY(double y);
double getX() const;
double getY() const;
private:
class PointImpl;
PointImpl* pimpl;
};
In this kind of a case, the tradeoff starts to hit you because the pointer needs to be dereferenced, and the methods cannot be inlined. However, if you do it only for non-trivial classes then the slight overhead can typically be tolerated without any problems.
One of the biggest uses of the pimpl idiom is the creation of a stable C++ ABI. Almost every Qt class uses a "d-pointer", which is a kind of pimpl. This makes changes much easier without breaking the ABI.
Code Clarity
Code clarity is very subjective, but in my opinion a header that has a single data-member is much more readable than a header with many data-members. The implementation file however is noisier, so clarity is reduced there. That might not be an issue if the class is a base class, mostly used by derived classes rather than maintained.
Maintainability
For maintainability of the pimpl'd class I personally find the extra dereference in each access of a data-member tedious. Accessors can't help if the data is purely private because then you shouldn't expose an accessor or mutator for it anyway, and you're stuck with constantly dereferencing the pimpl.
For maintainability of derived classes I find the idiom is a pure win in all cases, because the header file lists fewer irrelevant details. Compile time is also improved for all client compilation units.
Performance
Performance loss is small in many cases and significant in a few. In the long run it is on the order of magnitude of virtual functions' performance loss. We're talking about an extra dereference per access per data member, plus a dynamic memory allocation for the pimpl, plus release of that memory on destruction. If the pimpl'd class doesn't access its data members often, and its objects are created often and are short-lived, then the dynamic allocation can outweigh the extra dereferences.
Decision
I think classes in which performance is crucial, such that one extra dereference or memory allocation makes a significant difference, shouldn't use the pimpl no matter what. Base classes in which this reduction in performance is insignificant, and whose header file is widely #include'd, probably should use the pimpl if compilation time is improved significantly. If compilation time isn't reduced, it's down to your code-clarity taste.
For all other cases it's purely a matter of taste. Try it and measure runtime performance and compile-time performance before you make a decision.
pImpl is very useful when you come to implement std::swap and operator= with the strong exception guarantee. I'm inclined to say that if your class supports either of those, and has more than one non-trivial field, then it's usually no longer down to preference.
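A minimal sketch of that strong-guarantee assignment with pImpl (names invented; the copy constructor does the deep copy and is the only step that can throw):
#include <algorithm>

class WidgetImpl;                         // defined only in the .cpp

class Widget {
public:
    Widget();
    Widget(const Widget& other);          // deep-copies *other.impl_ (may throw)
    ~Widget();
    Widget& operator=(Widget other) {     // pass by value: the copy happens first,
        std::swap(impl_, other.impl_);    // then a never-throwing pointer swap
        return *this;
    }
    friend void swap(Widget& a, Widget& b) { std::swap(a.impl_, b.impl_); }
private:
    WidgetImpl* impl_;
};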
Otherwise, it's about how tightly you want clients to be bound to the implementation via the header file. If binary-incompatible changes aren't a problem, then you might not benefit much in maintainability, although if compile speed becomes an issue there are usually savings there.
The performance costs probably have more to do with loss of inlining than they do with indirection, but that's a wild guess.
You can always add pImpl later, and declare that from this day forth clients will not have to recompile just because you added a private field.
So none of this suggests an all-or-nothing approach. You can selectively do it for the classes where it gives you benefit, not for the ones it doesn't, and change your mind later. Implementing for example iterators as pImpl sounds like Too Much Design...
This idiom helps greatly with compile time on large projects.
I generally use it when I want to avoid a header file polluting my codebase. Windows.h is the perfect example. It is so badly behaved, I'd rather kill myself than have it visible everywhere. So assuming you want a class-based API, hiding it behind a pimpl class neatly solves the problem. (If you're content to just expose individual function, those can just be forward declared, of course, without putting them into a pimpl class)
I wouldn't use pimpl everywhere, partly because of the performance hit, and partly just because it's a lot of extra work for a usually small benefit. The main thing it gives you is isolation between implementation and interface. Usually, that's just not a very high priority.
I use the idiom in a couple of places in my own libraries, in both cases to cleanly split the interface from the implementation. I have, for example, an XML reader class fully declared in a .h file, which has a PIMPL to a RealXMLReader class which is declared and defined in non-public .h and .cpp files. The RealXMLReader in turn is a convenience wrapper for the XML parser I use (currently Expat).
This arrangement allows me to change from Expat in the future to another XML parser without having to recompile all the client code (I still need to re-link of course).
Note that I don't do this for compile-time performance reasons, only for convenience. There are a few PIMPL fanatics who insist that any project containing more than three files will be uncompilable unless you use PIMPLs throughout. It's noticeable that these people never produce any actual evidence, but only make vague references to "Lakos" and "exponential time".
pImpl will work best when we have r-value semantics.
The "alternative" to pImpl, that will also achieve hiding the implementation detail, is to use an abstract base class and put the implementation in a derived class. Users call some kind of "factory" method to create the instance and will generally use a pointer (probably a shared one) to the abstract class.
The rationale behind pImpl instead can be:
Saving on a v-table. Yes, but will your compiler inline all the forwarding, and will you really save anything?
If your module contains multiple classes that know about each other in detail although to the outside world you hide that.
Semantics of the container class for the pImpl could be:
- Non-copyable, non-assignable. So you "new" your pImpl on construction and "delete" it on destruction.
- Shared. So you have a shared_ptr rather than an Impl*. With shared_ptr you can use a forward declaration as long as the class is complete at the point of the destructor. Your destructor should be defined even if it is the default (which it probably will be).
- Swappable. You can implement "may be empty" and implement "swap". Users can create an instance of one and pass a non-const reference to it to get it populated, with a "swap".
- 2-stage construction. You construct an empty one, then call "load()" on it to populate it.
shared is the only one I have even a remote liking for without r-value semantics. With them we can also implement non-copyable non-assignable properly. I like to be able to call a function that gives me one.
I have, however, found that I now tend to use abstract base classes more than pImpl, even when there is only one implementation.