Could C++ have obviated the pimpl idiom?

As I understand it, the pimpl idiom exists only because C++ forces you to place all the private class members in the header. If the header contained only the public interface, then, in theory, a change in the class implementation would not necessitate a recompile of the rest of the program.
What I want to know is why C++ was not designed to allow this convenience. Why does it demand that the private parts of a class be openly displayed in the header (no pun intended)?

This has to do with the size of the object. The .h file is used, among other things, to determine the size of the object. If the private members were not given in it, you would not know how large an object to new.
You can simulate, however, your desired behavior by the following:
class MyClass
{
public:
    // public stuff
private:
    #include "MyClassPrivate.h"
};
This does not enforce the behavior, but it gets the private stuff out of the .h file.
On the down side, this adds another file to maintain.
Also, in Visual Studio, IntelliSense does not work for the private members - this could be a plus or a minus.

I think there is a confusion here. The problem is not about headers. Headers don't do anything (they are just ways to include common bits of source text among several source-code files).
The problem, as much as there is one, is that class declarations in C++ have to define everything, public and private, that an instance needs to have in order to work. (The same is true of Java, but the way reference to externally-compiled classes works makes the use of anything like shared headers unnecessary.)
It is in the nature of common object-oriented technologies (not just C++) that someone needs to know the concrete class that is used and how to use its constructor to deliver an implementation, even if you are using only the public parts. The device in (3) below hides it. The practice in (1) below separates the concerns, whether you do (3) or not.
1. Use abstract classes that define only the public parts, mainly methods, and let the implementation class inherit from that abstract class. So, using the usual convention for headers, there is an abstract.hpp that is shared around. There is also an implementation.hpp that declares the inherited class and that is only passed around to the modules that implement methods of the implementation. The implementation.hpp file will #include "abstract.hpp" for use in the class declaration it makes, so that there is a single maintenance point for the declaration of the abstracted interface.
2. Now, if you want to enforce hiding of the implementation class declaration, you need to have some way of requesting construction of a concrete instance without possessing the specific, complete class declaration: you can't use new and you can't use local instances. (You can delete, though.) Introduction of helper functions (including methods on other classes that deliver references to class instances) is the substitute.
3. Along with or as part of the header file that is used as the shared definition for the abstract class/interface, include function signatures for external helper functions. These functions should be implemented in modules that are part of the specific class implementations (so they see the full class declaration and can exercise the constructor). The signature of the helper function is probably much like that of the constructor, but it returns an instance reference as a result. (This constructor proxy can return a NULL pointer, and it can even throw exceptions if you like that sort of thing.) The helper function constructs a particular implementation instance and returns it cast as a reference to an instance of the abstract class.
Mission accomplished.
Oh, and recompilation and relinking should work the way you want, avoiding recompilation of calling modules when only the implementation changes (since the calling module no longer does any storage allocations for the implementations).
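A minimal sketch of that arrangement (all names hypothetical; the answer predates C++11, but std::unique_ptr makes the ownership explicit):

// abstract.hpp - the only header shared with client code
#include <memory>

class Widget {
public:
    virtual ~Widget() = default;   // virtual destructor so delete works via the base
    virtual void draw() = 0;
};

// Constructor proxy: declared with the abstract class,
// defined alongside the implementation.
std::unique_ptr<Widget> makeWidget(int size);

// implementation.hpp - passed only to implementing modules
// #include "abstract.hpp"
class WidgetImpl : public Widget {
public:
    explicit WidgetImpl(int size) : size_(size) {}
    void draw() override { /* ... */ }
private:
    int size_;
};

// implementation.cpp - the only place that sees the complete class
std::unique_ptr<Widget> makeWidget(int size) {
    return std::make_unique<WidgetImpl>(size);
}

Client modules compile against abstract.hpp alone, so changing WidgetImpl's members never triggers their recompilation.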

You're all ignoring the point of the question -
Why must the developer type out the PIMPL code?
For me, the best answer I can come up with is that we don't have a good way to express C++ code that allows you to operate on it. For instance, compile-time (or pre-processor, or whatever) reflection or a code DOM.
C++ badly needs one or both of these to be available to a developer to do meta-programming.
Then you could write something like this in your public MyClass.h:
#pragma pimpl(MyClass_private.hpp)
And then write your own, really quite trivial wrapper generator.

Someone will have a much more verbose answer than I, but the quick response is two-fold: the compiler needs to know all the members of a struct to determine the storage space requirements, and the compiler needs to know the ordering of those members to generate offsets in a deterministic way.
The language is already fairly complicated; I think a mechanism to split the definitions of structured data across the code would be a bit of a calamity.
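A tiny illustration of the first point (class names hypothetical):

class C {
public:
    void f();
private:
    int a;       // removing or reordering these changes sizeof(C)
    double b;    // and the offsets the compiler bakes into member access
};

// Any translation unit doing either of these needs the size,
// hence the full definition:
C* p = new C;    // operator new must be given sizeof(C) bytes
C  local;        // the enclosing scope must reserve sizeof(C) bytes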
Typically, I've always seen policy classes used to define implementation behavior in a Pimpl-manner. I think there are some added benefits of using a policy pattern -- easier to interchange implementations, can easily combine multiple partial implementations into a single unit which allow you to break up the implementation code into functional, reusable units, etc.
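For example, a minimal policy-based sketch (names hypothetical); the implementation behavior is selected as a template parameter and alternatives can be swapped in freely:

#include <iostream>

struct ConsoleLog {
    void log(const char* msg) { std::cout << msg << '\n'; }
};

struct SilentLog {
    void log(const char*) {}   // alternative implementation, same interface
};

template <typename LogPolicy>
class Worker : private LogPolicy {
public:
    void run() { this->log("working"); }   // behavior fixed at compile time
};

int main() {
    Worker<ConsoleLog> noisy;
    noisy.run();
    Worker<SilentLog> quiet;
    quiet.run();
}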

Maybe because the size of the class is required when passing its instances by value, aggregating it in other classes, etc.?
If C++ did not support value semantics, it would have been fine, but it does.
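A short illustration: value semantics require the complete type, while the pointer that pimpl relies on does not (names hypothetical):

class Engine;            // forward declaration: incomplete type

class Car {
    // Engine engine_;   // error: field has incomplete type 'Engine'
    Engine* engine_;     // fine: a pointer's size is known regardless
};

void takeByValue(Engine e);      // legal to declare, but defining or
                                 // calling it requires the full type
void takeByPointer(Engine* e);   // fine: no size needed at all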

Yes, but...
You need to read Stroustrup's "The Design and Evolution of C++". Such a break with C's compilation model would have inhibited the uptake of C++.

Related

Why would you want to put a class in an implementation file?

While looking over some code, I ran into the following:
.h file
class ExampleClass
{
public:
    // methods, etc
private:
    class AnotherExampleClass* ptrToClass;
};
.cpp file
class AnotherExampleClass
{
    // methods, etc
};
// AnotherExampleClass and ExampleClass implemented
Is this a pattern or something beneficial when working in C++? Since the class is not broken out into another file, does this workflow promote faster compilation times?
Or is this just this developer's style?
This is variously known as the pImpl Idiom, Cheshire cat technique, or Compilation firewall.
Benefits:
- Changing private member variables of a class does not require recompiling classes that depend on it, thus make times are faster, and the fragile binary interface problem is reduced.
- The header file does not need to #include classes that are used 'by value' in private member variables, thus compile times are faster.
- This is sorta like the way Smalltalk automatically handles classes... more pure encapsulation.
Drawbacks:
- More work for the implementor.
- Doesn't work for 'protected' members where access by subclasses is required.
- Somewhat harder to read code, since some information is no longer in the header file.
- Run-time performance is slightly compromised due to the pointer indirection, especially if function calls are virtual (branch prediction for indirect branches is generally poor).
Herb Sutter's "Exceptional C++" books also go into useful detail on the appropriate usage of this technique.
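For reference, a minimal sketch of the idiom itself (these answers predate C++11; std::unique_ptr is used here for clarity, and all names are hypothetical):

// Widget.h - all that clients ever include
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                    // defined in Widget.cpp, where Impl is complete
    void doSomething();
private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> pImpl;
};

// Widget.cpp - private members can change without touching Widget.h
#include "Widget.h"

struct Widget::Impl {
    int state = 0;                // grows or shrinks freely; clients never recompile
};

Widget::Widget() : pImpl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;      // unique_ptr's deleter is instantiated here,
                                  // where Impl is a complete type

void Widget::doSomething() { ++pImpl->state; }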
The most common example would be when using the PIMPL pattern or similar techniques. Still, there are other uses as well. Typically, the .hpp/.cpp distinction in C++ is (or at least can be) one of public interface versus private implementation. If a type is only used as part of the implementation, that's a good reason not to export it in the header file.
Apart from possibly being an implementation of the PIMPL idiom, here are two more possible reasons to do this:
Objects in C++ cannot modify their this pointer. As a consequence, they cannot change type in mid-usage. However, ptrToClass can change, allowing an implementation to delete itself and to replace itself with another instance of another subclass of AnotherExampleClass.
If the implementation of AnotherExampleClass depends on some template parameters, but the interface of ExampleClass does not, it is possible to use a template derived from AnotherExampleClass to provide the implementation. This hides part of the necessary, yet internal type information from the user of the interface class.
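A hypothetical sketch of that second point; the interface class stays template-free while the object behind ptrToClass is a template instantiation:

class AnotherExampleClass {                 // implementation base, no template parameters
public:
    virtual ~AnotherExampleClass() = default;
    virtual void work() = 0;
};

template <typename Strategy>                // template details hidden behind the base
class TemplatedImpl : public AnotherExampleClass {
public:
    void work() override { Strategy{}.run(); }
};

class ExampleClass {                        // public interface: no template parameters leak out
public:
    explicit ExampleClass(AnotherExampleClass* impl) : ptrToClass(impl) {}
    void doWork() { ptrToClass->work(); }
private:
    AnotherExampleClass* ptrToClass;
};

// Usage sketch:
// struct FastPath { void run() { /* ... */ } };
// ExampleClass ex(new TemplatedImpl<FastPath>());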

Name of this C++ pattern and the reasoning behind it?

In my company's C++ codebase I see a lot of classes defined like this:
// FooApi.h
class FooApi {
public:
    virtual void someFunction() = 0;
    virtual void someOtherFunction() = 0;
    // etc.
};

// Foo.h
class Foo : public FooApi {
public:
    virtual void someFunction();
    virtual void someOtherFunction();
};
Foo is the only class that inherits from FooApi, and functions that take or return pointers to Foo objects use FooApi* instead. It seems to be used mainly for singleton classes.
Is this a common, named way to write C++ code? And what is the point in it? I don't see how having a separate, pure abstract class that just defines the class's interface is useful.
Edit[0]: Sorry, just to clarify, there is only one class deriving from FooApi and no intention to add others later.
Edit[1]: I understand the point of abstraction and inheritance in general but not this particular usage of inheritance.
The only reason that I can see why they would do this is for encapsulation purposes. The point here is that most other code in the code-base only requires inclusion of the "FooApi.h" / "BarApi.h" / "QuxxApi.h" headers. Only the parts of the code that create Foo objects would actually need to include the "Foo.h" header (and link with the object-file containing the definition of the class' functions). And for singletons, the only place where you would normally create a Foo object is in the "Foo.cpp" file (e.g., as a local static variable within a static member function of the Foo class, or something similar).
This is similar to using forward-declarations to avoid including the header that contains the actual class declaration. But when using forward-declarations, you still need to eventually include the header in order to be able to call any of the member functions. But when using this "abstract + actual" class pattern, you don't even need to include the "Foo.h" header to be able to call the member functions of FooApi.
In other words, this pattern provides very strong encapsulation of the Foo class' implementation (and complete declaration). You get roughly the same benefits as from using the Compiler Firewall idiom. Here is another interesting read on those issues.
I don't know the name of that pattern. It is not very common compared to the other two patterns I just mentioned (compiler firewall and forward declarations). This is probably because this method has quite a bit more run-time overhead than the other two methods.
This is for if the code is later added to. Let's say NewFoo also extends/implements FooApi; all the current infrastructure will work with both Foo and NewFoo.
It's likely that this has been done for the same reason that pImpl ("pointer to implementation idiom", sometimes called "private implementation idiom") is used - to keep private implementation details out of the header, which means common build systems like make that use file timestamps to trigger code recompilation will not rebuild client code when only implementation has changed. Instead, the object containing the new implementation can be linked against existing client object(s), and indeed if the implementation is distributed in a shared object (aka dynamic link library / DLL) the client application can pick up a changed implementation library the next time it runs (or does a dlopen() or equivalent if it's linking at run-time). As well as facilitating distribution of updated implementation, it can reduce rebuilding times allowing a faster edit/test/edit/... cycle.
The cost of this is that implementations have to be accessed through out-of-line virtual dispatch, so there's a performance hit. This is typically insignificant, but if a trivial function like a get-int-member is called millions of times in a performance critical loop it may be of interest - each call can easily be an order of magnitude slower than inlined member access.
What's the "name" for it? Well, if you say you're using an "interface" most people will get the general idea. That term's a bit vague in C++, as some people use it whenever a base class has virtual methods, others expect that the base will be abstract, lack data members and/or private member functions and/or function definitions (other than the virtual destructor's). Expectations around the term "interface" are sometimes - for better or worse - influenced by Java's language keyword, which restricts the interface class to being abstract, containing no static methods or function definitions, with all functions being public, and only const final data members.
None of the well-known Gang of Four Design Patterns correspond to the usage you cite, and while doubtless lots of people have published (web- or otherwise) corresponding "patterns", they're probably not widely enough used (with the same meaning!) to be less confusing than "interface".
FooApi is an abstract base class; it provides the interface for concrete implementations (Foo).
The point is that you can implement functionality in terms of FooApi and create multiple implementations that satisfy its interface, and the functionality still works. You see some advantage when you have multiple descendants - the functionality can work with multiple implementations. One might implement a different type of Foo, or a version for a different platform.
Re-reading my answer, I don't think I should talk about OO ever again.

Abstract Base Class w/o Polymorphism

Why would you have an abstract base class defining an interface for a library where there is only one (always and forever) derived class?
You may want to swap out the implementation for something like unit testing.
One reason why you would do this is for testability. It is much simpler to test dependent objects when their dependencies are defined as interfaces. This give the easy ability to mock or stub.
To violate the reused abstraction principle.
In short, don't do this.
Those who say "for testing" are overlooking that you can just replace

Base <- Derived
Base <- DerivedForMockingAndTesting

with

Derived <- DerivedForMockingAndTesting

That is, let your existing implementation Derived serve as the "abstraction" to be mocked out and tested in unit testing.
If you can be 100% certain that there will always be exactly one derived class? Not much reason. BUT: in reality you will hardly be 100% certain of anything, and surely not of the future of your code.
You may find the need for different versions of that class to be binary compatible. You may also find that for other reasons, you wish to encapsulate the definition of the class- for example, because the definition requires a header which has poor macros, and that kind of thing, or the header containing what's necessary to define the class has a very long compile time.
For example, I wrote a class to encapsulate the features offered by my operating system- for now, things like dynamic loading and creating a window. Even though there'll only ever be one implementation for one compile target (Windows, etc), I chose to use a run-time abstraction, because I wanted to guarantee that the rest of my code never saw a platform-specific header, and the Windows header is full of so many macros and stuff, that I didn't want them leaking out.
The most obvious reason would be to be able to put the class's implementation in a source file, and not in a header. All that is exposed in the header is the abstract base class (and a factory function necessary to construct it, but this could be a static member). This avoids having to include the header files for any member data; the pimpl idiom is more idiomatic in C++ for this, but using abstract classes like this is far from unknown, and it works fairly well too.

Private class functions vs Functions in unnamed namespace

I've found that I tend not to have private class functions. If possible, I put all candidates for private class functions into an unnamed namespace instead, and pass all necessary information as function parameters. I don't have a sound explanation for why I do that, but it at least looks more natural to me. As a consequence, I expose fewer internal details in the header file.
What is your opinion - is it correct practice?
In the semi-large projects where I usually work (more than 2 million lines of code), I would ban private class functions if I could. The reason is that a private class function is private, yet it is visible in the header file. This means that if I change the signature (or the comment) in any way, I am sometimes rewarded with a full recompile costing several minutes (or hours, depending on the project).
Just say no to that and hide what's private in the cpp file.
If I would start fresh on a large c++ project I would enforce PIMPL Idiom: http://c2.com/cgi/wiki?PimplIdiom to move even more private details into the cpp file.
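A minimal sketch of the unnamed-namespace practice under discussion (MyClass and its header are hypothetical): the helper lives only in the .cpp, so its signature can change without touching the header:

// MyClass.cpp
#include "MyClass.h"   // hypothetically declares:
                       // class MyClass { public: void process(const std::string&);
                       //                 private: int total_; };
#include <string>

namespace {
    // Internal linkage: invisible to other translation units
    // and absent from the header.
    int checksum(const std::string& s) {
        int sum = 0;
        for (char c : s) sum += c;
        return sum;
    }
}

void MyClass::process(const std::string& input) {
    total_ += checksum(input);   // data in via parameters, result out via return value
}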
I've done this in the past, and it has always ended badly. Since the functions cannot access private members, you cannot simply pass the class object to them; you end up passing the private members individually, presumably by reference (or you end up with convoluted parameter lists), and the functions cannot call public class methods. And they can't call virtual functions, for the same reason. I strongly believe (based on experience) that this is A Bad Idea.
Bottom line: This sounds like the kind of idea that might work where the implementation "module" has some special access to the class, but this is not the case in C++.
It basically comes down to a question of whether the function in question really makes sense as part of the class. If your only intent is to keep details of the class out of the header, I'd consider using the pimpl idiom instead.
I think this is a good practice. It often has the benefit of hiding auxiliary structures and data types as well, which reduces the frequency and size of rebuilds. It also makes the functions easier to split out into another module if it turns out that they're useful elsewhere.

Should I use nested classes in this case?

I am working on a collection of classes used for video playback and recording. I have one main class which acts like the public interface, with methods like play(), stop(), pause(), record() etc... Then I have workhorse classes which do the video decoding and video encoding.
I just learned about the existence of nested classes in C++, and I'm curious to know what programmers think about using them. I am a little wary and not really sure what the benefits/drawbacks are, but they seem (according to the book I'm reading) to be used in cases such as mine.
The book suggests that in a scenario like mine, a good solution would be to nest the workhorse classes inside the interface class, so that there are no separate files for classes the client is not meant to use, and any possible naming conflicts are avoided. I'm not sure about these justifications. Nested classes are a new concept to me. I just want to see what programmers think about the issue.
I would be a bit reluctant to use nested classes here. What if you created an abstract base class for a "multimedia driver" to handle the back-end stuff (workhorse), and a separate class for the front-end work? The front-end class could take a pointer/reference to an implemented driver class (for the appropriate media type and situation) and perform the abstract operations on the workhorse structure.
My philosophy would be to go ahead and make both structures accessible to the client in a polished way, just under the assumption they would be used in tandem.
I would reference something like a QTextDocument in Qt. You provide a direct interface to the bare metal data handling, but pass the authority along to an object like a QTextEdit to do the manipulation.
You would use a nested class to create a (small) helper class that's required to implement the main class. Or for example, to define an interface (a class with abstract methods).
In this case, the main disadvantage of nested classes is that this makes it harder to re-use them. Perhaps you'd like to use your VideoDecoder class in another project. If you make it a nested class of VideoPlayer, you can't do this in an elegant way.
Instead, put the other classes in separate .h/.cpp files, which you can then use in your VideoPlayer class. The client of VideoPlayer now only needs to include the file that declares VideoPlayer, and still doesn't need to know about how you implemented it.
One way of deciding whether or not to use nested classes is to think about whether the class plays a supporting role or has its own part to play.
If it exists solely for the purpose of helping another class then I generally make it a nested class. There are a whole load of caveats to that, some of which seem contradictory but it all comes down to experience and gut-feeling.
Sounds like a case where you could use the strategy pattern.
Sometimes it's appropriate to hide the implementation classes from the user -- in these cases it's better to put them in an foo_internal.h than inside the public class definition. That way, readers of your foo.h will not see what you'd prefer they not be troubled with, but you can still write tests against each of the concrete implementations of your interface.
We hit an issue with a semi-old Sun C++ compiler and the visibility of nested classes, whose behavior changed in the standard. This is not a reason not to use nested classes, of course; just something to be aware of if you plan to compile your software on many platforms, including old compilers.
Well, if you use pointers to your workhorse classes in your Interface class and don't expose them as parameters or return types in your interface methods, you will not need to include the definitions for those work horses in your interface header file (you just forward declare them instead). That way, users of your interface will not need to know about the classes in the background.
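A short sketch of what that looks like for the classes in the question (names hypothetical):

// VideoPlayer.h - clients never learn what a VideoDecoder is
class VideoDecoder;               // forward declaration; no #include needed

class VideoPlayer {
public:
    VideoPlayer();
    ~VideoPlayer();               // defined where VideoDecoder is complete
    void play();
private:
    VideoDecoder* decoder_;       // pointer to incomplete type is fine here
};

// VideoPlayer.cpp
// #include "VideoPlayer.h"
// #include "VideoDecoder.h"     // full definition needed only in this file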
You definitely don't need to nest classes for this. In fact, separate class files will actually make your code a lot more readable and easier to manage as your project grows. It will also help you later on if you need to subclass (say, for different content/codec types).
Here's more information on the PIMPL pattern (section 3.1.1).
You should use an inner class only when you cannot implement it as a separate class using the would-be outer class' public interface. Inner classes increase the size, complexity, and responsibility of a class so they should be used sparingly.
Your encoder/decoder classes sound like a better fit for the strategy pattern.
One reason to avoid nested classes is if you ever intend to wrap the code with swig (http://www.swig.org) for use with other languages. Swig currently has problems with nested classes, so interfacing with libraries that expose any nested classes becomes a real pain.
Another thing to keep in mind is whether you ever envision different implementations of your work functions (such as decoding and encoding). In that case, you would definitely want an abstract base class with different concrete classes which implement the functions. It would not really be appropriate to nest a separate subclass for each type of implementation.
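A minimal strategy-style sketch for the encoding case (all names hypothetical):

#include <memory>
#include <utility>

class Encoder {                               // abstract strategy
public:
    virtual ~Encoder() = default;
    virtual void encodeFrame() = 0;
};

class H264Encoder : public Encoder {          // one concrete class per codec
public:
    void encodeFrame() override { /* codec-specific work */ }
};

class Recorder {                              // the public-facing class from the question
public:
    explicit Recorder(std::unique_ptr<Encoder> enc) : enc_(std::move(enc)) {}
    void record() { enc_->encodeFrame(); }
private:
    std::unique_ptr<Encoder> enc_;            // swappable per content/codec type
};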