I'm a computational physicist trying to learn how to code properly. I've written several programs by now, but the following canonical example keeps coming back, and I'm unsure how to handle it. Let's say that I have a composition of two objects, such as
#include <vector>

class node
{
    int position; // private by default
};

class lattice
{
    std::vector<node*> nodes;
    double distance(node*, node*);
};
Now, this will not work, because position is a private member of node. I know of two ways to solve this: either you create an accessor such as getpos() { return position; }, or you make lattice a friend of node.
The second of these solutions seems a lot easier to me. However, I am under the impression that it is considered slightly bad practice, and that one generally ought to stick to accessors and avoid friend. My question is this: When should I use accessors, and when should I use friendship for compositions such as these?
Also, a bonus question that has been bugging me for some time: why are compositions preferred to subclasses in the first place? To my understanding the HAS-A mnemonic argues this, but it seems more intuitive to me to imagine a lattice as an object that has an object called node. That would then be an object inside of an object, i.e. a subclass?
friend is better suited if you give access rights to only specific classes, rather than to all. If you define getpos() { return position; }, the position information will be publicly accessible via that getter method. If you use the friend keyword, on the other hand, only the lattice class will be able to access the position info. Therefore, it depends purely on your design decision: whether you want to make the information publicly accessible or not.
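For illustration, a minimal sketch of the friend variant, using the classes from the question (the distance body is a made-up placeholder):

#include <cstdlib>
#include <vector>

class node
{
    friend class lattice; // lattice may now read node's private members
    int position;
};

class lattice
{
    std::vector<node*> nodes;

    double distance(node* a, node* b)
    {
        return std::abs(a->position - b->position); // legal thanks to friendship
    }
};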
You made a "quasi-class"; this is a textbook example of how not to do OOP, because changing position doesn't change anything else in node. Even if changing position did change something in node, I would rethink the structure to avoid complexity and improve the compiler's ability to optimize your code.
I've witnessed C++ and Java programmers routinely churning out such classes according to a sort of mental template. When I ask them to explain their design, they often insist that this is some sort of "canonical form" that all elementary and composite item (i.e. non-container) classes are supposed to take, but they're at a loss to explain what it accomplishes. They sometimes claim that we need the get and set functions because the member data are private, and, of course, the member data have to be private so that they can be changed without affecting other programs!
Should read:
struct node
{
int position;
};
Not all classes have to have private data members at all. If your intention is to create a new data type, then it may be perfectly reasonable for position to just be a public member. For instance, if you were creating a type of "3D Vectors", that is essentially nothing but a 3-tuple of numeric data types. It doesn't benefit from hiding its data members since its constructor and accessor methods have no fewer degrees of freedom than its internal state does, and there is no internal state that can be considered invalid.
template<class T>
struct Vector3 {
T x;
T y;
T z;
};
Writing that would be perfectly acceptable - plus overloads for various operators and other functions for normalizing, taking the magnitude, and so on.
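As a hedged sketch of the kind of free functions meant here, assuming the Vector3 definition above:

#include <cmath>

template <class T>
Vector3<T> operator+(const Vector3<T>& a, const Vector3<T>& b)
{
    return { a.x + b.x, a.y + b.y, a.z + b.z };
}

template <class T>
T magnitude(const Vector3<T>& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); // length of the vector
}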
If a node has no illegal position value, but no two nodes in a lattice may have the same position (or some other constraint holds), then it might make sense for node to have a public member position, while lattice keeps its member nodes private.
Generally, when you are constructing "algebraic data types" like the Vector3<T> example, you use struct (or class with public) when you are creating product types, i.e. logical ANDs between other existent types, and you use std::variant when you are creating sum types, i.e. logical ORs between existent types. (And for completeness' sake, function types then take the place of logical implications.)
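A minimal sketch of the product/sum distinction, assuming C++17's std::variant is available (the shape types are hypothetical):

#include <variant>

struct Circle { double radius; };
struct Square { double side; };

// Product type: a shape AND a position (logical AND of its parts).
struct PlacedCircle { Circle c; double x, y; };

// Sum type: a shape is a Circle OR a Square.
using Shape = std::variant<Circle, Square>;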
Compositions are preferred over inheritance when, like you say, the relationship is a "has-a" relationship. Inheritance is best used when you are trying to extend or link with some legacy code, I believe. It was previously also used as a poor approximation of sum types, before std::variant existed, because the union keyword really doesn't work very well. However, you are almost always better off using composition.
Concerning your example code, I am not sure that this constitutes a composition. In a composition, the child object does not exist as an independent entity; as a rule of thumb, its lifetime is coupled to the container's. Since you are using a vector<node*> nodes, I assume that the nodes are created somewhere else and lattice only holds pointers to these objects. An example of a composition would be
class lattice {
node n1; // a single object
std::vector<node> manyNodes;
};
Now, addressing the questions:
"When should I use accessors, and when should I use friendship for compositions such as these?"
If you use plenty of accessors in your code, you are creating structs, not classes in an OO sense. In general, I would argue that, besides certain prominent exceptions such as container classes, one rarely needs setters at all. The same can be argued for simple getters on plain members, except when returning the property is a real part of the class interface, e.g. the number of elements in a container. Your interface should provide meaningful services that manipulate the internal data of your object. If you frequently get some internal data with a getter, then compute something and set the result with an accessor, you should put this computation in a method.
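A small sketch of that last point, with a hypothetical Particle class - the computation moves into the object instead of happening outside through get/set:

class Particle {
    double velocity = 0.0;
public:
    // Preferred: the behavior lives inside the class.
    void accelerate(double dv) { velocity += dv; }
};

// Instead of the get-compute-set pattern on the caller's side:
// p.setVelocity(p.getVelocity() + dv);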
One of the main reasons to avoid friend is that it introduces very strong coupling between two components. The guideline here is "low coupling, high cohesion". Strong coupling is considered a problem because it makes code hard to change, and most time on software projects is spent on maintenance or evolution. friend is especially problematic because it allows unrelated code to depend on the internal properties of your class, which breaks encapsulation. There are valid use cases for friend when the classes form a strongly related cluster (a.k.a. high cohesion).
"Why are compositions preferred to subclasses in the first place?"
In general, you should prefer plain composition over inheritance and friend classes since it reduces coupling. In a composition, the container class can only access the public interface of the contained class and has no knowledge about the internal data.
From a pure OOP point of view, your design has some weaknesses and is probably not very OO. One of the basic principles of OOP is encapsulation, which means coupling related data and behavior into objects. The node class, e.g., does not have any meaning other than storing a position, so it does not have any behavior. It seems that you modeled the data of your code but not the behavior. This can be a very appropriate design and lead to good code, but it is not really object-oriented.
"To my understanding the HAS-A mnemonic argues this, but, it seems more intuitive to me to imagine a lattice as an object that has an object called node. That would then be an object inside of an object, e.i. a subclass?"
I think you got this wrong. Public inheritance models an is-a-relationship.
class A: public B {};
It says that objects of class A are a special kind of B, fulfilling all the assumptions that you can make about objects of type B. This is known as the Liskov substitution principle: everywhere your code uses a B, you should be able to use an A instead. Considering this, class lattice : public node would mean that every lattice is a node. On the other hand,
class lattice {
int x;
node n;
int y;
};
means that an object of type lattice contains another object of type node, in C++ physically placed together with x and y. This is a has-a-relationship.
In my simulation I have different objects that can be sensed in three ways: an object can be seen and/or heard and/or smelled. For example, an Animal can be seen, heard, and smelled. A piece of Meat on the ground can be seen and smelled but not heard, and a Wall can only be seen. Then I have different sensors that gather this information - EyeSensor, EarSensor, NoseSensor.
Before state: brief version gist.github.com link
Before I started implementing NoseSensor, I had all three kinds of functionality in one class that every object inherited - CanBeSensed - because, although the classes were different, they all needed the same getDistanceMethod(), and if an object implemented any CanBeSensed functionality it needed a senseMask: flags for whether the object can be heard/seen/smelled. I didn't want to use virtual inheritance. I sacrificed having data members for smell, sound, and EyeInfo inside this class, because objects that can only be seen do not need smell/sound info.
Objects then were registered in corresponding Sensor.
Now I've noticed that the Smell and Sound sensors are the same and differ only in a single line inside a loop - one calls float getSound() and the other float getSmell() on a CanBeSensed* object. When I create one of these two sensors I know what it needs to call, but I don't know how to choose that line without a condition, and it sits inside a tight loop on top of being a virtual call.
So I've decided to make a single base class for these three kinds of functionality, using virtual inheritance, with getDistanceMethod() in the base class.
But now I had to make my SensorBase class a template class because of this method
virtual void sense(std::unordered_map<IdInt, CanBeSensed*>& objectsToSense) = 0;
, and this meant that I needed to make the SensorySubSystem class (which manages sensors and objects in range) a template as well. That in turn meant that all my subsystems - VisionSubSystem, HearingSubSystem, and SmellSubSystem - inherit from a template class, which broke my SensorySystem class that was managing all SensorySubSystems through a vector of pointers to the SensorySubSystem class: std::vector<SensorySubSystem*> subSystems;
Please, could you suggest some solution for how to restructure this, or how to make the compiler decide at compile time (or at least once per call, or once per object creation) which method to call inside the Hearing/Smell sensors?
Looking at your original design I have a few comments:
The class design in hierarchy.cpp looks quite ok to me.
Unless distance is something specific to sensory information, getDistance() doesn't look like a method that belongs in this class. It could be moved either into a Vec2d class or into a helper function (calculateDistance(Vec2d, Vec2d)). I do not see why getDistance() is virtual; if it does something other than calculate the distance between a given position and the object's position, it should be renamed.
The class CanBeSensed sounds more like a property and should probably be renamed to e.g. SensableObject.
Regarding your new approach:
Inheritance should primarily be used to express concepts (is-a-relations), not to share code. If you want to reuse an algorithm, consider writing an algorithm class or function (favour composition over inheritance).
In summary, I propose keeping your original class design and cleaning it up a little as described above. You could add virtual functions canBeSmelled/canBeHeard/canBeSeen to CanBeSensed.
Alternatively you could create a class hierarchy:
class Object { public: Vec2d getPosition(); };
class ObjectWithSmell : public virtual Object { /* ... */ };
class ObjectWithSound : public virtual Object { /* ... */ };
// ...
But then you'd have to deal with virtual inheritance without any noticeable benefit.
The shared calculation code could go into an algorithmic class or function.
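As a hedged sketch of that last suggestion, assuming a simple Vec2d type, the shared distance computation could become a free function:

#include <cmath>

struct Vec2d { double x, y; };

double calculateDistance(const Vec2d& a, const Vec2d& b)
{
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy); // Euclidean distance
}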
Any opinions on the best way to organize the members of a class (especially when there are many) in C++? In particular, a class has lots of user parameters, e.g. a class that optimizes some function and has a number of parameters such as number of iterations, size of optimization step, specific method to use, optimization function weights, etc. I've tried several general approaches and always seem to find something non-ideal with each. I'm just curious about others' experiences.
1. struct within the class
2. struct outside the class
3. public member variables
4. private member variables with Set() & Get() functions
To be more concrete, the code I'm working on tracks objects in a sequence of images. So one important aspect is that it needs to preserve state between frames (why I didn't just make a bunch of functions). Significant member functions include initTrack(), trackFromLastFrame(), isTrackValid(). And there are a bunch of user parameters (e.g. how many points to track per object tracked, how much a point can move between frames, tracking method used etc etc)
If your class is BIG, then your class is BAD.
A class should respect the Single Responsibility Principle, i.e.: a class should do only one thing, but should do it well. (Well, "only one thing" is extreme, but it should have only one role, and it has to be implemented clearly.)
Then you create classes that you enrich by composition with those single-role little classes, each one having a clear and simple role.
BIG functions and BIG classes are nests for bugs, misunderstanding, and unwanted side effects (especially during maintenance), because NO ONE can learn 700 lines of code in minutes.
So the policy for BIG classes is: refactor, and compose with little classes, each targeting only what it has to do.
If I had to choose one of the four solutions you listed: a private class within a class.
In reality, you probably have duplicate code which should be reused, and your class should be reorganized into smaller, more logical and reusable pieces. As GMan said: refactor your code.
First, I'd partition the members into two sets: (1) those that are internal-only use, (2) those that the user will tweak to control the behavior of the class. The first set should just be private member variables.
If the second set is large (or growing and changing because you're still doing active development), then you might put the parameters into a class or struct of their own. Your main class would then have two methods, GetTrackingParameters and SetTrackingParameters. The constructor would establish the defaults. The user could then call GetTrackingParameters, make changes, and then call SetTrackingParameters. Now, as you add or remove parameters, your interface remains constant.
If the parameters are simple and orthogonal, then they could be wrapped in a struct with well-named public members. If there are constraints that must be enforced, especially combinations, then I'd implement the parameters as a class with getters and setters for each parameter.
ObjectTracker tracker; // invokes constructor which gets default params
TrackerParams params = tracker.GetTrackingParameters();
params.number_of_objects_to_track = 3;
params.other_tracking_option = kHighestPrecision;
tracker.SetTrackingParameters(params);
// Now start tracking.
If you later invent a new parameter, you just need to declare a new member in the TrackerParams and initialize it in ObjectTracker's constructor.
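To make the pattern concrete, here is a minimal sketch (the parameter names are hypothetical, and it uses C++11 member initializers for brevity):

struct TrackerParams {
    int    number_of_objects_to_track = 1;
    int    points_per_object          = 50;
    double max_point_motion           = 10.0; // pixels per frame
};

class ObjectTracker {
public:
    TrackerParams GetTrackingParameters() const { return params_; }
    void SetTrackingParameters(const TrackerParams& p) { params_ = p; }
private:
    TrackerParams params_; // defaults come from TrackerParams itself
};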
It all depends:
1. A struct within the class would only be useful if you need to organize VERY many items. And if this is the case, you ought to reconsider your design.
2. A struct outside the class would be useful if it will be shared with other instances of the same or different classes. (A model, or data-object class/struct, might be a good example.)
3. Public member variables are only ever advisable for trivial, throw-away code.
4. Private member variables with Set() & Get() functions are the standard way of doing things, but it all depends on how you'll be using the class.
Sounds like this could be a job for a template, the way you described the usage.
template <typename FUNCTION, typename METHOD, typename PARAMS>
class FunctionOptimizer;

for example, where PARAMS encapsulates simple optimization-run parameters (number of iterations etc.), METHOD contains the actual optimization code, and FUNCTION describes the base function you are targeting for optimization.
The main point is not that this is the 'best' way to do it, but that if your class is very large there are likely smaller abstractions within it that lend themselves naturally to refactoring into a less monolithic structure.
However you handle this, you don't have to refactor all at once - do it piecewise, starting small, and make sure the code works at every step. You'll be surprised how much better you quickly feel about the code.
I don't see any benefit whatsoever to making a separate structure to hold the parameters. The class is already a struct - if it were appropriate to pass parameters by a struct, it would also be appropriate to make the class members public.
There's a tradeoff between public members and Set/Get functions. Public members are a lot less boilerplate, but they expose the internal workings of the class. If this is going to be called from code that you won't be able to refactor if you refactor the class, you'll almost certainly want to use Get and Set.
Assuming that the configuration options apply only to this class, use private variables that are manipulated by public functions with meaningful function names. SetMaxInteriorAngle() is much better than SetMIA() or SetParameter6(). Having getters and setters allows you to enforce consistency rules on the configuration, and can be used to compensate for certain amounts of change in the configuration interface.
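A hedged sketch of such a validating setter (the class and the angle constraint are made up for illustration):

#include <stdexcept>

class PolygonConfig {
    double max_interior_angle_ = 90.0; // degrees
public:
    void SetMaxInteriorAngle(double degrees)
    {
        if (degrees <= 0.0 || degrees >= 180.0)
            throw std::invalid_argument("interior angle must be in (0, 180)");
        max_interior_angle_ = degrees;
    }
    double GetMaxInteriorAngle() const { return max_interior_angle_; }
};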
If these are general settings, used by more than one class, then an external class would be best, with private members and appropriate functions.
Public data members are usually a bad idea, since they expose the class's implementation and make it impossible to have any guaranteed relation between them. Walling them off in a separate internal struct doesn't seem useful, although I would group them in the list of data members and set them off with comments.
I have an interface class similar to:
class IInterface
{
public:
virtual ~IInterface() {}
virtual void methodA() = 0;
virtual void methodB() = 0;
};
I then implement the interface:
class AImplementation : public IInterface
{
// etc... implementation here
};
When I use the interface in an application, is it better to create an instance of the concrete class AImplementation? E.g.:
int main()
{
AImplementation* ai = new AImplementation();
}
Or is it better to put a factory "create" member function in the Interface like the following:
class IInterface
{
public:
virtual ~IInterface() {}
static std::tr1::shared_ptr<IInterface> create(); // implementation in .cpp
virtual void methodA() = 0;
virtual void methodB() = 0;
};
Then I would be able to use the interface in main like so:
int main()
{
std::tr1::shared_ptr<IInterface> test(IInterface::create());
}
The 1st option seems to be common practice (which is not to say it's right). However, the 2nd option was sourced from "Effective C++".
One of the most common reasons for using an interface is so that you can "program against an abstraction" rather than a concrete implementation.
The biggest benefit of this is that it allows changing of parts of your code while minimising the change on the remaining code.
Therefore although we don't know the full background of what you're building, I would go for the Interface / factory approach.
Having said this, in smaller applications or prototypes I often start with concrete classes until I get a feel for where/if an interface would be desirable. Interfaces can introduce a level of indirection that may just not be necessary for the scale of app you're building.
As a result in smaller apps, I find I don't actually need my own custom interfaces. Like so many things, you need to weigh up the costs and benefits specific to your situation.
There is yet another alternative which you haven't mentioned:
int main(int argc, char* argv[])
{
//...
boost::shared_ptr<IInterface> test(new AImplementation);
//...
return 0;
}
In other words, one can use a smart pointer without using a static "create" function. I prefer this method, because a "create" function adds nothing but code bloat, while the benefits of smart pointers are obvious.
There are two separate issues in your question:
1. How to manage the storage of the created object.
2. How to create the object.
Part 1 is simple - you should use a smart pointer like std::tr1::shared_ptr to prevent memory leaks that otherwise require fancy try/catch logic.
Part 2 is more complicated.
You can't just write create() in main() as you want to - you'd have to write IInterface::create(), because otherwise the compiler will look for a global function called create, which isn't what you want. It might seem like initializing the std::tr1::shared_ptr test with the value returned by create() would do what you want, but that's not how C++ compilers work.
As to whether using a factory method on the interface is a better way to do this than just using new AImplementation(), it's possible it'd be helpful in your situation, but beware of speculative complexity - if you're writing the interface so that it always creates an AImplementation and never a BImplementation or a CImplementation, it's hard to see what the extra complexity buys you.
"Better" in what sense?
The factory method doesn't buy you much if you only plan to have, say, one concrete class. (But then again, if you only plan to have one concrete class, do you really need the interface class at all? Maybe yes, if you're using COM.) In any case, if you can foresee a small, fixed limit on the number of concrete classes, then the simpler implementation may be the "better" one, on the whole.
But if there may be many concrete classes, and if you don't want to have the base class be tightly coupled to them, then the factory pattern may be useful.
And yes, this can help reduce coupling -- if the base class provides some means for the derived classes to register themselves with the base class. This would allow the factory to know which derived classes exist, and how to create them, without needing compile-time information about them.
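A minimal sketch of that registration idea (hypothetical names, and using C++11 facilities rather than the question's std::tr1, for brevity):

#include <functional>
#include <map>
#include <memory>
#include <string>

class IInterface; // as defined in the question

using Creator = std::function<std::shared_ptr<IInterface>()>;

// The base class (or a helper) owns a registry of creation functions.
std::map<std::string, Creator>& registry()
{
    static std::map<std::string, Creator> r;
    return r;
}

std::shared_ptr<IInterface> create(const std::string& name)
{
    return registry().at(name)(); // throws if the name is unknown
}

// Each implementation file registers itself once at start-up, e.g.:
// static const bool registered =
//     (registry()["A"] = [] { return std::make_shared<AImplementation>(); }, true);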
Use the 1st method. Your factory method in the 2nd option would have to be implemented per-concrete class and this is not possible to do in the interface. I.e., IInterface::create() has no idea exactly which concrete class you actually wish to instantiate.
A static method cannot be virtual, and implementing a non-static create() method in your concrete classes has not really won you anything in this case.
Factory methods are certainly useful, but this is not the correct use.
Which item in Effective C++ recommends the 2nd option? I don't see it in mine (though I also don't have the second book). That may clear up a misunderstanding.
I would go with the first option just because it's more common and more understandable. It's really up to you, but if you're working on a commercial app, I would ask my peers what they use.
I have a very simple question here:
Are you sure you want to use a pointer?
This question might seem illogical, but people coming from a Java background use new much more often than required. In your example, creating the variable on the stack would be amply sufficient.
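A minimal sketch of what that looks like with the classes from the question - automatic storage, no new at all:

int main()
{
    AImplementation ai; // lives on the stack, destroyed automatically
    ai.methodA();
    return 0;
}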
I was wondering what would make a programmer choose either the Pimpl idiom or a pure virtual class and inheritance.
I understand that the pimpl idiom comes with one explicit extra indirection for each public method, plus the object-creation overhead.
The pure virtual class, on the other hand, comes with an implicit indirection (vtable) for the inheriting implementation and, as I understand it, no object-creation overhead.
EDIT: But you'd need a factory if you create the object from the outside
What makes the pure virtual class less desirable than the pimpl idiom?
When writing a C++ class, it's appropriate to think about whether it's going to be
A Value Type
Copy by value, identity is never important. It's appropriate for it to be a key in a std::map. Example, a "string" class, or a "date" class, or a "complex number" class. To "copy" instances of such a class makes sense.
An Entity type
Identity is important. Always passed by reference, never by "value". Often, doesn't make sense to "copy" instances of the class at all. When it does make sense, a polymorphic "Clone" method is usually more appropriate. Examples: A Socket class, a Database class, a "policy" class, anything that would be a "closure" in a functional language.
Both pImpl and pure abstract base class are techniques to reduce compile time dependencies.
However, I only ever use pImpl to implement Value types (type 1), and only sometimes when I really want to minimize coupling and compile-time dependencies. Often, it's not worth the bother. As you rightly point out, there's more syntactic overhead because you have to write forwarding methods for all of the public methods. For type 2 classes, I always use a pure abstract base class with associated factory method(s).
Pointer to implementation is usually about hiding structural implementation details. Interfaces are about instancing different implementations. They really serve two different purposes.
The pimpl idiom helps you reduce build dependencies and times, especially in large applications, and minimizes header exposure of your class's implementation details to one compilation unit. The users of your class should not even need to be aware of the existence of a pimpl (except as a cryptic pointer to which they are not privy!).
Abstract classes (pure virtuals) are something your clients must be aware of: if you try to use them to reduce coupling and circular references, you need to add some way of allowing them to create your objects (e.g. through factory methods or classes, dependency injection, or other mechanisms).
I was searching an answer for the same question.
After reading some articles and some practice I prefer using "Pure virtual class interfaces".
They are more straightforward (this is a subjective opinion). The pimpl idiom makes me feel I'm writing code "for the compiler", not for the "next developer" who will read my code.
Some testing frameworks have direct support for Mocking pure virtual classes
It's true that you need a factory to be accessible from the outside.
But if you want to leverage polymorphism, that's also a "pro", not a "con" ... and a simple factory method does not really hurt that much.
The only drawback (I'm trying to investigate this) is that the pimpl idiom could be faster when the proxy calls are inlined, while inheriting necessarily needs an extra access to the object's vtable at runtime.
Also, the memory footprint of the pimpl public proxy class is smaller (and you can easily make optimizations for faster swaps and other similar operations).
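A minimal sketch of the fast-swap point (hypothetical Widget class): with pimpl, swapping two objects is a single pointer swap, however heavy the implementation behind it is.

#include <utility>

class Widget {
    struct Impl; // defined only in the .cpp file
    Impl* impl_;
public:
    friend void swap(Widget& a, Widget& b) noexcept
    {
        std::swap(a.impl_, b.impl_); // O(1), no deep copy of the implementation
    }
};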
I hate pimpls! They make the class ugly and unreadable. All methods are redirected to the pimpl. You never see in the headers what functionality the class has, so you cannot refactor it (e.g. simply change the visibility of a method). The class feels "pregnant". I think using interfaces is better, and really enough to hide the implementation from the client. You can even let one class implement several interfaces to keep them thin. One should prefer interfaces!
Note: you do not necessarily need the factory class. What matters is that the class's clients communicate with its instances via the appropriate interface.
The hiding of private methods strikes me as strange paranoia, and I do not see a reason for it, since we have interfaces.
There's a very real problem with shared libraries that the pimpl idiom neatly circumvents and pure virtuals can't: you cannot safely modify or remove data members of a class without forcing users of the class to recompile their code. That may be acceptable under some circumstances, but not e.g. for system libraries.
To explain the problem in detail, consider the following code in your shared library/header:
// header
struct A
{
public:
A();
// more public interface, some of which uses the int below
private:
int a;
};
// library
A::A()
: a(0)
{}
The compiler emits code in the shared library that calculates the address of the integer to be initialized to be a certain offset (probably zero in this case, because it's the only member) from the pointer to the A object it knows to be this.
On the user side of the code, a new A will first allocate sizeof(A) bytes of memory, then hand a pointer to that memory to the A::A() constructor as this.
If in a later revision of your library you decide to drop the integer, make it larger or smaller, or add members, there'll be a mismatch between the amount of memory the user's code allocates and the offsets the constructor code expects. The likely result is a crash, if you're lucky - if you're less lucky, your software behaves oddly.
By pimpl'ing, you can safely add and remove data members to the inner class, as the memory allocation and constructor call happen in the shared library:
// header
struct A
{
public:
A();
// more public interface, all of which delegates to the impl
private:
void * impl;
};
// library
A::A()
: impl(new A_impl())
{}
All you need to do now is keep your public interface free of data members other than the pointer to the implementation object, and you're safe from this class of errors.
Edit: I should maybe add that the only reason I'm talking about the constructor here is that I didn't want to provide more code - the same argument applies to all functions that access data members.
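A more idiomatic sketch of the same header, assuming C++11's std::unique_ptr is available: it documents ownership, at the cost of requiring the destructor to be defined in the library file where A_impl is complete.

// header
#include <memory>

struct A_impl; // complete only inside the library's .cpp file

struct A
{
public:
    A();
    ~A(); // must be defined in the .cpp, where A_impl is a complete type
private:
    std::unique_ptr<A_impl> impl;
};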
We must not forget that inheritance is a stronger, closer coupling than delegation. I would also take into account all the issues raised in the answers given when deciding what design idioms to employ in solving a particular problem.
Although this is broadly covered in the other answers, maybe I can be a bit more explicit about one benefit of pimpl over virtual base classes:
A pimpl approach is transparent from the user's point of view, meaning you can e.g. create objects of the class on the stack and use them directly in containers. If you try to hide the implementation using an abstract virtual base class, you will need to return a shared pointer to the base class from a factory, complicating its use. Consider the following equivalent client code:
// Pimpl
Object pi_obj(10);
std::cout << pi_obj.SomeFun1();
std::vector<Object> objs;
objs.emplace_back(3);
objs.emplace_back(4);
objs.emplace_back(5);
for (auto& o : objs)
std::cout << o.SomeFun1();
// Abstract Base Class
auto abc_obj = ObjectABC::CreateObject(20);
std::cout << abc_obj->SomeFun1();
std::vector<std::shared_ptr<ObjectABC>> objs2;
objs2.push_back(ObjectABC::CreateObject(13));
objs2.push_back(ObjectABC::CreateObject(14));
objs2.push_back(ObjectABC::CreateObject(15));
for (auto& o : objs2)
std::cout << o->SomeFun1();
In my understanding these two things serve completely different purposes. The purpose of the pimpl idiom is basically to give you a handle to your implementation so you can do things like fast swaps for a sort.
The purpose of virtual classes is more along the lines of allowing polymorphism, i.e. you have an unknown pointer to an object of a derived type, and when you call function x you always get the right function for whatever class the base pointer actually points to.
Apples and oranges really.
The most annoying problem with the pimpl idiom is that it makes existing code extremely hard to maintain and analyse. So using pimpl you pay with developer time and frustration only to "reduce build dependencies and times and minimize header exposure of the implementation details". Decide for yourself whether it is really worth it.
Especially "build times" is a problem you can solve by better hardware or using tools like Incredibuild ( www.incredibuild.com, also already included in Visual Studio 2017 ), thus not affecting your software design. Software design should be generally independent of the way the software is built.