Lots of tutorials list abstraction as one of the 4 basic principles of C++ (the remaining 3 being encapsulation, inheritance and polymorphism). I have tried to understand the concept of abstraction. Many online tutorials say that abstraction is a concept which hides the implementation details and provides only the interface, but I don't clearly understand this point: what exactly are we hiding? Is this about hiding the internal structures that a function uses? If so, even a plain C function does this. When I talked with one of my colleagues about it, he said an abstract class is the best example of abstraction, but I didn't understand that either: when we have a pure virtual function, we can't create an instance of the class, and the pure virtual function usually has no definition, so there is nothing being hidden in this case. Can anyone please explain abstraction in C++ with an example?
You should distinguish between a language construct, such as an abstract class, and a general concept, such as abstraction.
Although abstract classes may be a useful tool for creating abstractions, they are not a necessary tool, nor does using that tool guarantee that you end up with a (good) abstraction.
For example, there are abstractions all over the C++ standard, so one does not need to look far for examples.
Take the STL. It contains a number of containers of different kinds, and among them the sequences all conform to a common set of functions defined on them, with guaranteed complexities for the different operations depending on which container you select. The abstraction here is that these are all sequential containers you can store data in. Although they don't use virtual functions, the implementation varies (or at least may vary) from standard-library implementation to implementation; but if you use them according to the specification, the actual implementation does not matter to the programmer (and most often the programmer never digs into it).
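To make this concrete, here is a minimal sketch of my own (not part of the standard's text): an algorithm written only against the common sequence interface works unchanged with std::vector and std::list, whatever their internal layout.

#include <iostream>
#include <list>
#include <vector>

// Relies only on the shared sequence interface (value_type, begin/end,
// forward iteration); it neither knows nor cares how the container is laid out.
template <typename Sequence>
typename Sequence::value_type sum(const Sequence& seq)
{
    typename Sequence::value_type total = typename Sequence::value_type();
    for (typename Sequence::const_iterator it = seq.begin(); it != seq.end(); ++it)
        total += *it;
    return total;
}

int main()
{
    std::vector<int> v(3, 1); // {1, 1, 1}
    std::list<int>   l(3, 2); // {2, 2, 2}
    std::cout << sum(v) << ' ' << sum(l) << '\n'; // prints: 3 6
}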
Another abstraction in the specification is the language itself: the execution environment and the translation process. These are specified not in terms of how they are implemented, but in terms of expected behavior. For example, an implementation will normally place local variables on the processor stack, but that is an implementation detail the C++ specification leaves out. The specification makes a number of guarantees about the behavior of execution, and you construct your program on top of those guarantees instead of assuming that the implementation must work in one specific, concrete way.
Abstraction is something very natural in everyday life: it is very common to talk about something without going into all the details of the thing. You can use your car without thinking about mechanics, fluid dynamics, chemistry, engineering, and so on. Abstraction in computer engineering is, in general, exactly the same thing.
Yes, a simple function already provides an abstraction. But functions are just small parts of a piece of software, and they are sometimes produced merely by factoring out code (a good idea, but one that does not always lead to a good abstraction). An abstraction should have a clear semantic meaning, not a tricky one.
OOP is a paradigm in which you can build new types and then forget about their details. It is like a course on algorithms where someone explains how quicksort works but never talks about the real nature of the elements being sorted (that is simply not an interesting point when sorting). What is interesting about an object (as with your car) is the way you can manipulate it, not how its behavior is realized. I want to turn left by rotating the steering wheel to the left; I don't want to know what really happens behind the scenes when I do it. When I leave my car with the repairman, I let him do anything he wants to it, provided it works as usual afterwards (he can change anything behind the scenes). As a user, I just want to focus on the manual, not on the internals. So you need to distinguish between the interface of an ideal object (the manual) and the realization of a concrete object (the internal schematics). This is what every OOP language lets you express (in different ways, of course; there is a variety of possibilities for realizing all of this).
So, you want to talk about points on the plane somewhere in your code? Let's write the manual (a short one, for the sake of simplicity). A Point is an object from which you can get its Cartesian coordinates or its polar ones, right? Then it is abstract: however a Point is obtained/realized in the software, you want to be able to do this with it. So it is an abstraction:
class Point {
public:
    virtual ~Point() {}                  // virtual destructor: safe deletion through Point*
    virtual double getX() const = 0;
    virtual double getY() const = 0;
    virtual double getAngle() const = 0;
    virtual double getLength() const = 0;
};
This is the manual: with it you can use a point (provided you have one), and you can already write valid, compilable code:
void f(Point *p) {
    std::cout << p->getX() << "," << p->getY() << std::endl;
}
Here you need to be careful to pass either a pointer or a reference. You pass the object as an abstraction, and something later has to retrieve the realization; in C++ this requires a reference or a pointer. Note that this function does not receive a Point (a Point is an abstraction, something that doesn't exist as a concrete object) but can receive any kind of realization of a Point; this makes a big difference. Note also that this code compiles now and remains valid as long as you call it with some realization of the abstraction (it can stay valid for a very, very long time: code reusability, you know?).
Ok now somewhere you can realize the abstraction:
// needs <cmath> for cos/sin
class PolarPoint : public Point {
private:
    double angle, length;
public:
    PolarPoint(double a, double l) : angle(a), length(l) {}
    virtual double getX() const { return length * cos(angle); }
    virtual double getY() const { return length * sin(angle); }
    virtual double getLength() const { return length; }
    virtual double getAngle() const { return angle; }
};
Somewhere you instantiate it (create an object of this concrete model), then use it and forget about all of its specifics:
...
Point *p = new PolarPoint(3.14/4,10.0);
f( p );
....
Remember that f may have been compiled a long time ago, yet it works with this new realization now! An abstraction is a kind of contract.
You can also realize the abstraction in another way:
// needs <cmath> for sqrt/atan2
class CartesianPoint : public Point {
private:
    double x, y;
public:
    CartesianPoint(double x, double y) : x(x), y(y) {}
    virtual double getX() const { return x; }
    virtual double getY() const { return y; }
    virtual double getLength() const { return sqrt(x * x + y * y); } // computed from x/y
    virtual double getAngle() const { return atan2(y, x); }          // computed from x/y
};
...
Point *p2 = new CartesianPoint(3.14/6,20.56);
f( p2 );
...
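As an aside (my addition, not part of the original answer): in C++11 you would typically let a smart pointer own the realization, which is why the virtual destructor was added to Point above:

...
std::unique_ptr<Point> q(new PolarPoint(3.14 / 4, 10.0)); // needs <memory>
f(q.get()); // f still sees only the abstraction
...         // q destroys the PolarPoint automatically via Point's virtual destructor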
In this example I also used information hiding, a concept related to abstraction (or at least useful together with it). private/public is about information hiding: it lets you enforce the hiding, meaning the user of a class can't access the details (at least not too easily); he is not merely discouraged from looking at them, he can't manipulate them. Again, with your car: it is not easy to change a piston, not only because it is an inner part of the engine, but also because the manufacturer provides many ways of hiding it from you: no instruction manual for doing it, special tools that are difficult to obtain, and so on. You may know that your car has a carburetor, yet be unable to touch it.
Beware that abstraction does not mean hiding; it just lets you forget about the details if you don't want them (and in general you don't). Abstraction is a good way to achieve low coupling between software components.
No, abstraction does not mean you must hide the internal structures.
C++ Primer Plus, page 507, gives an explanation along with an example:
Life is full of complexities, and one way we cope with complexity is to frame simplifying abstractions. You are a collection of more than an octillion atoms. Some students of the mind would say that your mind is a collection of several semiautonomous agents. But it's much simpler to think of yourself as a single entity. In computing, abstraction is the crucial step of representing information in terms of its interface with the user. That is, you abstract the essential operational features of a problem and express a solution in those terms. In the softball statistics example, the interface describes how the user initializes, updates, and displays the data. From abstraction, it is a short step to the user-defined type, which in C++ is a class design that implements the abstract interface.
Lots of tutorials list abstraction as one of the 4 basic principles in C++ (the remaining 3 being encapsulation, inheritance and polymorphism).
That list describes Object Orientation in general, in any language. C++ has many "basic principles" depending on your perspective, and there's no single agreed-upon list.
I tried to understand the concept of abstraction. Lots of online tutorials say that abstraction is a concept which hides the implementation details and provides only the interface. I didn't clearly understand this point. I didn't understand what we are hiding. Is this talking about hiding the internal structures that the function uses? If that is the case, even a normal C function will do this.
Let's look at an example. Imagine a program that handles a series of numeric inputs and, at a high ("abstract") level, wants to collect some statistics about those numbers. We might write:
#include <iostream>

template <typename Stats, typename T>
bool process_input(std::istream& in, Stats& stats)
{
    T v;
    while (in >> std::skipws && !in.eof() && in >> v)
        stats(v);
    return static_cast<bool>(in); // true if no errors (the cast is required since C++11)
}
In the above code, we "call" stats with each value v that we read from the input. But, we have no idea what stats does with the values: does it save them all, calculate min, max, a total, stdddev, the third percentile? Someone else can care because we've written our input logic above to abstract away those questions: the caller can provide a suitable stats object that does whatever's necessary (even nothing), as long as it's valid to "call" it with a value of type T using the stats(v) notation. Similarly, we didn't make a decision about what types of data the input would contain: T could be double, or std::string, or int or some yet-to-be-written class, and yet our algorithm would work for any of those because it abstracts the input logic.
Say we want a Stats object that can find the minimum and maximum of a set of values. In C++, I could write:
template <typename T>
class Stats
{
public:
Stats() : num_samples_(0) { }
void operator()(T t)
{
if (++num_samples_ == 1)
minimum_ = maximum_ = t;
else if (t < minimum_)
minimum_ = t;
else if (t > maximum_)
maximum_ = t;
}
T minimum() const { return minimum_; }
T maximum() const { return maximum_; }
size_t num_samples() const { return num_samples_; }
friend std::ostream& operator<<(std::ostream& os, const Stats& s)
{
os << "{ #" << s.num_samples_;
if (s.num_samples_)
os << ", min " << minimum_ << ", max " << maximum_;
return os << " }";
}
private:
size_t num_samples_;
T minimum_, maximum_;
};
This is just one possible implementation of an object that can be passed to process_input above. It is the void operator()(T t) function that satisfies the interface expectations of process_input. Any other function that handles a series of values could pass them to a Stats object, and callers could even stream out the collected stats...
std::cout << stats << '\n';
...without ever understanding which statistics were calculated/collected. Again, that's abstraction: you can say what is to be done at a very high level, without knowing the lower-level details, let alone how it will be done.
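To show the wiring end to end, here is a minimal driver of my own (note that T cannot be deduced from the arguments, so both template arguments are spelled out):

#include <sstream>

int main()
{
    std::istringstream in("3 1 4 1 5 9 2 6");
    Stats<int> stats;
    if (process_input<Stats<int>, int>(in, stats))
        std::cout << stats << '\n'; // prints: { #8, min 1, max 9 }
    return 0;
}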
When I talked with one of my colleagues about this, he said an abstract class is the best example of abstraction. But I didn't understand this either, because when we have a pure virtual function we can't create an instance of the class, and the pure virtual function mostly doesn't have a definition. So there is no concept of hiding in this case. Can anyone please explain abstraction in C++ with an example?
What you hide with abstraction is how things get done, and that is expressed in the definitions, so an abstract class does provide at least that small amount of abstraction. Still, let's contrast the above example, which had a reasonable level of abstraction, with code that lacks abstraction despite using an abstract class:
class Abstract_Stats
{
public:
virtual double get_minimum() const = 0;
virtual void set_minimum(double m) = 0;
virtual double get_maximum() const = 0;
virtual void set_maximum(double m) = 0;
private:
double minimum_, maximum_;
};
With such a stupid abstract class, our process_input function would need to be rewritten thus:
bool process_input(std::istream& in, Abstract_Stats& stats)
{
int v;
size_t n = 0;
while (in >> std::skipws && !in.eof() && in >> v)
if (++n == 1) { stats.set_minimum(v); stats.set_maximum(v); }
else if (v < stats.get_minimum()) stats.set_minimum(v);
else if (v > stats.get_maximum()) stats.set_maximum(v);
return static_cast<bool>(in); // true if no errors
}
Suddenly, our Abstract_Stats class, with its less abstract interface, has forced us to mix the specifics of statistics gathering into the input logic.
So, abstraction is less about whether a function is pure virtual, and more about the division of work to make things reusable in different combinations, with each being cleanly testable and understandable independently.
Abstraction and abstract classes are not the same thing.
Abstraction is simply creating a model of a concept or thing. In programming, however, abstraction usually implies that the model is simpler than what it abstracts. This goes for nearly all programming languages: most have constructs or ways to model what you want so that the model somehow gives a benefit.
Abstracting a traffic flow simulation as a bunch of unrelated variables, for example, is messy. But if you model each individual vehicle as an object, each object can handle its own internal state, and it becomes simpler to deal with the idea of a "Vehicle" object than with a heap of variables that have no relation to each other.
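For instance, a minimal sketch of that idea (all names are illustrative only):

// Each Vehicle bundles its own state; the simulation loop only needs
// the abstract operation "advance one time step".
class Vehicle {
public:
    Vehicle(double position, double speed) : position_(position), speed_(speed) {}
    void step(double dt) { position_ += speed_ * dt; }
    double position() const { return position_; }
private:
    double position_, speed_; // internal state, hidden from the rest of the program
};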
Abstract classes are more like Java's interfaces. They are meant to serve as a uniform programming "interface" between different internal parts of a program. By confining how objects can interact with other objects, you bring determinism to a program by constraining how it can behave. This often leverages a language's type system to reduce the amount of unpredictable or unwanted behavior that occurs within parts of a program, by forcing them to conform to type constraints.
Some examples of abstraction: lambda calculus, objects, structs, constructors and destructors, polymorphism, etc.
I have got the following data structure:
class B;  // forward declarations: A refers to B and C below
class C;

class Element {
public:
    virtual ~Element() {}
    std::string getType();
    std::string getId();
    virtual std::vector<Element*> getChildren();
};

class A : public Element {
public:
    void addA(const A *a);
    void addB(const B *b);
    void addC(const C *c);
    std::vector<Element*> getChildren();
};

class B : public Element {
public:
    void addB(const B *b);
    void addC(const C *c);
    std::vector<Element*> getChildren();
};

class C : public Element {
public:
    int someActualValue;
};
/* The classes also have some kind of container to store the pointers and
* child elements. But let's keep the code short. */
The data structure is used to produce an acyclic directed graph. The C class acts as a "leaf" containing actual data for algebra tasks. A and B hold other information, like names, types, rules, my favourite color and the weather forecast.
I want to program a feature where a window pops up and you can navigate through an already existing structure. Along the way I want to show the path the user took with a pretty flow chart, which is clickable to go back up the hierarchy. Based on the currently visited graph node (which could be either A, B or C), some information has to be computed and displayed.
I thought I could just keep a std::vector of Element* and use the last item as the active element I work with. That seemed like a pretty nice approach, as it makes use of the inheritance that is already there and keeps the code I need quite small.
But I have a lot of situations like these:
Element* currentElement;

void addToCurrentElement(const C *c) {
    if (A *a = dynamic_cast<A*>(currentElement)) {
        // doSomething; if not, check whether currentElement is actually a B
    }
}
Or even worse:
vector<C*> filterForCs(A* parent) {
    vector<Element*> eleVec = parent->getChildren();
    vector<C*> retVec;
    for (Element* e : eleVec) {
        if (e->getType() == "class C") {
            C *c = dynamic_cast<C*>(e);
            retVec.push_back(c);
        }
    }
    return retVec;
}
It definitely is object oriented. It definitely does use inheritance. But it feels like I threw all the comfort OOP gives me overboard and decided to use raw pointers and bit shifts again. Googling the subject, I found a lot of people saying casting up/down is bad design or bad practice. I believe that's true, but I want to know why exactly. I cannot change most of the code, as it is part of a bigger project, but I want to know how to counter a situation like this when I design a program in the future.
My Questions:
Why is casting up/down considered bad design, besides the fact that it looks horrible?
Is a dynamic_cast slow?
Are there any rules of thumb for how I can avoid a design like the one explained above?
There are a lot of questions about dynamic_cast here on SO. I have read only a few, and I don't use that method often in my own code, so my answer reflects my opinion on the subject rather than deep experience. Watch out.
(1.) Why is casting up/down considered bad design, besides the fact that it looks horrible?
(3.) Are there any rules of thumb for how I can avoid a design like the one explained above?
When reading the Stroustrup C++ FAQ, there is (imo) one central message: don't trust people who say "never use a certain tool". Rather, use the right tool for the task at hand.
Sometimes, however, two different tools can serve a very similar purpose, and that is the case here: you can basically recode any dynamic_cast functionality using virtual functions.
So when is dynamic_cast the right tool? (see also What is the proper use case for dynamic_cast?)
One possible situation is when you have a base class that you can't extend, but you nevertheless need to write overload-like code. With dynamic casting you can do that non-invasively.
Another one is when you want to keep an interface, i.e. a pure virtual base class, and don't want to force every derived class to implement a corresponding virtual function.
Often, however, you will rather rely on virtual functions, if only for the reduced ugliness. Further, they are safer: a dynamic_cast can fail (a failed reference cast throws, which may terminate your program), whereas a virtual function call (usually) won't.
Moreover, when implemented in terms of pure virtual functions, you cannot forget to update all the required places when you add a new derived class; a dynamic_cast, on the other hand, can easily be forgotten somewhere in the code.
Virtual function version of your example
Here is the example again:
Element* currentElement;

void addToCurrentElement(const C *c) {
    if (A *a = dynamic_cast<A*>(currentElement)) {
        // doSomething; if not, check whether currentElement is actually a B
    }
}
To rewrite it, add (possibly pure) virtual functions add(A*), add(B*) and add(C*) to your base class, which you override in the derived classes:
struct A : public Element
{
    virtual void add(A* a) { /* do something for A */ }
    virtual void add(B* b) { /* do something for B */ }
    virtual void add(C* c) { /* do something for C */ }
};
//same for B, C, ...
and then call it in your function or possibly write a more concise function template
template<typename T>
void addToCurrentElement(T* t)
{
    currentElement->add(t);
}
I'd say this is the standard approach. As mentioned, the drawback is that with pure virtual functions you may require N*N overloads where N might suffice (say, if only A::add requires special treatment).
Other alternatives might use RTTI, the CRTP pattern, type erasure, and possibly more.
(2.) Is a dynamic_cast slow?
Judging by what the majority of answers throughout the net state, yes, a dynamic_cast seems to be slow; see here for example.
Yet I don't have practical experience to confirm or refute this statement.
I am a decent procedural programmer, but I am a newbie to object orientation (I was trained as an engineer on good old Pascal and C). What I find particularly tricky is choosing among a number of ways to achieve the same thing. This is especially true for C++, because its power allows you to do almost anything you like, even horrible things (I guess the power/responsibility adage is appropriate here).
I thought it might help me to run one particular case that I'm struggling with by the community, to get a feel for how people go about making these choices. What I'm looking for is both advice pertinent to my specific case, and also more general pointers (no pun intended). Here goes:
As an exercise, I am developing a simple simulator where a "geometric representation" can be of two types: a "circle", or a "polygon". Other parts of the simulator will then need to accept these representations, and potentially deal with them differently. I have come up with at least four different ways in which to do this. What are the merits/drawbacks/trade-offs of each?
A: Function Overloading
Declare Circle and Polygon as unrelated classes, and then overload each external method that requires a geometric representation.
B: Casting
Declare an enum GeometricRepresentationType {Circle, Polygon}. Declare an abstract GeometricRepresentation class and inherit Circle and Polygon from it. GeometricRepresentation has a virtual GetType() method that is implemented by Circle and Polygon. Methods then use GetType() and a switch statement to cast a GeometricRepresentation to the appropriate type.
C: Not Sure of an Appropriate Name
Declare an enum type and an abstract class as in B. In this class, also create functions Circle* ToCircle() {return NULL;} and Polygon* ToPolygon() {return NULL;}. Each derived class then overrides the respective function, returning this. Is this simply a re-invention of dynamic casting?
D: Bunch Them Together
Implement them as a single class having an enum member indicating which type the object is. The class has members that can store both representations. It is then up to external methods not to call silly functions (e.g. GetRadius() on a polygon or GetOrder() on a circle).
Here are a couple of design rules (of thumb) that I teach my OO students:
1) any time you would be tempted to create an enum to keep track of some mode in an object/class, you could (probably better) create a derived class for each enum value.
2) any time you write an if-statement about an object (or its current state/mode/whatever), you could (probably better) make a virtual function call to perform some (more abstract) operation, where the original then- or else-sub-statement is the body of the derived object's virtual function.
For example, instead of doing this:
if (obj->type() == CIRCLE) {
// do something circle-ish
double circum = M_PI * 2 * obj->getRadius();
cout << circum;
}
else if (obj->type() == POLY) {
// do something polygon-ish
double perim = 0;
for (int i=0; i<obj->segments(); i++)
perim += obj->getSegLength(i);
cout << perim;
}
Do this:
cout << obj->getPerimeter();
...
double Circle::getPerimeter() {
return M_PI * 2 * getRadius();
}
double Poly::getPerimeter() {
double perim = 0;
for (int i=0; i<segments(); i++)
perim += getSegLength(i);
return perim;
}
In the case above it is pretty obvious what the "more abstract" idea is: the perimeter. This will not always be the case; sometimes it won't even have a good name, which is one of the reasons it's hard to "see". But you can convert any if-statement into a virtual function call, where the "if" part is replaced by the virtual-ness of the function.
In your case I definitely agree with the answer from Avi, you need a base/interface class and derived subclasses for Circle and Polygon.
Most probably you'll have common methods shared between Polygon and Circle. I'd combine them both under an interface named Shape, for example (writing in Java because it's fresher in my mind syntax-wise, but that's what I would use if I wrote a C++ example; it's just been a while since I wrote C++):
public interface Shape {
public double getArea();
public double getCentroid();
public double getPerimiter();
}
And have both Polygon and Circle implement this interface:
public class Circle implements Shape {
// Implement the methods
}
public class Polygon implements Shape {
// Implement the methods
}
What you get:
You can always treat Shape as a generalized object with certain properties, and you'll be able to add different Shape implementations in the future without changing the code that operates on Shape (unless you need something specific to a new Shape).
If some methods are exactly the same across implementations, you can replace the interface with an abstract class and implement those there (in C++, an interface is just an abstract class with nothing implemented).
Most importantly (I'm emphasizing bullet #1): you'll enjoy the power of polymorphism. If you use enums to declare your types, you'll one day have to change a lot of places in the code to add a new shape, whereas you won't have to change anything for a new class that implements Shape.
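For reference, the same interface rendered in C++ is just an abstract class; a sketch of my own translation (trimmed to two methods):

// An "interface" in C++: an abstract class with only pure virtual functions.
class Shape {
public:
    virtual ~Shape() {}
    virtual double getArea() const = 0;
    virtual double getPerimeter() const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(double r) : r_(r) {}
    double getArea() const { return 3.14159265358979 * r_ * r_; }
    double getPerimeter() const { return 2 * 3.14159265358979 * r_; }
private:
    double r_;
};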
Go through a C++ tutorial for the basics, and read something like Stroustrup's "The C++ Programming Language" to learn how to use the language idiomatically.
Do not believe people telling you that you have to learn OOP independent of the language. The dirty secret is that what each language understands by OOP is by no means even vaguely similar in some cases, so a solid base in, e.g., Java is not really a big help for C++; it goes so far that the language Go doesn't have classes at all. Besides, C++ is explicitly a multi-paradigm language, combining procedural, object-oriented, and generic programming in one package, and you need to learn how to combine these effectively. It was designed for maximal performance, which means some of the low-level machinery shows through, leaving many performance-related decisions in the hands of the programmer where other languages just don't offer options. C++ also has a very extensive library of generic algorithms; learning to use those is a required part of the curriculum.
Start small, so in a couple year's time you can chuckle fondly over the naïveté of your first attempts, instead of pulling your hair out.
Don't fret over "efficiency," use virtual member functions everywhere unless there is a compelling reason not to. Get a good grip on references and const. Getting an object design right is very hard, don't expect the first (or fifth) attempt to be the last.
First, a little background on OOP and how C++ and other languages like Java differ.
People tend to use object-oriented programming for several different purposes:
1. Generic programming: writing code that is generic, i.e. that works on any object or data providing a specified interface, without needing to care about the implementation details.
2. Modularity and encapsulation: preventing different pieces of code from becoming too tightly coupled to each other ("modularity") by hiding irrelevant implementation details from their users. It's another way to think about separation of concerns.
3. Static polymorphism: customizing a "default" implementation of some behavior for a specific class of objects while keeping the code modular, where the set of possible customizations is already known when you write your program. (Note: if you didn't need to keep the code modular, then choosing the behavior would be as simple as an if or switch, but then the original code would need to account for all the possibilities.)
4. Dynamic polymorphism: like static polymorphism, except that the set of possible customizations is not known in advance, perhaps because you expect the user of the library to implement the particular behavior later, e.g. as a plug-in for your program.
In Java, the same tools (inheritance and overriding) are used for solving basically all of these problems.
The upside is that there's only one way to solve all of the problems, so it's easier to learn.
The downside is a sometimes-but-not-always-negligible efficiency penalty: a solution that resolves concern #4 is more costly than one that only needs to resolve #3.
Now, enter C++.
C++ has different tools for dealing with all of these, and even when they use the same tool (such as inheritance) for the same problem, they are used in such different ways that they are effectively completely different solutions than the classic "inherit + override" you see in Java:
1. Generic programming: C++ templates are made for this. They're similar to Java's generics, but in fact Java's generics often require inheritance to be useful, whereas C++ templates have nothing to do with inheritance in general.
2. Modularity and encapsulation: C++ classes have public and private access modifiers, just like Java. In this respect the two languages are very similar.
3. Static polymorphism: Java has no way of solving this particular problem and instead forces you to use a solution for #4, paying a penalty that you don't necessarily need to pay. C++, on the other hand, uses a combination of class templates and inheritance called CRTP to solve this problem. This type of inheritance is very different from the one for #4.
4. Dynamic polymorphism: C++ and Java both allow for inheritance and function overriding and are similar in this respect.
Now, back to your question. How would I solve this problem?
It follows from the above discussion that inheritance isn't the single hammer meant for all nails.
Probably the best way (although perhaps the most complicated way) is to use #3 for this task.
If need be, you can implement #4 on top of it for the classes that need it, without affecting other classes.
You declare a class called Shape and define the base functionality:
class Graphics; // Assume already declared
// (a Color type is likewise assumed to be defined elsewhere)
template<class Derived = void>
class Shape; // Declare the shape class
template<>
class Shape<> // Specialize Shape<void> as base functionality
{
Color _color;
public:
// Data and functionality for all shapes goes here
// if it does NOT depend on the particular shape
Color color() const { return this->_color; }
void color(Color value) { this->_color = value; }
};
Then you define the generic functionality:
template<class Derived>
class Shape : public Shape<> // Inherit base functionality
{
public:
    // You're not required to actually declare these,
    // but do it for the sake of documentation.
    // The subclasses are expected to define these.
    size_t vertices() const;
    Point vertex(size_t vertex_index) const;

    void draw_center(Graphics &g) const { g.draw_pixel(center()); }

    void draw_outline(Graphics &g) const
    {
        const Derived &me = static_cast<const Derived &>(*this); // my subclass type
        Point p1 = me.vertex(0);
        for (size_t i = 1; i < me.vertices(); ++i)
        {
            Point p2 = me.vertex(i);
            g.draw_line(p1, p2);
            p1 = p2;
        }
    }

    Point center() const // uses the methods above from the subclass
    {
        const Derived &me = static_cast<const Derived &>(*this); // my subclass type
        Point center = Point();
        for (size_t i = 0; i < me.vertices(); ++i)
            center = (center * i + me.vertex(i)) / (i + 1); // running average of the vertices
        return center;
    }
};
Once you do that, you can define new shapes:
class Square : public Shape<Square>
{
Point _top_left, _bottom_right;
public:
size_t vertices() const { return 4; }
Point vertex(size_t vertex_index) const
{
switch (vertex_index)
{
case 0: return this->_top_left;
case 1: return Point(this->_bottom_right.x, this->_top_left.y);
case 2: return this->_bottom_right;
case 3: return Point(this->_top_left.x, this->_bottom_right.y);
default: throw std::out_of_range("invalid vertex");
}
}
// No need to define center() -- it is already available!
};
This is probably the best method, since you most likely already know all possible shapes at compile time (i.e. you don't expect a user to write a plug-in defining his own shape), and thus don't need any of the virtual machinery. Yet it keeps the code modular and separates the concerns of the different shapes, effectively giving you the same benefits as a dynamic-polymorphism approach.
(It is also the most efficient option at run-time, at the cost of being a bit more complicated at compile-time.)
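Usage then looks like ordinary code, with the calls resolved at compile time (a sketch, assuming Point is default-constructible and defined elsewhere):

Square sq;
Point c = sq.center(); // stamped out for Square at compile time, no virtual dispatch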
Hope this helps.
I have a class Feature with a pure virtual method.
class Feature {
public:
virtual ~Feature() {}
virtual const float getValue(const vector<int>& v) const = 0;
};
This class is implemented by several classes, for example FeatureA and FeatureB.
A separate class Computer (simplified) uses the getValue method to do some computation.
class Computer {
public:
const float compute(const vector<Feature*>& features, const vector<int>& v) {
float res = 0;
for (int i = 0; i < features.size(); ++i) {
res += features[i]->getValue(v);
}
return res;
}
};
Now I would like to implement FeatureC, but I realize that I need additional information in the getValue method. The method in FeatureC looks like:
const float getValue(const vector<int>& v, const vector<int>& additionalInfo) const;
I can of course modify the signature of getValue in Feature, FeatureA and FeatureB to take additionalInfo as a parameter, and also add additionalInfo as a parameter to the compute method. But then I may have to modify all those signatures again later if I implement a FeatureD that needs even more additional info. I wonder if there is a more elegant solution, or a known design pattern that you can point me to for further reading.
You have at least two options:
Instead of passing the single vector to getValue(), pass a struct. In this struct you can put the vector today and more data tomorrow (see the sketch after this list). Of course, if some concrete runs of your program don't need the extra fields, the need to compute them might be wasteful. But it imposes no performance penalty if you always need to compute all the data anyway (i.e. if there will always be one FeatureC).
Pass getValue() a reference to an object having methods to get the necessary data. This object could be the Computer itself, or some simpler proxy. The getValue() implementations can then request exactly what they need, and it can be computed lazily. The laziness eliminates wasted computation in some cases, but the overall structure imposes a small constant overhead due to having to call (possibly virtual) functions to get the various data.
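A sketch of the first option (the FeatureInput name and its fields are mine, purely for illustration):

#include <vector>

// Option 1: widen the payload struct, not the signatures.
struct FeatureInput {
    std::vector<int> v;
    std::vector<int> additionalInfo; // new fields can be added here tomorrow
};

class Feature {
public:
    virtual ~Feature() {}
    virtual float getValue(const FeatureInput& in) const = 0;
};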
Requiring the user of your Feature class hierarchy to call different methods based on the class defeats polymorphism. Once you start writing dynamic_cast<>(), you know you should be rethinking your design.
If a subclass requires information that it can only get from its caller, you should change the getValue() method to take an additionalInfo argument, and simply ignore that information in classes where it doesn't matter.
If FeatureC can get additionalInfo by calling another class or function, that's usually a better approach, as it limits the number of classes that need to know about it. Perhaps the data is available from an object which FeatureC is given access to via its constructor, or from a singleton object, or it can be calculated by calling a function. Finding the best approach requires a bit more knowledge about the case.
This problem is addressed in item 39 of C++ Coding Standards (Sutter, Alexandrescu), which is titled "Consider making virtual functions nonpublic, and public functions nonvirtual."
In particular, one of the motivations for following the Non-Virtual-Interface design pattern (this is what the item is all about) is stated as
Each interface can take its natural shape: When we separate the public interface
from the customization interface, each can easily take the form it naturally
wants to take instead of trying to find a compromise that forces them to look
identical. Often, the two interfaces want different numbers of functions and/or
different parameters; [...]
This is particularly useful in base classes with a high cost of change.
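In code, the NVI idea looks roughly like this (my sketch, applied to the Feature class from the question, not taken from the book):

#include <vector>

class Feature {
public:
    virtual ~Feature() {}
    // Public interface: non-virtual, one stable entry point for all callers.
    float getValue(const std::vector<int>& v) const { return doGetValue(v); }
private:
    // Customization interface: virtual, free to take a different shape
    // (extra parameters, context objects, ...) as the hierarchy evolves.
    virtual float doGetValue(const std::vector<int>& v) const = 0;
};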
Another design pattern which is very useful in this case is the Visitor pattern. As with NVI, it applies when the base classes (as well as the whole hierarchy) have a high cost of change. You can find plenty of discussion of this design pattern; I suggest you read the related chapter in Modern C++ Design (Alexandrescu), which (as a bonus) gives you great insight into how to use the (very easy to use) Visitor facilities in Loki.
I suggest you read all of this material and then edit the question so that we can give you a better answer. We could come up with all sorts of solutions (e.g. an additional method which gives the class the additional parameters, if needed) which might well not suit your case.
Try to address the following questions:
would a template-based solution fit the problem?
would it be feasible to add a new layer of indirection when calling the function?
would a "push argument"-"push argument"-...-"push argument"-"call function" method be of help? (this might seem very odd at first, but
think to something like "cout << arg << arg << arg << endl", where
"endl" is the "call function")
how do you intend to distinguish how to call the function in Computer::compute?
Now that we had some "theory", let's aim for the practice using the Visitor pattern:
#include <iostream>
using namespace std;
class FeatureA;
class FeatureB;
class Computer{
public:
int visitA(FeatureA& f);
int visitB(FeatureB& f);
};
class Feature {
public:
virtual ~Feature() {}
virtual int accept(Computer&) = 0;
};
class FeatureA : public Feature {
public:
int accept(Computer& c){
return c.visitA(*this);
}
int compute(int a){
return a+1;
}
};
class FeatureB : public Feature {
public:
int accept(Computer& c){
return c.visitB(*this);
}
int compute(int a, int b){
return a+b;
}
};
int Computer::visitA(FeatureA& f){
return f.compute(1);
}
int Computer::visitB(FeatureB& f){
return f.compute(1, 2);
}
int main()
{
FeatureA a;
FeatureB b;
Computer c;
cout << a.accept(c) << '\t' << b.accept(c) << endl;
}
This is a rough implementation of the Visitor pattern which, as you can see, solves your problem. I strongly advise you not to implement it exactly this way: there are obvious dependency problems, which can be solved by means of a refinement called the Acyclic Visitor. It is already implemented in Loki, so there is no need to worry about implementing it yourself.
Apart from the implementation details, notice that you are not relying on type switches (which, as somebody else pointed out, you should avoid whenever possible) and you are not requiring the classes to share any particular interface (e.g. one argument for the compute function). Moreover, if the visitor class is a hierarchy (make Computer a base class in the example), you won't need to add any new function to the Feature hierarchy when you want to add functionality of this sort.
If you don't like the visitA, visitB, ... "pattern", worry not: this is just a trivial implementation, and you don't need it. Basically, in a real implementation you would use template specialization of a single visit function.
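Roughly like this (my sketch of that refinement, written against the FeatureA/FeatureB classes above; it replaces the Computer shown earlier rather than extending it):

class Computer {
public:
    template<class F> int visit(F& f); // one template instead of visitA/visitB
};

// Explicit specializations, one per feature type:
template<> int Computer::visit<FeatureA>(FeatureA& f) { return f.compute(1); }
template<> int Computer::visit<FeatureB>(FeatureB& f) { return f.compute(1, 2); }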
Hope this helped; I put a lot of effort into it :)
Virtual functions, to work correctly, need to have exactly the same "signature" (same parameters and same return type). Otherwise, you just get a "new member function", which isn't what you want.
The real question here is "how does the calling code know it needs the extra information".
You can solve this in a few different ways - the first one is to always pass in const vector <int>& additionalInfo, whether it's needed or not.
If that's not possible, because there isn't any additionalInfo except for in the case of FeatureC, you could have an "optional" parameter - which means use a pointer to vector (vector<int>* additionalInfo), which is NULL when the value is not available.
Of course if additionalInfo is a value that is something that can be stored in the FeatureC class, then that would also work.
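For instance (a sketch; the body is a placeholder showing only the shape of the solution):

// FeatureC receives its extra data once, at construction, so the
// common getValue() signature stays unchanged.
class FeatureC : public Feature {
public:
    explicit FeatureC(const vector<int>& additionalInfo)
        : additionalInfo_(additionalInfo) {}
    const float getValue(const vector<int>& v) const {
        // combine v with additionalInfo_ however the feature requires
        return static_cast<float>(v.size() + additionalInfo_.size());
    }
private:
    vector<int> additionalInfo_;
};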
Another option is to extend the base class Feature with two more members:
class Feature {
public:
virtual ~Feature() {}
virtual const float getValue(const vector<int>& v) const = 0;
virtual const float getValue(const vector<int>& v, const vector<int>& additionalInfo) const { return -1.0; }
virtual bool useAdditionalInfo() const { return false; }
};
and then make your loop something like this:
for (int i = 0; i < features.size(); ++i) {
if (features[i]->useAdditionalInfo())
{
res += features[i]->getValue(v, additionalInfo);
}
else
{
res += features[i]->getValue(v);
}
}
What are the advantages/disadvantages of the two techniques in comparison? And more importantly: why and when should one be used over the other? Is it just a matter of personal taste/preference?
To the best of my abilities, I haven't found another post that explicitly addresses my question. Among many questions regarding the actual use of polymorphism and/or type-erasure, the following seems to be closest, or so it seemed, but it doesn't really address my question either:
C++ & CRTP: Type erasure vs polymorphism
Please, note that I very well understand both techniques. To this end, I provide a simple, self-contained, working example below, which I'm happy to remove, if it is felt unnecessary. However, the example should clarify what the two techniques mean with respect to my question. I'm not interested in discussing nomenclatures. Also, I know the difference between compile- and run-time polymorphism, though I wouldn't consider this to be relevant to the question. Note that my interest is less in performance-differences, if there are any. However, if there was a striking argument for one or the other based on performance, I'd be curious to read it. In particular, I would like to hear about concrete examples (no code) that would really only work with one of the two approaches.
Looking at the example below, one primary difference is memory management, which for polymorphism remains on the user's side, while for type erasure it is neatly tucked away, requiring some reference counting (or boost). Having said that, depending on the usage scenario, the situation might be improved for the polymorphism example by using smart pointers with the vector (?), though for arbitrary cases that may well turn out to be impractical (?). Another aspect potentially in favor of type erasure may be the independence from a common interface, but why exactly would that be an advantage (?).
The code given below was tested (compiled and run) with MS Visual Studio 2008 by simply putting all of the following code blocks into a single source file. It should also compile with gcc on Linux, or so I hope/assume, because I see no reason why not (?) :-) I have split the code here for clarity.
These header-files should be sufficient, right (?).
#include <iostream>
#include <vector>
#include <string>
Simple reference-counting to avoid boost (or other) dependencies. This class is only used in the type-erasure-example below.
class RefCount
{
RefCount( const RefCount& );
RefCount& operator= ( const RefCount& );
int m_refCount;
public:
RefCount() : m_refCount(1) {}
void Increment() { ++m_refCount; }
int Decrement() { return --m_refCount; }
};
This is the simple type-erasure example/illustration. It was copied and modified in part from the following article. Mainly I have tried to make it as clear and straightforward as possible.
http://www.cplusplus.com/articles/oz18T05o/
class Object {
struct ObjectInterface {
virtual ~ObjectInterface() {}
virtual std::string GetSomeText() const = 0;
};
template< typename T > struct ObjectModel : ObjectInterface {
ObjectModel( const T& t ) : m_object( t ) {}
virtual ~ObjectModel() {}
virtual std::string GetSomeText() const { return m_object.GetSomeText(); }
T m_object;
};
void DecrementRefCount() {
if( mp_refCount->Decrement()==0 ) {
delete mp_refCount; delete mp_objectInterface;
mp_refCount = NULL; mp_objectInterface = NULL;
}
}
Object& operator= ( const Object& );
ObjectInterface *mp_objectInterface;
RefCount *mp_refCount;
public:
template< typename T > Object( const T& obj )
: mp_objectInterface( new ObjectModel<T>( obj ) ), mp_refCount( new RefCount ) {}
~Object() { DecrementRefCount(); }
std::string GetSomeText() const { return mp_objectInterface->GetSomeText(); }
Object( const Object &obj ) {
obj.mp_refCount->Increment(); mp_refCount = obj.mp_refCount;
mp_objectInterface = obj.mp_objectInterface;
}
};
struct MyObject1 { std::string GetSomeText() const { return "MyObject1"; } };
struct MyObject2 { std::string GetSomeText() const { return "MyObject2"; } };
void UseTypeErasure() {
typedef std::vector<Object> ObjVect;
typedef ObjVect::const_iterator ObjVectIter;
ObjVect objVect;
objVect.push_back( Object( MyObject1() ) );
objVect.push_back( Object( MyObject2() ) );
for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
std::cout << iter->GetSomeText();
}
As far as I can tell, the following achieves pretty much the same thing using polymorphism, or maybe not (?).
struct ObjectInterface {
virtual ~ObjectInterface() {}
virtual std::string GetSomeText() const = 0;
};
struct MyObject3 : public ObjectInterface {
std::string GetSomeText() const { return "MyObject3"; } };
struct MyObject4 : public ObjectInterface {
std::string GetSomeText() const { return "MyObject4"; } };
void UsePolymorphism() {
typedef std::vector<ObjectInterface*> ObjVect;
typedef ObjVect::const_iterator ObjVectIter;
ObjVect objVect;
objVect.push_back( new MyObject3 );
objVect.push_back( new MyObject4 );
for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
std::cout << (*iter)->GetSomeText();
for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
delete *iter;
}
And finally for testing all of the above together.
int main() {
UseTypeErasure();
UsePolymorphism();
return(0);
}
C++-style virtual-method-based polymorphism:

- You have to use classes to hold your data.
- Every class has to be built with your particular kind of polymorphism in mind.
- Every class has a common binary-level dependency, which restricts how the compiler creates the instance of each class.
- The data you are abstracting must explicitly describe an interface that covers your needs.
C++-style template-based type erasure (with virtual-method-based polymorphism doing the erasure):

- You have to use templates to talk about your data.
- Each chunk of data you are working on may be completely unrelated to the others.
- The type-erasure work is done within public header files, which bloats compile time.
- Each type erased has its own template instantiated, which can bloat binary size.
- The data you are abstracting need not be written as being directly dependent on your needs.
Now, which is better? Well, that depends if the above things are good or bad in your particular situation.
As an explicit example, std::function<...> uses type erasure, which allows it to take function pointers, function references, the output of a whole pile of template-based functions that generate types at compile time, myriads of functors which have an operator(), and lambdas. All of these types are unrelated to one another, and because they aren't tied to having a virtual operator(), when they are used outside of the std::function context the abstraction they represent can be compiled away. You couldn't do this without type erasure, and you probably wouldn't want to.
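A small illustration of that point (my example, not from the question):

#include <functional>
#include <iostream>

int add_one(int x) { return x + 1; }
struct Doubler { int operator()(int x) const { return 2 * x; } };

int main()
{
    // Three unrelated callables erased into one type:
    std::function<int(int)> f = add_one;                      // function pointer
    std::function<int(int)> g = Doubler();                    // functor
    std::function<int(int)> h = [](int x) { return x * x; };  // lambda
    std::cout << f(3) << ' ' << g(3) << ' ' << h(3) << '\n';  // prints: 4 6 9
}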
On the other hand, just because a class has a method called DoFoo, doesn't mean that they all do the same thing. With polymorphism, it isn't just any DoFoo you are calling, but the DoFoo from a particular interface.
As for your sample code... your GetSomeText should be virtual ... override in the polymorphism case.
There is no need to reference count just because you are using type erasure, and no need to avoid reference counting just because you are using polymorphism.
Your Object could wrap T*s, just as you stored vectors of raw pointers in the other case, with manual destruction of their contents (equivalent to having to call delete). Your Object could wrap a std::shared_ptr<T>, and in the other case you could have a vector of std::shared_ptr<T>. Your Object could contain a std::unique_ptr<T>, equivalent to having a vector of std::unique_ptr<T> in the other case. Your Object's ObjectModel could extract copy constructors and assignment operators from the T and expose them to Object, allowing full-on value semantics for your Object, which corresponds to a vector of T in your polymorphism case.
Here's one view: The question seems to ask how one should choose between late binding ("runtime polymorphism") and early binding ("compile-time polymorphism").
As KerrekSB points out in his comments, there are some things you can do with late binding that it just isn't realistic to do with early binding. Many uses of the Strategy pattern (decoding network I/O) or the Abstract Factory pattern (runtime-selected class factories) fall into this category.
If both approaches are viable, then choosing is a matter of the trade-offs involved. In C++ applications, the main trade-offs I see between early and late binding are implementation maintainability, binary size, and performance.
There are at least some people who feel that C++ templates in any shape or form are impossible to comprehend, or who have some other, less dramatic reservation about templates. C++ templates have many little gotchas ("when do I need to use the typename and template keywords?") and non-obvious tricks (SFINAE comes to mind).
Another tradeoff is optimization. When you bind early, you give the compiler more information about your program, and so it can (potentially) do a better job optimizing. When you bind late, the compiler (probably) doesn't know ahead of time as much information -- some of that information may be in other compilation units, and so the optimizer can't do as much.
Another tradeoff is program size. In C++ at least, using "compile-time polymorphism" sometimes balloons binary size, as the compiler creates, optimizes, and emits different code for each used specialization. In contrast, when binding late, there's only one code path.
It's interesting to compare the same tradeoff being made in a different context. Take web applications, where one uses (some type of) polymorphism to deal with differences between browsers, and possibly for internationalization (i18n)/localization. Now, a hand-written JavaScript web application would likely use what amounts to late binding here, by having methods which detect capabilities at runtime to figure out what to do. Libraries like jQuery take this tack.
Another approach is to write different code for each possible browser/i18n possibility. While this sounds absurd, it is far from unheard of. The Google Web Toolkit uses this approach. GWT has its "deferred binding" mechanism, used to specialize the compiler's output to different browsers and different localizations. GWT's "deferred binding" mechanism uses early binding: The GWT Java-to-JavaScript compiler figures out all possible ways the polymorphism might be needed, and spits out an entirely different "binary" for each.
The trade-offs are similar: wrapping your head around how you extend GWT using deferred binding can be a headache and a half; having knowledge at compile time allows GWT's compiler to optimize each specialization separately, possibly yielding better performance and a smaller size for each specialization; and the whole of a GWT application can end up many times the size of a comparable jQuery application, due to all the precompiled specializations.
One benefit of runtime generics that no one here has mentioned (?) is the possibility for code that is generated and injected into a running application to use the same List, HashMap / Dictionary etc. that everything else in that application is already using. Why you'd want to do that is another question.
I have a class hierarchy with the following three classes:
template<int pdim >
class Function
{
virtual double operator()( const Point<pdim>& x) const = 0;
};
Which is a function in pdim-dimensional space, returning doubles.
template<int pdim, int ldim >
class NodeFunction
{
virtual double operator()( const Node<pdim,ldim>& pnode, const Point<ldim>& xLoc) const = 0;
};
Which is a function from the ldim-dimensional local space of a node in pdim-dimensional space.
template<int pdim, int ldim, int meshdim >
class PNodeFunction
{
virtual double operator()( const PNode<pdim,ldim,meshdim>& pnode, const Point<ldim>& xLoc) const = 0;
};
Reason 1 for this design: a NodeFunction is more general than a Function. It can always map the local ldim-point to a pdim-point. E.g. an edge (a Node with ldim=1) maps the interval [0,1] into pdim-dimensional physical space. That is why every Function is a NodeFunction. The NodeFunction is more general, as it is allowed to query the Node for attributes.
Reason 2 for this design: a PNodeFunction is more general than a NodeFunction. Exactly one Node is associated with every PNode (not vice versa). That is why every NodeFunction is a PNodeFunction. The PNodeFunction is more general, as it also has all the context of the PNode, which is part of a Mesh (thus it knows all its parents, neighbours, ...).
Summary: Every Function<pdim> is a NodeFunction<pdim, ldim> for any value of ldim. Every NodeFunction<pdim, ldim> is a PNodeFunction<pdim, ldim, meshdim> for any value of meshdim.
Question: What is the best way to express this in C++, such that I can use a Function in place of a NodeFunction / PNodeFunction, such that the code is fast (it is high-performance computing code), and such that the code works for the relevant template parameters? The template parameters are not completely independent but rather depend on each other:
- pdim = 1, 2, 3 (the main interest), though it is nice if it also works for values of pdim up to 7
- ldim = 0, 1, ..., pdim
- meshdim = ldim, ldim+1, ..., pdim
Regarding performance, note that only a few functions are created in the program, but their operator() is called many times.
Variants
I thought about a few ways to implement this (I currently implemented Variant 1). I wrote them down here so that you can tell me about the advantages and disadvantages of these approaches.
Variant 1
Implement the inheritance described above, where A<dim> inherits from B<dim,dim2> via a helper template Arec<dim,dim2>. In pseudo-code this is:
class A<dim> : public Arec<dim,dim>;
class Arec<dim,dim2> : public Arec<dim,dim2-1>, public B<dim,dim2>;
class Arec<dim,0> : public B<dim,0>;
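Spelled out as real C++, the recursion could look like this (a sketch with placeholder bodies; B stands in for the more general base, e.g. PNodeFunction):

template<int dim, int dim2>
struct B { /* interface for one (dim, dim2) combination */ };

template<int dim, int dim2>
struct Arec : Arec<dim, dim2 - 1>, B<dim, dim2> { };

template<int dim>                  // recursion anchor
struct Arec<dim, 0> : B<dim, 0> { };

template<int dim>
struct A : Arec<dim, dim> { };     // A<dim> is-a B<dim, k> for every k = 0..dim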
This is applied both to derive Function from NodeFunction and to derive NodeFunction from PNodeFunction. As NodeFunction inherits roughly O(pdim^2) times from PNodeFunction, how does this scale? Is the huge virtual table bad?
Note: in fact every Function should also inherit from VerboseObject, which allows me to print debugging information about the function to e.g. std::cout. I do this by virtually inheriting PNodeFunction from VerboseObject. How will this impact performance? It should increase the time to construct a Function and to print the debug information, but not the time for operator(), right?
Variant 2
Don't express the inheritance in C++; e.g. A<dim> doesn't inherit from B<dim,dim2>, but rather there is a function to convert between the two:
template<int dim, int dim2>
class AHolder : public B<dim, dim2> {
};

std::shared_ptr< AHolder<dim, dim2> > interpretAasB( std::shared_ptr< A<dim> > )
[...]
This has the disadvantage that I can no longer use Function<dim> in place of NodeFunction<dim> or PNodeFunction<dim>.
Variant 3
What is your preferred way to implement this?
I don't understand your problem very well; that might be because I lack specific knowledge of the problem domain.
Anyway, it seems like you want to generate a hierarchy of classes, with Function (the most derived class) at the bottom and PNodeFunction (the least derived class) at the top.
For that I can only recommend Alexandrescu's Modern C++ Design book, especially the chapter on hierarchy generators.
There is an open source library stemming from the book called Loki.
Here's the part that might interest you.
Going the generic meta-programming way might be the hardest route, but I think it will result in ease of use once set up, and possibly in increased performance (always to be verified with a profiler) compared to virtual inheritance.
In any case I strongly recommend not inheriting from the Verbose object for logging, but rather having a separate singleton logging class.
That way you don't need the extra space in the class hierarchy to store a logging object.
You could have only the least derived class inherit from the Verbose object, but your function classes are not logging objects; they use a logging object (I may be a bit pedantic here). The other problem is that if you inherit multiple times from that base class, you'll end up with multiple copies of the logging object and will have to use virtual inheritance to solve it.