Public virtual method overridden as private. Generalization/specialization/Liskov principles violation? - c++

As in Private function member called outside of class, one can write the following code:
#include <iostream>
class A {
public:
virtual void f() { std::cout << "A::f()"; }
};
class B : public A {
private:
void f() override { std::cout << "B::f()"; }
};
void g(A &g) { g.f(); }
int main() {
A a;
g(a);
a.f();
B b;
g(b);
b.f(); // compilation failure
}
Of course, the compiler refuses to compile the last line, because the static analysis of the code reveals that B::f() is defined but private.
What seriously troubles me is the relation with the conceptual generalization/specialization. It is generally considered that you must be able to manipulate an instance of a subtype at least the same way you manipulate an instance of the super type.
This is the basis of the Liskov substitution principle. In the given example, it is respected when g() is called with an argument of either type A or type B. But the last line is rejected, and it seems that in such a case the substitution principle is violated in some way (consider call-by-name, as in a macro definition #define h(x) (x.f())).
Even if one considers that the Liskov principle is not violated (macros are not really part of the language, so fine), the fact that the last line gives a compile-time error at least means that objects of type B can't be manipulated the way A's can be. So B is not a specialization of A, even though the derivation is public.
Thus in C++, using a public derivation does not guarantee that you are effectively implementing a specialization. You need to consider more properties of the code to be sure that you have a ‘‘correct’’ specialization.
Am I wrong? Can someone give me a justification for this? I would like a good semantic argument, I mean at least something more elaborate than Stroustrup's arguments like "C++ tries not to constrain you, you are free to use it if you want and not if you don't want". I think that a language needs to be founded on a reasonable model, not on a huge list of possible tricks.
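To make the distinction in the question concrete: access is checked at compile time against the static type of the expression, while the function that runs is chosen from the dynamic type. The following is only a sketch of that behaviour, not a verdict on the design question. It assumes f() returns a string instead of printing so the result can be checked, and the names call_through_base and demo are made up for illustration.

```cpp
#include <cassert>
#include <string>

class A {
public:
    virtual ~A() = default;
    virtual std::string f() const { return "A::f()"; }
};

class B : public A {
private:
    // Overrides A::f, but is private when named through a B expression.
    std::string f() const override { return "B::f()"; }
};

// Access is checked against A::f (public); dispatch then selects the
// override that matches the dynamic type of the argument.
std::string call_through_base(const A& a) { return a.f(); }

void demo() {
    A a;
    B b;
    assert(call_through_base(a) == "A::f()");
    assert(call_through_base(b) == "B::f()");
    // Viewing b as an A makes the call well-formed and still runs B::f.
    assert(static_cast<const A&>(b).f() == "B::f()");
    // b.f();  // still ill-formed: B::f is private in this context
}
```

So a B can be used wherever an A reference or pointer is expected. What is lost is only the ability to name f() through an expression whose static type is B.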

Related

Why does the member function name lookup stay in the parent class?

Motivation, if it helps: I have struct member functions that are radial-basis-function kernels. They are called 1e06 x 15 x 1e05 times in a numerical simulation. Counting on devirtualization to inline virtual functions is not something I want to do for that many function calls. Also, the structs (RBF kernels) are already used as template parameters of a larger interpolation class.
Minimal working example
I have a function g() that is always the same, and I want to reuse it, so I pack it in the base class.
The function g() calls a function f() that is different in derived classes.
I don't want to use virtual functions to resolve the function names at runtime, because this incurs additional costs (I measured it in my code, it has an effect).
Here is the example:
#include <iostream>
struct A
{
double f() const { return 0; };
void g() const
{
std::cout << f() << std::endl;
}
};
struct B : private A
{
using A::g;
double f() const { return 1; };
};
struct C : private A
{
using A::g;
double f() const { return 2; };
};
int main()
{
B b;
C c;
b.g(); // Outputs 0 instead of 1
c.g(); // Outputs 0 instead of 2
}
I expected the name resolution mechanism to figure out I want to use "A::g()", but then to return to "B" or "C" to resolve the "f()" function. Something along the lines of: "when I know a type at compile time, I will try to resolve all names in this type first, do a name lookup in the parents if something is missing, then go back to the type I was called from". However, it seems to figure out that "A::g()" is used, then it sits in "A" and just picks "A::f()", even though the actual call to "g()" came from "B" and "C".
This can be solved using virtual functions, but I don't understand and would like to know the reasoning behind the name lookup sticking to the parent class when types are known at compile time.
How can I get this to work without virtual functions?
This is a standard task for the CRTP. The base class needs to know what the static type of the object is, and then it just casts itself to that.
template<typename Derived>
struct A
{
void g() const
{
std::cout << static_cast<Derived const*>(this)->f() << std::endl;
}
};
struct B : A<B>
{
using A::g;
double f() const { return 1; };
};
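The snippet above shows only B. Here is a slightly fuller sketch with both derived structs from the question. It assumes g() returns the value of f() instead of printing it, purely so the result can be checked, and it uses public inheritance so the using-declaration is not needed. The demo function is hypothetical.

```cpp
#include <cassert>

template <typename Derived>
struct A {
    // g() is written once in the base. The call to f() is resolved at
    // compile time against Derived, so no virtual dispatch is involved.
    double g() const {
        return static_cast<const Derived*>(this)->f();
    }
};

struct B : A<B> {
    double f() const { return 1; }
};

struct C : A<C> {
    double f() const { return 2; }
};

void demo() {
    B b;
    C c;
    assert(b.g() == 1.0);  // uses B::f
    assert(c.g() == 2.0);  // uses C::f
}
```

Note that A<B> and A<C> are unrelated types, so there is no common base to store them behind. That is usually acceptable when the kernels are already template parameters, as in the question.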
Also, responding to a comment you wrote, (which is maybe your real question?),
can you tell me what is the reasoning for the name lookup to stick to the base class, instead of returning to the derived class once it looked up g()?
Because classes are intended to be used for object-oriented programming, not for code reuse. The programmer of A needs to be able to understand what their code is doing, which means subclasses shouldn't be able to arbitrarily override functionality from the base class. That's what virtual is, really: A giving its subclasses permission to override that specific member. Anything A hasn't opted in to that way is something its author should be able to rely on.
Consider in your example: What if the author of B later added an integer member which happened to be called endl? Should that break A? Should B have to care about all the private member names of A? And if the author of A wants to add a member variable, should they be able to do so in a way that doesn't potentially break some subclass? (The answers are "no", "no", and "yes".)
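A small sketch of that opt-in, with made-up names (Base, Derived, fixed, hook, run, demo). Only the member declared virtual can be replaced from the derived class. The non-virtual one is merely hidden when named through Derived, and a derived member that happens to be called endl has no effect on the base.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    int fixed() const { return 0; }         // not virtual: Base relies on it
    virtual int hook() const { return 0; }  // virtual: Base opted in to overriding
    int run() const { return 10 * fixed() + hook(); }
};

struct Derived : Base {
    int fixed() const { return 1; }          // hides Base::fixed, does not replace it
    int hook() const override { return 1; }  // replaces Base::hook
    int endl = 0;                            // unrelated to anything Base uses
};

void demo() {
    Derived d;
    assert(d.fixed() == 1);  // named through Derived: the hiding function
    assert(d.run() == 1);    // inside Base: Base::fixed() and Derived::hook()
}
```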

C++ Relaying Member Functions in Nested Classes?

I work at a manufacturing plant that uses a large C++ project to automate the manufacturing process.
I see a certain practice all over the place that just seems to make code unnecessarily long and I was wondering if there is a specific reason this practice is used.
See below for a simplified example that demonstrates this practice.
First file:
class A {
private:
int a;
public:
int get_a()
{ return a; }
void set_a(int arg)
{ a = arg; }
};
Second file:
class B {
private:
int b;
public:
int get_b()
{ return b; }
void set_b(int arg)
{ b = arg; }
};
Third file:
class C {
private:
A obj1;
B obj2;
public:
int get_a()
{ return obj1.get_a(); }
int get_b()
{ return obj2.get_b(); }
void set_a(int arg)
{ obj1.set_a(arg); }
void set_b(int arg)
{ obj2.set_b(arg); }
};
It seems to me like a slight change in design philosophy could have drastically reduced the amount of code in the third file. Something like this:
class C {
public:
A obj1;
B obj2;
};
Having obj1 and obj2 be public members in the C class does not seem to be unsafe, because the A and B classes each safely handle the getting and setting of their own member variables.
The only disadvantage I can think of to doing it this way is that any instances of the C class that calls a function would need to do something like obj.obj1.get_a() instead of just obj.get_a() but this seems like much less of an inconvenience than having private A and B object instances in the C class and then manually needing to "relay" all of their member functions.
I realize for this simple example, it is not much extra code, but for this large project that my company uses, it adds literally tens of thousands of lines of code.
Am I missing something?
There can be many reasons. One is the following:
Imagine you write a function that does something with the member a. You want the same code to accept an A as well as a C. Your function could look like this:
template <typename T>
void foo(T& t) {
std::cout << " a = " << t.get_a();
}
This would not work with your C because it has a different interface.
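As a rough illustration, here is a trimmed version of the original forwarding C (only the A part) next to a generic function. The helper name read_a and the demo function are made up, and the getters are marked const so they can be called through a const reference.

```cpp
#include <cassert>

class A {
    int a = 0;
public:
    int get_a() const { return a; }
    void set_a(int arg) { a = arg; }
};

class C {
    A obj1;  // implementation detail, not visible to callers
public:
    int get_a() const { return obj1.get_a(); }
    void set_a(int arg) { obj1.set_a(arg); }
};

// Works for any type that offers get_a(), so both A and the forwarding C.
template <typename T>
int read_a(const T& t) { return t.get_a(); }

void demo() {
    A a;
    a.set_a(3);
    C c;
    c.set_a(4);
    assert(read_a(a) == 3);
    assert(read_a(c) == 4);
}
```

With the public-member version of C, read_a(c) would not compile, because that C has no get_a() of its own.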
Encapsulation has its benefits, but I agree with you that encapsulation for the sake of encapsulation very often leads to much more code and often to nothing else than that.
In general, forcing calling code to write something like obj.obj1.get_a() is discouraged, because it reveals implementation details. If you ever change, e.g., the type of a, then your C has no control whatsoever over that change. On the other hand, if in the original code a changes from int to double, then C can decide whether to keep the int interface and do some conversion (if applicable) or to change its interface.
It does add a little extra code, but the important thing is your interface. A class has a responsibility, and the members it holds are implementation details. If you expose internal objects and force users to "get the object, then make calls on it" you are coupling the caller to the implementation more than if you just provide an interface that does the job for the user. As an analogy, [borrowed from wikipedia] when one wants a dog to walk, one does not command the dog's legs to walk directly; instead one commands the dog which then commands its own legs.
Law of Demeter / Wikipedia
More formally, the Law of Demeter for functions requires that a method m of an object O may only invoke the methods of the following kinds of objects:
O itself
m's parameters
Any objects created/instantiated within m
O's direct component objects
A global variable, accessible by O, in the scope of m
In particular, an object should avoid invoking methods of a member object returned by another method. For many modern object oriented languages that use a dot as field identifier, the law can be stated simply as "use only one dot". That is, the code a.b.c.Method() breaks the law where a.b.Method() does not.
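The dog analogy can be sketched roughly as follows. The Dog and Leg classes and their members are invented for illustration. The point is only that the caller talks to the Dog, and the Dog talks to its own direct components.

```cpp
#include <array>
#include <cassert>

class Leg {
    int steps_ = 0;
public:
    void step() { ++steps_; }
    int steps() const { return steps_; }
};

class Dog {
    std::array<Leg, 4> legs_;  // direct component objects of Dog
public:
    // The caller commands the dog; the dog commands its own legs.
    void walk(int strides) {
        for (int i = 0; i < strides; ++i)
            for (Leg& leg : legs_) leg.step();
    }
    int total_steps() const {
        int total = 0;
        for (const Leg& leg : legs_) total += leg.steps();
        return total;
    }
};

void demo() {
    Dog dog;
    dog.walk(3);                       // "one dot": no dog.legs[0].step()
    assert(dog.total_steps() == 12);   // 4 legs, 3 steps each
}
```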

C++ : Access a sub-object's methods inside an object

I am starting to code bigger objects, having other objects inside them.
Sometimes, I need to be able to call methods of a sub-object from outside the class of the object containing it, from the main() function for example.
So far I was using getters and setters as I learned.
This would give something like the following code:
class SubObject {
public:
bool SetMode(int mode);
int GetMode();
private:
int m_mode = 0;
};
class Object {
public:
bool SetSubMode(int mode);
int GetSubMode();
private:
SubObject subObject; // SubObject must be fully defined before it is used as a member
};
bool Object::SetSubMode(int mode) { return subObject.SetMode(mode); }
int Object::GetSubMode() { return subObject.GetMode(); }
bool SubObject::SetMode(int mode) { m_mode = mode; return true; }
int SubObject::GetMode() { return m_mode; }
This feels very sub-optimal, forces me to write (ugly) code for every method that needs to be accessible from outside. I would like to be able to do something as simple as Object->SubObject->Method(param);
I thought of a simple solution: putting the sub-object as public in my object.
This way I should be able to simply access its methods from outside.
The problem is that when I learned object oriented programming, I was told that putting anything in public besides methods was blasphemy and I do not want to start taking bad coding habits.
Another solution I came across during my research before posting here is to add a public pointer to the sub-object perhaps?
How can I access a sub-object's methods in a neat way?
Is it allowed / a good practice to put an object inside a class as public to access its methods? How to do without that otherwise?
Thank you very much for your help on this.
The problem with both a pointer and public member object is you've just removed the information hiding. Your code is now more brittle because it all "knows" that you've implemented object Car with 4 object Wheel members. Instead of calling a Car function that hides the details like this:
Car.SetRPM(200); // hiding
You want to directly start spinning the Wheels like this:
Car.wheel_1.SetRPM(200); // not hiding! and brittle!
Car.wheel_2.SetRPM(200);
And what if you change the internals of the class? The above might now be broken and need to be changed to:
Car.wheel[0].SetRPM(200); // not hiding!
Car.wheel[1].SetRPM(200);
Also, for your Car you can say SetRPM() and the class figures out whether it is front wheel drive, rear wheel drive, or all wheel drive. If you talk to the wheel members directly that implementation detail is no longer hidden.
Sometimes you do need direct access to a class's members, but one goal in creating the class was to encapsulate and hide implementation details from the caller.
Note that you can have Set and Get operations that update more than one bit of member data in the class, but ideally those operations make sense for the Car itself and not specific member objects.
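Here is a minimal sketch of the Car idea, assuming a hypothetical Drive enum and a convention that wheels 0 and 1 are the front pair. None of these names come from the question. The caller only ever says SetRPM on the car, and the car decides which wheels are powered.

```cpp
#include <array>
#include <cassert>

class Wheel {
    int rpm_ = 0;
public:
    void SetRPM(int rpm) { rpm_ = rpm; }
    int GetRPM() const { return rpm_; }
};

enum class Drive { Front, Rear, All };

class Car {
    std::array<Wheel, 4> wheels_;  // assumed layout: 0,1 front; 2,3 rear
    Drive drive_;
public:
    explicit Car(Drive drive) : drive_(drive) {}

    // Callers do not need to know how many wheels exist or which are driven.
    void SetRPM(int rpm) {
        const bool front = drive_ != Drive::Rear;
        const bool rear = drive_ != Drive::Front;
        if (front) { wheels_[0].SetRPM(rpm); wheels_[1].SetRPM(rpm); }
        if (rear)  { wheels_[2].SetRPM(rpm); wheels_[3].SetRPM(rpm); }
    }

    // Number of wheels currently given a non-zero RPM.
    int PoweredWheels() const {
        int n = 0;
        for (const Wheel& w : wheels_)
            if (w.GetRPM() != 0) ++n;
        return n;
    }
};

void demo() {
    Car fwd(Drive::Front);
    fwd.SetRPM(200);
    assert(fwd.PoweredWheels() == 2);

    Car awd(Drive::All);
    awd.SetRPM(200);
    assert(awd.PoweredWheels() == 4);
}
```

Switching the storage from four named members to an array, or changing the drive layout, would not touch any calling code in this version.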
I was told that putting anything in public besides methods was blasphemy
Blanket statements like this are dangerous. There are pros and cons to each style that you must take into consideration, but an outright ban on public members is a bad idea IMO.
The main problem with having public members is that it exposes implementation details that might be better hidden. For example, let's say you are writing some library:
struct A {
struct B {
void foo() {...}
};
B b;
};
A a;
a.b.foo();
Now a few years down the line you decide that you want to change the behavior of A depending on the context; maybe you want to make it run differently in a test environment, maybe you want to load from a different data source, etc. Heck, maybe you just decide the name of the member b is not descriptive enough. But because b is public, you can't change the behavior of A without breaking client code.
struct A {
struct B {
void foo() {...}
};
struct C {
void foo() {...}
};
B b;
C c;
};
A a;
a.c.foo(); // Uh oh, everywhere that uses b needs to change!
Now if you were to let A wrap the implementation:
class A {
public:
void foo() {
if (TESTING) {
b.foo();
} else {
c.foo();
}
}
private:
struct B {
void foo() {...}
};
struct C {
void foo() {...}
};
B b;
C c;
};
A a;
a.foo(); // I don't care how foo is implemented, it just works
(This is not a perfect example, but you get the idea.)
Of course, the disadvantage here is that it requires a lot of extra boilerplate, like you have already noticed. So basically, the question is "do you expect the implementation details to change in the future, and if so, will it cost more to add boilerplate now, or to refactor every call later?" And if you are writing a library used by external users, then "refactor every call" turns into "break all client code and force them to refactor", which will make a lot of people very upset.
Of course instead of writing forwarding functions for each function in SubObject, you could just add a getter for subObject:
SubObject& getSubObject() { return subObject; } // non-const reference, so setters can be called through it
// ...
object.getSubObject().SetMode(0);
Which suffers from some of the same problems as above, although it is a bit easier to work around because the SubObject interface is not necessarily tied to the implementation.
All that said, I think there are certainly times where public members are the correct choice. For example, simple structs whose primary purpose is to act as the input for another function, or who just get a bundle of data from point A to point B. Sometimes all that boilerplate is really overkill.

Inline a virtual function in a method when the object has value semantics

Consider the following code with a template method design pattern:
class A {
public:
void templateMethod() {
doSomething();
}
private:
virtual void doSomething() {
std::cout << "42\n";
}
};
class B : public A {
private:
void doSomething() override {
std::cout << "43\n";
}
};
int main() {
// case 1
A a; // value semantics
a.templateMethod(); // knows at compile time that A::doSomething() must be called
// case 2
B b; // value semantics
b.templateMethod(); // knows at compile time that B::doSomething() must be called
// case 3
A& a_or_b_ref = runtime_condition() ? a : b; // ref semantics
a_or_b_ref.templateMethod(); // does not know which doSomething() at compile time, a virtual call is needed
return 0;
}
I am wondering if the compiler is able to inline/unvirtualize the “doSomething()” member function in case 1 and 2.
This is possible if it creates 3 different pieces of binary code for templateMethod(): one with no inline, and 2 with either A::doSomething() or B::doSomething() inlined (that must be called respectively in cases 3, 1 and 2)
Do you know if this optimization is required by the standard, or else if any compiler implements it ?
I know that I can achieve the same kind of effect with the CRTP and no virtual functions, but the intent will be less clear.
The standard does not require optimisations in general (occasionally it goes out of its way to allow them); it specifies the outcome and it is up to the compiler to figure out how best to achieve it.
In all three cases I would expect templateMethod to be inlined. The compiler is then free to perform further optimisations; in the first two cases it knows the dynamic type of this and so can generate a non-virtual call for doSomething. (I'd then expect it to inline those calls.)
Have a look at the generated code and see for yourself.
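If you want something to compile and inspect, here is a self-contained variant of the question's code. It assumes doSomething() returns an int rather than printing, and the helper through_reference and the demo function are made up. The observable result has to be the same whatever the optimiser does. Whether the calls in the first two cases are actually devirtualized or inlined depends on the compiler and its flags, so check the generated assembly rather than taking this sketch as proof.

```cpp
#include <cassert>

class A {
public:
    virtual ~A() = default;
    int templateMethod() { return doSomething(); }
private:
    virtual int doSomething() { return 42; }
};

class B : public A {
private:
    int doSomething() override { return 43; }
};

// Case 3: only the dynamic type decides which doSomething() runs.
int through_reference(A& a_or_b) { return a_or_b.templateMethod(); }

void demo() {
    A a;  // case 1: dynamic type known to be A
    B b;  // case 2: dynamic type known to be B
    assert(a.templateMethod() == 42);
    assert(b.templateMethod() == 43);
    assert(through_reference(a) == 42);
    assert(through_reference(b) == 43);
}
```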
Optimisation is a matter for the compiler, not the standard. It would be a major bug if an optimisation led the compiler to violate the semantics of virtual functions.
So in the 3rd case :
// case 3
A& b_ref = b; // ref semantics
b_ref.templateMethod();
the actual object is a B, and the function actually called must be the one defined in class B, whatever reference or pointer is used.
And my compiler correctly displays 43. Had it displayed anything else, I would have changed compilers immediately.

boost::bind with protected members & context

In the code below, there are two "equivalent" calls to std::for_each using boost::bind expressions. The first indicated line compiles; the second indicated line fails. The best explanation I can find in the standard amounts to "because we said so". I'm looking for "why the standard indicates this behavior". My suppositions are below.
My question is simply: Why does the indicated line compile and the equivalent following line fail to compile (and I don't want because "the standard says so", I already know that - I will not accept any answers that give this as an explanation; I'd like an explanation as to why the standard says so).
Notes: Although I use boost, boost is irrelevant to this question, and the error in various formats has been reproduced using g++ 4.1.* and VC7.1.
#include <boost/bind.hpp>
#include <iostream>
#include <map>
#include <algorithm>
class Base
{
protected:
void foo(int i)
{ std::cout << "Base: " << i << std::endl; }
};
struct Derived : public Base
{
Derived()
{
data[0] = 5;
data[1] = 6;
data[2] = 7;
}
void test()
{
// Compiles
std::for_each(data.begin(), data.end(),
boost::bind(&Derived::foo, this,
boost::bind(&std::map<int, int>::value_type::second, _1)));
// Fails to compile - why?
std::for_each(data.begin(), data.end(),
boost::bind(&Base::foo, this,
boost::bind(&std::map<int, int>::value_type::second, _1)));
}
std::map<int, int> data;
};
int main(int, const char**)
{
Derived().test();
return 0;
}
The indicated line fails with this error:
main.C: In member function 'void Derived::test()':
main.C:9: error: 'void Base::foo(int)' is protected
main.C:31: error: within this context
As noted, the supposedly equivalent statement above compiles cleanly (and if the offending statement is commented out, runs with the expected result of printing “5”, “6”, “7” on separate lines).
While searching for an explanation, I came across 11.5.1 in the standard (specifically, I’m looking at the 2006-11-06 draft):
An additional access check beyond those described earlier in clause 11 is applied when a non-static data member or non-static member function is a protected member of its naming class (11.2). As described earlier, access to a protected member is granted because the reference occurs in a friend or member of some class C. If the access is to form a pointer to member (5.3.1), the nested-name-specifier shall name C or a class derived from C. All other accesses involve a (possibly implicit) object expression (5.2.5). In this case, the class of the object expression shall be C or a class derived from C.
After reading this, it became evident why the second statement failed while the first succeeded, but then the question came up: What is the rationale for this?
My initial thought was that the compiler was expanding the boost::bind templates, discovering that Base::foo was protected and kicking it out because boost::bind<…> was not a friend. But, the more I thought about this explanation, the less it made sense, because if I recall correctly, as soon as you take the pointer to a member (assuming you initially are within access control of the member), all access control information is lost (i.e. I could define a function that returns an arbitrary pointer to a member that alternately returns a public, protected or private member depending on some input and the returner would be none the wiser).
The more I thought about it, the only plausible explanation I could come up with for why it should make a difference was the case of multiple inheritance. Specifically, that depending on the class layout, the member pointer when calculated from Base would be different from that calculated from Derived.
It's all about "context". In the first call the context of the call is Derived which has access to the protected members of Base and hence is allowed to take addresses of them. In the second the context is "outside of" Derived and hence outside of Base so the protected member access is not allowed.
Actually, this seems logical. Inheritance gives you access to Derived::foo and not to Base::foo. Let me illustrate with a code example:
struct Derived : public Base
{
void callPrivateMethod(Base &b)
{
// this should obviously fail
b.foo(5);
// pointer-to-member call should also fail
void (Base::*pBaseFoo) (int) = &Base::foo; // the same error as yours here
(b.*pBaseFoo)(5);
}
};
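For contrast with the failing form above, here is a sketch of the form that is accepted: naming the member through Derived inside a member of Derived. The call_via_pointer and demo names are made up, and foo returns a value here only so the result can be checked. The static_assert shows that the resulting pointer still has a pointer-to-member-of-Base type, so the rule concerns how the member is named, not which function ends up being pointed to.

```cpp
#include <cassert>
#include <type_traits>

class Base {
protected:
    int foo(int i) { return i + 1; }
};

struct Derived : Base {
    int call_via_pointer(int i) {
        // Naming the member as Derived::foo satisfies the extra protected check.
        auto p = &Derived::foo;
        // foo is a direct member of Base, so the pointer type refers to Base.
        static_assert(std::is_same<decltype(p), int (Base::*)(int)>::value,
                      "pointer is still a pointer to member of Base");
        // Calling through the pointer on a Derived object is fine.
        return (this->*p)(i);
    }
};

void demo() {
    Derived d;
    assert(d.call_via_pointer(5) == 6);
}
```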
The reason for this restriction is enforcement of access control across different classes that share a common base.
This is reinforced by notes in Core Language Defects Report defect #385, the relevant part copied here for reference:
[...] the reason we have this rule is that C's use of inherited protected members might be different from their use in a sibling class, say D. Thus members and friends of C can only use B::p in a manner consistent with C's usage, i.e., in C or derived-from-C objects.
As an example of something this rule prevents:
class B {
protected:
void p() { };
};
class C : public B {
public:
typedef void (B::*fn_t)();
fn_t get_p() {
return &B::p; // compilation error here, B::p is protected
}
};
class D : public B { };
int main() {
C c;
C::fn_t pbp = c.get_p();
B * pb = new D();
(pb->*pbp)();
}
The protected status of D::p is something we want the compiler to enforce, but if the above compiled that would not be the case.