Does the size of a binary executable depend on inheritance? - c++

I have two scenarios, one with inheritance, and one without.
First:
class A
{
public:
    int a;
    void prnt() { cout << "class A"; }
};

class B : public A
{
};
Second:
class A
{
public:
    int a;
    void prnt() { cout << "class B"; }
};

class B
{
public:
    int a;
    void prnt() { cout << "class A"; }
};
Does the inheritance increase the size of executable or not?
As in the above example: in the first scenario I derived class B from A, so that B has all the data members of A. In the second scenario I just copy-pasted the code from class A into B.
What is advisable in the above scenarios, where we know that there will be only two classes with the same data members?
Should we use inheritance, or just create two separate classes?
And will there be any difference in the size of the final executable between the two scenarios?

You shouldn't worry too much about this kind of detail and should concentrate fully on the design that best represents your application domain.
The real question you should ask is whether B really is an A, i.e. every time you use an A you could also use a B.
Nevertheless, for the sake of curiosity:
I've compiled both versions with MSVC 2013, adding a main() that creates an A and a B object and invokes their sole function (I used exactly the same prnt body in the second example to compare comparable things). In both debug and release mode, the exe files ended up being EXACTLY the same size.
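For reference, a minimal sketch of the kind of test harness meant here (the exact main() is not shown above, so the object names are my own):

#include <iostream>
using namespace std;

class A
{
public:
    int a;
    void prnt() { cout << "class A"; }
};

class B : public A
{
};

int main()
{
    A objA;
    B objB;
    objA.prnt();
    objB.prnt(); // B reuses A::prnt() through inheritance
    return 0;
}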
As pointed out in the comments, compilers do a pretty good job. For instance, in release mode, the compiler inlined the code of the function invocation, resulting in the same main() machine code. For the second version, additional code was generated for B::prnt(), but apparently the linker identified the unused function and discarded it.
Of course, with more complex code, the result might be different. Inlining will not always be possible, and functions can have small differences which make deduplication more difficult.
As a rule of thumb, the size of the code will depend more on the quantity of code (and copy/pasted code) that you write than on inheritance. Inheritance should in fact tend naturally to generate smaller code (especially with older, less optimizing compilers), because the inheritance relation tells the compiler that it can/shall reuse the common code.

Related

C++ Relaying Member Functions in Nested Classes?

I work at a manufacturing plant that uses a large C++ project to automate the manufacturing process.
I see a certain practice all over the place that just seems to make code unnecessarily long and I was wondering if there is a specific reason this practice is used.
See below for a simplified example that demonstrates this practice.
First file:
class A {
private:
int a;
public:
int get_a()
{ return a; }
void set_a(int arg)
{ a = arg; }
};
Second file:
class B {
private:
int b;
public:
int get_b()
{ return b; }
void set_b(int arg)
{ b = arg; }
};
Third file:
class C {
private:
A obj1;
B obj2;
public:
int get_a()
{ return obj1.get_a(); }
int get_b()
{ return obj2.get_b(); }
void set_a(int arg)
{ obj1.set_a(arg); }
void set_b(int arg)
{ obj2.set_b(arg); }
};
It seems to me like a slight change in design philosophy could have drastically reduced the amount of code in the third file. Something like this:
class C {
public:
A obj1;
B obj2;
};
Having obj1 and obj2 be public members in the C class does not seem to be unsafe, because the A and B classes each safely handle the getting and setting of their own member variables.
The only disadvantage I can think of to doing it this way is that code using an instance of the C class would need to write something like obj.obj1.get_a() instead of just obj.get_a(), but this seems like much less of an inconvenience than having private A and B object instances in the C class and then manually needing to "relay" all of their member functions.
I realize for this simple example, it is not much extra code, but for this large project that my company uses, it adds literally tens of thousands of lines of code.
Am I missing something?
There can be many reasons. One is the following:
Imagine you write a function that does something with the member a. You want the same code to accept an A as well as a C. Your function could look like this:
template <typename T>
void foo(T& t) {
std::cout << " a = " << t.get_a();
}
This would not work with your C because it has a different interface.
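A small self-contained sketch of that point (the simplified C here is the one proposed in the question, with a public member and no forwarding get_a()):

#include <iostream>

class A {
private:
    int a;
public:
    int get_a() { return a; }
    void set_a(int arg) { a = arg; }
};

// The simplified C proposed in the question: public member, no forwarding.
class C {
public:
    A obj1;
};

template <typename T>
void foo(T& t) {
    std::cout << " a = " << t.get_a();
}

int main() {
    A a;
    a.set_a(1);
    C c;
    c.obj1.set_a(2);
    foo(a);        // fine: A has get_a()
    // foo(c);     // would not compile: the simplified C has no get_a()
    foo(c.obj1);   // compiles, but couples the caller to C's internals
    return 0;
}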
Encapsulation has its benefits, but I agree with you that encapsulation for the sake of encapsulation very often leads to much more code and often to little else.
In general, forcing calling code to write something like obj.obj1.get_a() is discouraged, because it reveals implementation details. If you ever change, e.g., the type of a, then your C has no control whatsoever over that change. On the other hand, if in the original code a changes from int to double, then C can decide whether to keep the int interface and do some conversion (if applicable) or to change its interface.
It does add a little extra code, but the important thing is your interface. A class has a responsibility, and the members it holds are implementation details. If you expose internal objects and force users to "get the object, then make calls on it" you are coupling the caller to the implementation more than if you just provide an interface that does the job for the user. As an analogy, [borrowed from wikipedia] when one wants a dog to walk, one does not command the dog's legs to walk directly; instead one commands the dog which then commands its own legs.
Law of Demeter / Wikipedia
More formally, the Law of Demeter for functions requires that a method m of an object O may only invoke the methods of the following kinds of objects:
O itself
m's parameters
Any objects created/instantiated within m
O's direct component objects
A global variable, accessible by O, in the scope of m
In particular, an object should avoid invoking methods of a member object returned by another method. For many modern object oriented languages that use a dot as field identifier, the law can be stated simply as "use only one dot". That is, the code a.b.c.Method() breaks the law where a.b.Method() does not.
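To make the analogy concrete with the question's classes (a sketch only; do_work and its body are invented for illustration, and A and B are the classes from the question):

class C {
private:
    A obj1;
    B obj2;
public:
    // One task-level operation instead of exposing obj1 and obj2:
    // callers "command the dog", not its legs.
    int do_work() { return obj1.get_a() + obj2.get_b(); }
};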

Public virtual method overridden as private. Generalization/specialization/Liskov principles violation?

As in Private function member called outside of class, one can write the following code:
#include <iostream>
class A {
public:
virtual void f() { std::cout << "A::f()"; }
};
class B : public A {
private:
void f() override { std::cout << "B::f()"; }
};
void g(A &g) { g.f(); }
int main() {
A a;
g(a);
a.f();
B b;
g(b);
b.f(); // compilation failure
}
Of course, the compiler refuses to compile the last line, because the static analysis of the code reveals that B::f() is defined but private.
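(Note that the very same call compiles when made through the base interface, since access is checked against the static type; a small sketch with the classes above:)

B b;
A &ab = b;
ab.f();                   // OK: A::f() is public; dynamic dispatch still reaches B::f()
static_cast<A&>(b).f();   // OK for the same reason
// b.f();                 // error: B::f() is private in B's own interface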
What seriously troubles me is the relation to conceptual generalization/specialization. It is generally considered that you must be able to manipulate an instance of a subtype in at least the same way you manipulate an instance of the supertype.
This is the basis of Liskov's substitution principle. In the given example, that is respected when g() is called with an argument of type A or one of type B. But the last line is not accepted, and it seems that in such a case the substitution principle is violated in some way (consider a call-by-name as in a macro definition #define h(x) (x.f())).
Even if one considers that Liskov's principle is not violated (macros are not really part of the language, so OK), the fact that the last line gives a compile-time error at least means that objects of type B can't be manipulated the way A's can. So B is not a specialization of A, even though the derivation is public.
Thus in C++, using public derivation does not guarantee that you are effectively implementing a specialization. You need to consider more properties of the code to be sure that you have a "correct" specialization.
Am I wrong? Can someone give me a justification for it? I would like a good semantic argument; I mean at least something more elaborate than Stroustrup's usual ones like "C++ tries not to constrain you: you are free to use it if you want and not if you don't". I think a language needs to be founded on a reasonable model, not on a huge list of possible tricks.

Position of a vpointer in an object

#include <cstddef>
#include <cstdlib>
#include <iostream>

class C
{
public:
C() : m_x(0) { }
virtual ~C() { }
public:
static ptrdiff_t member_offset(const C &c)
{
const char *p = reinterpret_cast<const char*>(&c);
const char *q = reinterpret_cast<const char*>(&c.m_x);
return q - p;
}
private:
int m_x;
};
int main(void)
{
C c;
std::cout << ((C::member_offset(c) == 0) ? 0 : 1);
std::cout << std::endl;
std::system("pause");
return 0;
}
The program above outputs 1. All it does is check the address of the c object against the address of c's field m_x. It prints 1, which means the addresses are not equal. My guess is that this is because the destructor is virtual, so the compiler has to create a vtable for the class and put a vpointer in the class's objects. If I'm wrong so far, please correct me.
Apparently, it puts the vpointer at the beginning of the object, pushing the m_x field further back and thus giving it a different address. Is that the case? If so, does the standard specify the vpointer's position in the object? According to the wiki it's implementation-dependent, and its position may change the output of the program.
So can you always predict the output of this program without specifying the target platform?
In reality, it is NEARLY ALWAYS laid out this way. However, the C++ standard allows whatever works to be used, and I can imagine several solutions that don't REQUIRE the above to be true, although they would perhaps not work well as real solutions.
Note however that you can have more than one vptr/vtable for an object if you have multiple inheritance.
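A quick way to see that (sizes and layout are implementation-dependent, so treat this only as an illustration):

#include <cstdio>

struct X { virtual ~X() {} int x; };
struct Y { virtual ~Y() {} int y; };
struct Z : X, Y { }; // with a typical layout, Z carries two vptrs, one per polymorphic base

int main()
{
    // On a common 64-bit implementation this prints something like 16 16 32,
    // but the standard does not guarantee any particular sizes.
    std::printf("%zu %zu %zu\n", sizeof(X), sizeof(Y), sizeof(Z));
    return 0;
}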
There are no "vpointers" in C++. The implementation of polymorphism and dynamic dispatch is left to the compiler, and the resulting class layout is not in any way specified. Certainly an object of polymorphic type will have to carry some extra state in order to identify the concrete type when given only a view of a base subobject.
Implementations with vtables and vptrs are common and popular, and putting the vptr at the beginning of the class means that you don't need any pointer adjustments for single inheritance up and downcasts.
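For example (again implementation-dependent, but typical for single inheritance with the vptr at offset 0):

#include <iostream>

struct Base { virtual ~Base() {} };
struct Derived : Base { int extra; };

int main()
{
    Derived d;
    Base *b = &d; // upcast: with this layout, no pointer adjustment is needed
    // Typically prints 1, i.e. the Base subobject starts at the same address
    // as the complete Derived object. This is not guaranteed by the standard.
    std::cout << (static_cast<void*>(&d) == static_cast<void*>(b)) << '\n';
    return 0;
}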
Many C++ compilers follow (parts of) the Itanium ABI for C++, which specifies class layout decisions like this. This popular article may also provide some insights.
Yes, it is implementation dependent and no, you can't predict the program's output without knowing the target platform/compiler.

C++ struct reinterpret_cast

Suppose there are two structs, A and B. They both have a struct C as their first member.
I would like to know if it is safe to reinterpret_cast a pointer to A or B into a pointer to C.
If not, is there any way to do so without any performance impact?
#include <cstdint>
#include <iostream>
#include <string>

using namespace std;

struct C
{
string m_c1;
int32_t m_c2;
double m_c3;
string m_c4;
};
struct A
{
C m_a1;
string m_a2;
int32_t m_a3;
};
struct B
{
C m_b1;
string m_b2;
int32_t m_b3;
double m_b4;
};
int main(int argc,char *argv[])
{
A a;
a.m_a1.m_c1="A";
a.m_a1.m_c4="AA";
B b;
b.m_b1.m_c1="B";
b.m_b1.m_c4="BB";
C* pc = reinterpret_cast<C*>(&a);
cout << pc->m_c1 << " " << pc->m_c4 << endl;
pc = reinterpret_cast<C*>(&b);
cout << pc->m_c1 << " " << pc->m_c4 << endl;
return 1;
}
As Mike DeSimone points out, the string class is not guaranteed to be a standard-layout class, and thus class C is not standard-layout either, which means you have no guarantees about the memory layout at all. So it is not safe. Only if you change the strings to (const) char* is it guaranteed to be safe.
Even then it will only be safe as long as the layout of the classes stays the same (you cannot change the order of the members or change the access specifiers on them) and the classes stay without any vtables; it is "safe" in the sense that the compiler will generate code that shows the behavior you would like.
These are, however, two guarantees that a software developer is seldom able to give. The code is also hard to understand written like this. Another developer (or the same developer a month later) might ignore this code (or simply not understand it), make the changes they need, and suddenly the code is broken and you have some hard-to-catch errors on your hands.
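If someone insists on the cast anyway, a compile-time check at least documents the assumption the code relies on (a sketch using the A and C from the question; it does not make the cast well-defined by itself):

#include <type_traits>

// Fails to compile on implementations where C (e.g. because of its
// std::string members) or A is not a standard-layout type, which is
// the minimum the pointer trick needs to be plausible.
static_assert(std::is_standard_layout<C>::value, "C must be standard-layout");
static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");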
A and B are classes that give access to a C (or to some members of C). More readable, and thus safer, solutions are:
Create an accessor in both A and B; it would probably be inlined and incur no performance penalty.
If there is any reason for inheritance, use simple inheritance: either A and B is-a ClassThatHasAC, or A and B is-a C. As long as there are no virtual functions you will probably not see any performance issues here either. In both cases an accessor would give you the benefits, probably without any performance cost.
Create simple and readable code first, and measure performance. Only if this C access is costing you too much should you optimize. But if your optimization boils down to the reinterpret_cast trick, make sure there are plenty of warning signs around so that no one steps on this booby trap.
Why don't you inherit both A and B from C and then use static_cast instead? That should be safer/cleaner.
Indeed, in your case you shouldn't need a cast at all; you should be able to assign A or B pointers to a C*.
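A sketch of that with the question's members; the derived-to-base conversion is implicit, so no cast is needed at all:

#include <cstdint>
#include <iostream>
#include <string>
using namespace std;

struct C
{
    string m_c1;
    int32_t m_c2;
    double m_c3;
    string m_c4;
};

struct A : C // A is-a C; no virtual functions involved
{
    string m_a2;
    int32_t m_a3;
};

int main()
{
    A a;
    a.m_c1 = "A";
    a.m_c4 = "AA";
    C *pc = &a; // implicit derived-to-base conversion, no cast
    cout << pc->m_c1 << " " << pc->m_c4 << endl;
    return 0;
}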

LTO, Devirtualization, and Virtual Tables

Comparing virtual functions in C++ and virtual tables in C, do compilers in general (and for sufficiently large projects) do as good a job at devirtualization?
Naively, it seems like virtual functions in C++ have slightly more semantics, thus may be easier to devirtualize.
Update: Mooing Duck mentioned inlining devirtualized functions. A quick check shows missed optimizations with virtual tables:
#include <stdio.h>

struct vtab {
int (*f)();
};
struct obj {
struct vtab *vtab;
int data;
};
int f()
{
return 5;
}
int main()
{
struct vtab vtab = {f};
struct obj obj = {&vtab, 10};
printf("%d\n", obj.vtab->f());
}
My GCC will not inline f, although it is called directly, i.e., devirtualized. The equivalent in C++,
#include <stdio.h>

class A
{
public:
    virtual int f() = 0;
};

class B : public A
{
public:
    int f() { return 5; }
};
int main()
{
B b;
printf("%d\n", b.f());
}
even inlines f. So there's a first difference between C and C++, although I don't think the added semantics in the C++ version are relevant in this case.
Update 2: In order to devirtualize in C, the compiler has to prove that the function pointer in the virtual table has a certain value. In order to devirtualize in C++, the compiler has to prove that the object is an instance of a particular class. The proof looks harder in the first case. However, virtual tables are typically modified in only very few places, and most importantly: just because it looks harder doesn't mean that compilers aren't as good at it (otherwise you might argue that XORing two integers is generally faster than adding them).
The difference is that in C++, the compiler can guarantee that the virtual table address never changes. In C then it's just another pointer and you could wreak any kind of havoc with it.
However, virtual tables are typically modified in only very few places
The compiler doesn't know that in C. In C++, it can assume that it never changes.
I tried to summarize in http://hubicka.blogspot.ca/2014/01/devirtualization-in-c-part-2-low-level.html why generic optimizations have a hard time devirtualizing. Your testcase gets inlined for me with GCC 4.8.1, but in a slightly less trivial testcase where you pass a pointer to your "object" out of main it will not be.
The reason is that to prove that the virtual table pointer in obj and the virtual table itself did not change, the alias analysis module has to track all possible places that can point to them. In non-trivial code where you pass things outside of the current compilation unit this is often a lost game.
C++ gives you more information about when the type of an object may change and when it is known. GCC makes use of it and will make a lot more use of it in the next release. (I will write about that soon, too.)
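A sketch of the kind of less trivial testcase mentioned above; escape() is a made-up function assumed to live in another translation unit, so the compiler can no longer prove that obj.vtab->f still points at f():

#include <stdio.h>

struct vtab {
    int (*f)();
};

struct obj {
    struct vtab *vtab;
    int data;
};

int f()
{
    return 5;
}

// Assume this is defined in another file, e.g. as
//   void escape(struct obj *o) { /* may modify *o or save the pointer */ }
void escape(struct obj *o);

int main()
{
    struct vtab vtab = {f};
    struct obj obj = {&vtab, 10};
    escape(&obj);                  // the pointer escapes the compilation unit
    printf("%d\n", obj.vtab->f()); // unlikely to be devirtualized or inlined now
    return 0;
}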
Yes, if it is possible for the compiler to deduce the exact type of a virtualized type, it can "devirtualize" (or even inline!) the call. A compiler can only do this if it can guarantee that no matter what, this is the function needed.
The major concern is basically threading. In the C++ example, the guarantees hold even in a threaded environment. In C, that can't be guaranteed, because the object could be grabbed by another thread/process, and overwritten (deliberately or otherwise), so the function is never "devirtualized" or called directly. In C the lookup will always be there.
#include <iostream>

struct A {
    virtual void func() { std::cout << "A"; }
};

struct B : A {
    virtual void func() { std::cout << "B"; }
};

int main() {
    B b;
    b.func(); // this will be inlined in optimized builds.
}
It depends on what you are comparing compiler inlining to. Compared to link-time, profile-guided, or just-in-time optimizations, compilers have less information to use. With less information, the compile-time optimizations will be more conservative (and do less inlining overall).
A compiler will still generally be pretty decent at inlining virtual functions, as it is equivalent to inlining function pointer calls (say, when you pass a free function to an STL algorithm like sort or for_each).
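As an illustration of that last point (whether the comparator actually gets inlined of course depends on the compiler and optimization settings):

#include <algorithm>
#include <cstdio>
#include <vector>

// A free function passed by pointer; an optimizer that can propagate the
// constant pointer into the sort instantiation may inline the call.
bool less_than(int a, int b) { return a < b; }

int main()
{
    std::vector<int> v = {3, 1, 2};
    std::sort(v.begin(), v.end(), less_than); // comparator passed as a function pointer
    for (int x : v)
        std::printf("%d ", x);
    std::printf("\n");
    return 0;
}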