Does modifying a base class require recompilation of subclasses? - c++

Suppose I have a base class and a derived class. I have compiled my program and it is running. Now suppose I want to make some changes to my base class.
My questions are:
Question 1: If I make the changes only in the base class file and recompile only the base class, will the changes also be reflected in derived class objects that are already instantiated, or do I need to recompile the derived class as well? Another way of asking this: is a copy of the base class member functions created for the derived class, or are pointers stored, so that changes are reflected automatically?
Question 2: If this is not updated automatically, is there any way to do it?

C++ doesn't have reflection, so you need to recompile the whole thing.

This isn't well-defined by the language (since it doesn't address dynamic linking), but we can lay out some cases that may work, and some that almost certainly won't.
Should work:
changing a non-inlined function body in base.cpp
so long as cross-module/link-time inlining isn't enabled
assuming the derived class depends only on the interface and not on the changed behaviour
adding static or non-virtual methods
beware of changing overload resolution though
Likely to fail horribly:
changing the prototype of any method or constructor used in the derived class
this includes changes that wouldn't normally be visible in calling code (such as adding defaulted arguments) or changing the type of an argument even where there is an implicit conversion, etc.
adding, removing or re-ordering virtual methods
adding, removing or re-ordering data members or base classes
There are several assumptions underlying these:
your base class and derived class are in separate dynamic libs (eg. base.so and derived.so). This isn't clear from your question
the only reasons for runtime compatibility to break are
because the layout in memory changed (ie, you modified the base-class instance size, or the vtable size, or the size or order of members)
because the code that would be generated at a call site changed (ie, because you added arguments with default values, or changed an argument to an implicitly-convertible type)
If the first assumption isn't true, the question seems pointless, since you'll have to rebuild the whole lib anyway.
The second assumption could be broken if you change or upgrade your compiler, or change the compiler flags, or the version of any other library you link against.
With respect to inlining, you can get horribly subtle bugs if different dynamic libs have inlined different versions of the code. You really won't enjoy trying to debug those.
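To make the first "should work" case concrete, here is a minimal sketch; the file names and the base.so/derived.so split are just the assumptions laid out above:

// base.h -- the header both base.so and derived.so were compiled against
#include <string>

class Base {
public:
    std::string greeting() const;        // body lives in base.cpp: changing it
                                         // should only require rebuilding base.so
    int answer() const { return 42; }    // defined in the header: callers may have
                                         // inlined it, so changing it is risky
};

// base.cpp -- compiled only into base.so
std::string Base::greeting() const {
    return "hello";                      // editing this body is the "should work" case
}

// derived.h / derived.cpp -- compiled only into derived.so
class Derived : public Base {
public:
    std::string shout() const { return greeting() + "!"; }  // picks up whatever base.so exports
};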

C++ is a statically compiled language. That means all type checking is done at compile time, so if you modify a type, every line of code that depends on the modification must be recompiled. That includes the base class modification and its subclasses, as in your case.
Note that this can be a problem: if you are writing an API and you modify the API implementation, the API and all code that uses what you have modified (the user code) must be recompiled.
The classic technique to reduce recompilation is the PIMPL idiom.
PIMPL hides the implementation of the class behind a pointer to an implementation class stored as a member of the original class. The original class acts only as an interface, so if the implementation is modified but the interface is not, users of the class do not need to recompile.
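A minimal sketch of the idiom (the Widget/Impl names are just for illustration):

// widget.h -- the stable interface users compile against
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                    // must be defined in the .cpp, where Impl is complete
    void draw();
private:
    struct Impl;                  // only declared here; private members stay out of the header
    std::unique_ptr<Impl> pimpl_;
};

// widget.cpp -- can change freely without forcing users of widget.h to recompile
struct Widget::Impl {
    int frame_count = 0;          // private state lives here
};

Widget::Widget() : pimpl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;

void Widget::draw() {
    ++pimpl_->frame_count;
}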

Related

Does 'final' specifier add any overhead?

Does using the final specifier on a class or on a function add any memory or CPU overhead, or is it used at compile time only?
And how does std::is_final recognise what is final?
It actually can reduce overhead. And in rare cases, increase it.
If you have a pointer to a final class A, any virtual method calls can be de-virtualized and called directly. Similarly, a call to a virtual final method can be de-virtualized. In addition, the inheritance tree of a final class is fixed, even if it contains virtual parent classes, so you can de-virtualize some parent access.
Each of these de-virtualizations reduces or eliminates the requirement that a run-time structure (the vtable) be queried.
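For example (whether the compiler actually performs the de-virtualization is a quality-of-implementation matter; the types here are made up):

struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() = default;
};

struct Circle final : Shape {              // no class can ever derive from Circle
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159 * r * r; }
};

double twice_area(const Circle& c) {
    // c's dynamic type can only be Circle, so the compiler is free to call
    // Circle::area directly (or inline it) instead of going through the vtable
    return 2 * c.area();
}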
There can be a slight downside. Some coding techniques rely on vtable access to avoid direct access to a symbol, and then do not export the symbol. Accessing a vtable can be done via convention (without symbols from a library, just the header file for the classes in question), while accessing a method directly involves linking against that symbol.
This breaks one form of dynamic C++ library linking (where you avoid linking more than a dll loading symbol and/or C linkage functions that return pointers, and classes are exported via their vtables).
It is also possible that if you link against a symbol in a dynamic library, the dynamic library symbol load could be more expensive than the vtable lookup. I have not experienced or profiled this, but I have seen it claimed. The benefits should, in general, outweigh such costs. Any such cost is a quality of implementation issue, as the cost is not mandated to occur because the method is final.
Finally, final inhibits the empty base optimization trick, where someone knows your class has no state and inherits from it to reduce the overhead of "storing" an instance of your class from 1 byte to 0 bytes. If your class is empty and contains no virtual methods/inheritance, don't mark it final, so that this optimization is not blocked. There is no equivalent for final functions.
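A sketch of the empty-base trick that final rules out; the exact sizes are implementation-specific, but this typically prints "4 8" on common ABIs:

#include <iostream>

struct EmptyPolicy {};              // stateless, can be used as a base class
struct EmptyFinal final {};         // stateless, but cannot be inherited from

struct UsesBase : EmptyPolicy {     // empty base optimization: the base can occupy 0 bytes
    int x;
};

struct UsesMember {
    EmptyFinal p;                   // forced to be a member: at least 1 byte plus padding
    int x;
};

int main() {
    std::cout << sizeof(UsesBase) << ' ' << sizeof(UsesMember) << '\n';
}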
Other than the EBO optimization issue (which only occurs with empty types), any overhead from final comes from how other code interacts with it, and will be rare. Far more often it will make other code faster, as directly interacting with a method enables both a more direct call of the method, and can lead to knock-on optimizations (because the call can be more fully understood by the compiler).
Marking anything except an empty type as final when it is final is almost certainly harmless at run time. Doing so on classes with virtual functions and inheritance is likely to be beneficial at run time.
std::is_final and similar traits are almost all implemented via compiler built-in magic. A good number of the traits in std require such magic. See How to detect if a class is final in C++11? (thanks to @Csq for finding that)
No, it's only used at compile-time
Magic (see here for further info - thanks Csq for link)

C++ -- When recompilation is required

You have a class that many libraries depend on. You need to modify the class for one application. Which of the following changes require recompiling all libraries before it is safe to build the application?
add a constructor
add a data member
change destructor into virtual
add an argument with default value to an existing member function
Classes are defined in the header file. The header file will be compiled into both the library that implements the class and the code that uses the class. I am assuming that you are taking as a given that you will need to recompile the class implementation after changing the class header file and that the question you are asking is whether you will need to recompile any code that references the class.
The problem that you are describing is one of binary compatibility (BC) and generally follows the following rules:
Adding non-virtual functions anywhere in the class does not break BC.
Changing any function definition (adding parameters) will break BC.
Adding virtual functions anywhere changes the v-table and therefore breaks BC.
Adding data members will break BC.
Changing a parameter from non-default to default will not break BC.
Any changes to inline functions will break BC (inline functions should therefore be avoided if BC is important).
Changing the compiler (or sometimes even the compiler version) will probably break BC unless the compilers adhere strictly to the same ABI.
If BC is a major issue for the platform you are implementing it could well be a good idea to separate out the interface and implementation using the Bridge pattern.
As an aside, the C++ language does not deal with the Application Binary Interface (ABI). If binary compatibility is a major issue, you should probably refer to your platform's ABI specification for more details.
Edit: updated adding data members. This will break BC because more memory will now be needed for the class than before.
Strictly speaking, you end up in Undefined Behavior land as soon as you do not recompile for any of those reasons.
That said, in practice you might get away with a few of them:
add a constructor
Might be Ok to use as long as
it's not the first user-defined constructor to the class
it's not the copy constructor
add a data member
This changes the size of instances of the class. Might be Ok for anyone who just uses pointers or references, if you take care to put that data behind all other data, so that the offsets for accessing the other data members do not change. But the exact layout of sub objects in binary is not defined, so you will have to rely on a specific implementation.
change destructor into virtual
This changes the class' virtual table, so it needs recompilation.
add an argument with default value to an existing member function
Since default arguments are inserted at the call site, everyone using this needs to recompile. (However, using overloading instead of default arguments might allow you to get away with that.)
Note that any inlined member function could render any of the above wrong, since the code of those is directly embedded (and optimized) in the clients' code.
However, the safest bet would be to just recompile everything. Why is this an issue?
All of them require recompiling all the libraries that use the class (provided they include the .h file).
sbi's answer is pretty good (and deserves to be voted up to the top). However I think I can expand the "maybe ok" into something more concrete.
Add a constructor
If the constructor you've added is the default constructor (or the copy constructor) then you have to be careful. If one was not previously declared, it will have been generated automatically by the compiler, so a recompilation is required to ensure clients use the constructor that has actually been implemented. For this reason I tend to always hide or define these constructors for classes that form part of an API.
By using an ordinal-export .def file to maintain the Application Binary Interface, you can avoid client recompilation in many cases:
Add a constructor
Export this constructor function at the end of the export table with the largest ordinal number. Any client code that doesn't call this constructor need not be recompiled.
Add a data member
This is a break if client code manipulates the class object directly rather than through a pointer or reference.
Change destructor into virtual
This is probably a break if your class doesn't have any other virtual function, because your class now has to add a vptr, which increases the object size and changes the memory layout. If your class already has a vtable, appending the destructor to the end of the vtable won't affect the object layout in terms of backward compatibility. But if a client class derives from your class and has defined its own virtual functions, it breaks; and any client calling the original non-virtual destructor will also break.
Add an argument with default value to an existing member function
This is definitely a break.
I am clearly against @sbi's answer: in general you do need to recompile. Only under much stricter circumstances than the ones he posted can you get away without it.
add a constructor
If the constructor added is either the default constructor or the copy constructor, any code that used the implicitly defined version of it and does not get recompiled will fail to initialize the object, and that means that invariants required by other methods will not be set at construction, i.e. the code will fail.
add a data member
This modifies the layout of the object. Even code that only used pointers or references needs to be recompiled to adapt to the change in layout. If a member is added at the beginning of the object, any code that used any member of the object will be offset and fail.
#include <iostream>

struct test {
    // int x; // added later
    int y;
};

void foo( test * t ) {
    std::cout << t->y << std::endl;
}
If foo were not recompiled, then after uncommenting x it would print t->x instead of t->y. If the types did not match it would be even worse. Theoretically, even if the added member is at the end of the object, if there is more than one access specifier the compiler is allowed to reorder members and hit the same issue.
change destructor to virtual
If it is the first virtual method, it will change the layout of the object and hit all of the previous issues, with the addition that deleting through a pointer or reference to the base will call the base destructor rather than being dispatched to the correct one. In most compilers (with vtable support) it also implies a change in the memory layout of the vtable for the type, which means the wrong method can be called and cause havoc.
add an argument with default value
This is a change in function signature, all code that used the method before will need to be recompiled to adapt to the new signature.
As soon as you change anything in the header file (hpp file), you have to recompile everything that depends on it.
However, if you change the source file (cpp file), you only have to recompile the library that contains the definitions from this file.
The easy way to break physical dependencies, where all libraries in the upper tier would otherwise need to be recompiled, is to use the pimpl idiom. Then, as long as you don't touch the header files, you just need to compile the library where the implementation is being modified.

Inheritance in C++ internals

Can someone explain to me how inheritance is implemented in C++?
Does the base class actually get copied into that location, or is it just referred to?
What happens if a function in the base class is overridden in the derived class? Is it replaced with the new function, or is the new function stored at another location in the derived class's memory?
First of all you need to understand that C++ is quite different from e.g. Java, because there is no notion of a "class" retained at runtime. All OO features are compiled down to things that could also be achieved with plain C or assembler.
Having said this, what actually happens is that the compiler generates a kind of struct whenever you use your class definition. And when you invoke a "method" on your object, the compiler just encodes a call to a function that resides somewhere in the generated executable.
Now, if your class inherits from another class, the compiler includes the fields of the base class in the struct it uses for the derived class. E.g. it could place these fields at the front and place the fields corresponding to the derived class after that. Please note: you must not make any assumptions about the concrete memory layout the C++ compiler uses. If you do so, you're basically on your own and lose any portability.
How is the inheritance implemented? Well, it depends!
If you use a normal function, the compiler will use the concrete type it has figured out and just encode a jump to the right function.
If you use a virtual function, the compiler will generate a vtable and emit code to look up a function pointer from that vtable, depending on the runtime type of the object.
This distinction is very important in practice. Note that it is not true that inheritance is always implemented through a vtable in C++ (this is a common gotcha). Only if you mark a member function as virtual (or have done so for the same member function in a base class) do you get a call that is directed at runtime to the right function. Because of this, a virtual function call is much slower than a non-virtual call (it might be several hundred times slower).
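As a rough illustration of what the compiler lowers this to (real vtable layout and symbol handling are implementation-defined; the Animal/Dog names are made up), a hand-written approximation:

#include <iostream>

// Approximation of:
//   struct Animal { int age; virtual void speak(); };
//   struct Dog : Animal { void speak() override; };

struct Animal;
struct AnimalVTable { void (*speak)(Animal*); };   // one slot per virtual function

struct Animal {
    const AnimalVTable* vptr;    // hidden pointer the compiler adds
    int age;                     // base-class fields
};

struct Dog {
    Animal base;                 // base subobject placed first
    // Dog's own fields would follow here
};

void animal_speak(Animal*) { std::cout << "...\n"; }
void dog_speak(Animal*)    { std::cout << "woof\n"; }

const AnimalVTable animal_vtable{ animal_speak };
const AnimalVTable dog_vtable{ dog_speak };        // Dog's table holds the override

int main() {
    Dog d{ { &dog_vtable, 3 } };
    Animal* a = &d.base;
    a->vptr->speak(a);           // "virtual" call: indirect jump through the table -> woof
    animal_speak(a);             // "non-virtual" call: direct jump chosen at compile time
}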
Inheritance in C++ is often accomplished via the vtable. The linked Wikipedia article is a good starting point for your questions. If I went into more detail in this answer, it would essentially be a regurgitation of it.

question about recompiling the library in C++

Suppose my class depends on another library. Now I need to modify the class for one application. What kind of modification will force me to recompile
all libraries? What's the rule for recompiling all libraries?
For example, I only know that case 2) is like this. What about the others?
1) add a constructor
2) add a data member
3) change destructor into virtual
4) add an argument with default value to an existing member function
Do you really mean that the class you are changing depends on the library? You never have to recompile a library because you've changed something that depends on the library. You recompile a library if you change something that the library depends on.
The answer is that in C++, technically all of those things require recompiling anything that uses the class. The one definition rule only permits classes to be defined in multiple translation units if the definitions are exactly the same in all units (I think "exactly" means the same sequence of tokens after preprocessing, in which case even changing the name of a parameter requires recompilation). So if different source files share a header, and a class definition in that header changes, C++ guarantees nothing about whether code compiled from those two source files remains compatible if only one of them is rebuilt.
However, your particular C++ implementation will use a static/dynamic library format which relaxes the rules, and allows some changes to be "binary compatible". Of the things you list, only (1) has much chance of being binary compatible. You'd have to check your documentation, but it's probably fine. In general (2) changes the size and layout of the objects, (3) changes the code required by the caller to destroy objects, and (4) changes the signature of the function (default values are inserted by the calling code, not the callee).
It's often worth avoiding default parameters pretty much for this reason. Just add another overload. So instead of changing:
void foo(int a);
to
void foo(int a, int b = 0);
replace it with:
void foo(int a, int b);
void foo(int a) { foo(a, 0); }
Of course the former change isn't even source-compatible if a user is taking a pointer to the function foo, let alone binary compatible. The latter is source-compatible provided that the ambiguity is resolved over which foo to use. C++ does make some effort to help with this: initializing a function pointer is a rare (only?) case where context affects the value of an expression.
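A small illustration of that last point, using the two foo overloads from above (the printed output is just for demonstration):

#include <iostream>

void foo(int a, int b) { std::cout << a << ' ' << b << '\n'; }
void foo(int a)        { foo(a, 0); }

int main() {
    void (*one)(int)      = foo;   // the target type selects foo(int)
    void (*two)(int, int) = foo;   // ...and here foo(int, int)
    one(1);                        // prints "1 0"
    two(1, 2);                     // prints "1 2"
}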
You'll only need to recompile the code that depends on your class (like Nikolai said), i.e. your class resides in a library that's used by others.
Even then you'll need to recompile the dependent code only if your class:
changes its memory layout:
you add/remove members;
change members types;
you add a virtual method or change an existing one to be virtual (so you add a hidden vpointer member);
changes method signatures (more precisely changes in what the compiler uses for name decoration/mangling):
change their constness;
change their virtualness;
add/change default parameter values.
I'm pretty sure I've missed some things but I'll add any other things that come up (either from comments or if my memory starts working better).
If you're only using those libraries, nothing forces you to recompile them ...
except changing of compiler / architecture / OS or changing some #defines of the libraries
The question is a bit confusing, so to get this straight: you only need to recompile code that depends on your class.
Then for things that depend on your class:
Adding a constructor to your class introduces a new function, so if client code doesn't use that constructor - no recompilation is needed,
Adding a data member changes the memory layout of the class - recompilation is required,
Changing destructor to virtual changes/introduces vtable layout and function dispatch code - recompilation is required,
Adding an argument with a default value to an existing member function changes the number of arguments to that function (default arguments are substituted at the call site) - client code recompilation is required.
Changing any source code or classes that the library depends on should force a recompilation of the library. A good build tool, with the dependencies set up correctly, will handle this automagically during the build process.

Could C++ have not obviated the pimpl idiom?

As I understand it, the pimpl idiom exists only because C++ forces you to place all the private class members in the header. If the header were to contain only the public interface, theoretically, any change in class implementation would not have necessitated a recompile for the rest of the program.
What I want to know is why C++ is not designed to allow such a convenience. Why does it demand that the private parts of a class be openly displayed in the header (no pun intended)?
This has to do with the size of the object. The h file is used, among other things, to determine the size of the object. If the private members are not given in it, then you would not know how large an object to new.
You can simulate, however, your desired behavior by the following:
class MyClass
{
public:
    // public stuff
private:
    #include "MyClassPrivate.h"
};
This does not enforce the behavior, but it gets the private stuff out of the .h file.
On the down side, this adds another file to maintain.
Also, in Visual Studio, IntelliSense does not work for the private members - this could be a plus or a minus.
I think there is a confusion here. The problem is not about headers. Headers don't do anything (they are just ways to include common bits of source text among several source-code files).
The problem, as much as there is one, is that class declarations in C++ have to define everything, public and private, that an instance needs to have in order to work. (The same is true of Java, but the way reference to externally-compiled classes works makes the use of anything like shared headers unnecessary.)
It is in the nature of common Object-Oriented Technologies (not just the C++ one) that someone needs to know the concrete class that is used and how to use its constructor to deliver an implementation, even if you are using only the public parts. The device in (3, below) hides it. The practice in (1, below) separates the concerns, whether you do (3) or not.
1. Use abstract classes that define only the public parts, mainly methods, and let the implementation class inherit from that abstract class. So, using the usual convention for headers, there is an abstract.hpp that is shared around. There is also an implementation.hpp that declares the inherited class and that is only passed around to the modules that implement methods of the implementation. The implementation.hpp file will #include "abstract.hpp" for use in the class declaration it makes, so that there is a single maintenance point for the declaration of the abstracted interface.
2. Now, if you want to enforce hiding of the implementation class declaration, you need to have some way of requesting construction of a concrete instance without possessing the specific, complete class declaration: you can't use new and you can't use local instances. (You can delete, though.) Introduction of helper functions (including methods on other classes that deliver references to class instances) is the substitute.
3. Along with, or as part of, the header file that is used as the shared definition for the abstract class/interface, include function signatures for external helper functions. These functions should be implemented in modules that are part of the specific class implementations (so they see the full class declaration and can exercise the constructor). The signature of the helper function is probably much like that of the constructor, but it returns an instance reference as a result. (This constructor proxy can return a NULL pointer, and it can even throw exceptions if you like that sort of thing.) The helper function constructs a particular implementation instance and returns it cast as a reference to an instance of the abstract class.
Mission accomplished.
Oh, and recompilation and relinking should work the way you want, avoiding recompilation of calling modules when only the implementation changes (since the calling module no longer does any storage allocations for the implementations).
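A sketch of that arrangement, using std::unique_ptr rather than a raw pointer for brevity; the Logger/FileLogger names and the factory function are illustrative:

// abstract.hpp -- the only header client code ever sees
#include <memory>
#include <string>

class Logger {                                    // abstract class: public parts only
public:
    virtual ~Logger() = default;
    virtual void log(const std::string& msg) = 0;
};

std::unique_ptr<Logger> make_file_logger(const std::string& path);  // helper/factory function

// implementation.cpp -- clients never see or recompile against this
#include <iostream>

class FileLogger : public Logger {                // the concrete class stays hidden
public:
    explicit FileLogger(std::string path) : path_(std::move(path)) {}
    void log(const std::string& msg) override {
        std::cout << path_ << ": " << msg << '\n';   // stand-in for real file I/O
    }
private:
    std::string path_;
};

std::unique_ptr<Logger> make_file_logger(const std::string& path) {
    return std::make_unique<FileLogger>(path);    // the constructor is exercised only here
}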
You're all ignoring the point of the question -
Why must the developer type out the PIMPL code?
For me, the best answer I can come up with is that we don't have a good way to express C++ code that allows you to operate on it. For instance, compile-time (or pre-processor, or whatever) reflection or a code DOM.
C++ badly needs one or both of these to be available to a developer to do meta-programming.
Then you could write something like this in your public MyClass.h:
#pragma pimpl(MyClass_private.hpp)
And then write your own, really quite trivial wrapper generator.
Someone will have a much more verbose answer than I, but the quick response is two-fold: the compiler needs to know all the members of a struct to determine the storage space requirements, and the compiler needs to know the ordering of those members to generate offsets in a deterministic way.
The language is already fairly complicated; I think a mechanism to split the definitions of structured data across the code would be a bit of a calamity.
Typically, I've always seen policy classes used to define implementation behavior in a Pimpl-manner. I think there are some added benefits of using a policy pattern -- easier to interchange implementations, can easily combine multiple partial implementations into a single unit which allow you to break up the implementation code into functional, reusable units, etc.
Maybe because the size of the class is required when passing its instances by value, aggregating it in other classes, etc.?
If C++ did not support value semantics, it would have been fine, but it does.
Yes, but...
You need to read Stroustrup's "Design and Evolution of C++" book. It would have inhibited the uptake of C++.