C++ -- When recompilation is required

You have a class that many libraries depend on. You need to modify the class for one application. Which of the following changes require recompiling all libraries before it is safe to build the application?
add a constructor
add a data member
change destructor into virtual
add an argument with default value to an existing member function

Classes are defined in the header file. The header file will be compiled into both the library that implements the class and the code that uses the class. I am assuming that you are taking as a given that you will need to recompile the class implementation after changing the class header file and that the question you are asking is whether you will need to recompile any code that references the class.
The problem that you are describing is one of binary compatibility (BC) and generally follows the following rules:
Adding non-virtual functions anywhere in the class does not break BC.
Changing any function signature (for example, adding parameters) will break BC.
Adding virtual functions anywhere changes the v-table and therefore breaks BC.
Adding data members will break BC.
Changing a parameter from non-default to default will not break BC.
Any changes to inline functions will break BC (inline function should therefore be avoided if BC is important.)
Changing compilers (or sometimes even compiler versions) will probably break BC unless the compilers adhere strictly to the same ABI.
If BC is a major issue for the platform you are implementing it could well be a good idea to separate out the interface and implementation using the Bridge pattern.
As an aside, the C++ language does not deal with the Application Binary Interface (ABI). If binary compatibility is a major issue, you should probably refer to your platform's ABI specification for more details.
Edit: updated adding data members. This will break BC because more memory will now be needed for the class than before.

Strictly speaking, you end up in Undefined Behavior land as soon as you do not recompile for any of those reasons.
That said, in practice you might get away with a few of them:
add a constructor
Might be Ok to use as long as
it's not the first user-defined constructor to the class
it's not the copy constructor
add a data member
This changes the size of instances of the class. Might be Ok for anyone who just uses pointers or references, if you take care to put that data behind all other data, so that the offsets for accessing the other data members do not change. But the exact layout of sub objects in binary is not defined, so you will have to rely on a specific implementation.
change destructor into virtual
This changes the class' virtual table, so it needs recompilation.
add an argument with default value to an existing member function
Since default arguments are inserted at the call site, everyone using this needs to recompile. (However, using overloading instead of default arguments might allow you to get away with that.)
Note that any inlined member function could render any of the above wrong, since the code of those is directly embedded (and optimized) in the clients' code.
However, the safest bet would be to just recompile everything. Why is this an issue?
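To see why a defaulted argument does not keep the old entry point alive, it can help to look at the function's type. The following is a minimal sketch; `Before` and `After` are hypothetical stand-ins for the class before and after the change, and the members are `constexpr` only so the checks can run at compile time.

```cpp
#include <type_traits>

// Hypothetical class as the libraries were originally compiled against it.
struct Before {
    constexpr int scale(int a) const { return a; }
};

// The same class after adding an argument with a default value.
struct After {
    constexpr int scale(int a, int factor = 1) const { return a * factor; }
};

// The default value is not part of the function's type: After::scale is a
// two-argument function, and the one-argument function no longer exists.
static_assert(std::is_same<decltype(&Before::scale),
                           int (Before::*)(int) const>::value,
              "old header: one-argument member function");
static_assert(std::is_same<decltype(&After::scale),
                           int (After::*)(int, int) const>::value,
              "new header: two-argument member function");

// A caller written as scale(3) still compiles against the new header, but
// only because the compiler fills in the second argument at the call site.
static_assert(After{}.scale(3) == 3, "default inserted by the caller");
```

Code compiled against the old header still refers to the one-argument function, which is why it has to be rebuilt.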

All of them need to recompile all the libraries that use the class. (provided they include the .h file)

sbi's answer is pretty good (and deserves to be voted up to top). However I think I can expand the "maybe ok" into something more concrete.
Add a constructor
If the constructor you've added is the default constructor (or indeed a copy constructor) then you have to be careful. If previously not available then they will have been automatically generated by the compiler (as such a recompilation is required to ensure they are using the actual constructor that has been implemented). For this reason I tend to always hide or define these constructors for classes that form some API.
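The difference between an implicitly generated and a user-provided default constructor can be observed with type traits. This is a small sketch; `PointV1` and `PointV2` are hypothetical names for the class before and after the constructor is added.

```cpp
#include <type_traits>

// Hypothetical version 1: no user-declared constructors.
struct PointV1 {
    int x;
    int y;
};

// Hypothetical version 2: a default constructor has been added.
struct PointV2 {
    PointV2() : x(0), y(0) {}
    int x;
    int y;
};

// With version 1, default construction is trivial: client code creates the
// object without calling anything that lives in the library.
static_assert(std::is_trivially_default_constructible<PointV1>::value,
              "V1 needs no constructor call");

// With version 2, construction requires running PointV2::PointV2().
// Client code compiled against version 1 never makes that call.
static_assert(!std::is_trivially_default_constructible<PointV2>::value,
              "V2 needs a constructor call");
```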

By using an ordinal-export .def file to maintain the Application Binary Interface, you can avoid client recompilation in many cases:
Add a constructor
Export this constructor function at the end of the export table, with the largest ordinal number. Any client code that doesn't call this constructor need not be recompiled.
Add a data member
This is a break if client code manipulates class object directly, not through pointer or reference.
Change destructor into virtual
This is probably a break. If your class doesn't have any other virtual function, it now needs a vptr, which increases the object size and changes the memory layout. If your class already has a vtable, placing the destructor at the end of the vtable won't affect object layout in terms of backward compatibility. But if a client class is derived from your class and defines its own virtual functions, it breaks. Any client calling the original non-virtual destructor will also break.
Add an argument with default value to an existing member function
This is definitely a break.

I clearly disagree with sbi's answer: in general you do need to recompile. You may get away without it only under much stricter circumstances than the ones he posted.
add a constructor
If the constructor added is either the default constructor or the copy constructor, any code that used the implicitly defined version of it and does not get recompiled will fail to initialize the object, and that means that invariants required by other methods will not be set at construction, i.e. the code will fail.
add a data member
This modifies the layout of the object. Even code that only uses pointers or references needs to be recompiled to adapt to the change in layout. If a member is added at the beginning of the object, any code that accesses any member of the object will use the wrong offset and fail.
#include <iostream>

struct test {
    // int x; // added later
    int y;
};

// Compiled against the old layout, foo reads y at its old offset.
void foo( test * t ) {
    std::cout << t->y << std::endl;
}
If foo was not recompiled, then after uncommenting x it would print t->x instead of t->y. If the types did not match it would be even worse. Theoretically, even if the added member is at the end of the object, if there is more than one access specifier the compiler is allowed to reorder members and hit the same issue.
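The offset change can be checked directly with `offsetof`. In this sketch, `test_v1` and `test_v2` are hypothetical names for the struct before and after `x` is added, so that both layouts can sit in one file.

```cpp
#include <cstddef>

// Layout that the stale code was compiled against.
struct test_v1 {
    int y;
};

// Layout after the member is added in front.
struct test_v2 {
    int x;
    int y;
};

// Stale code reads y at offset 0; in the new layout that is where x lives.
static_assert(offsetof(test_v1, y) == 0, "y used to be the first member");
static_assert(offsetof(test_v2, x) == 0, "x now occupies y's old offset");
static_assert(offsetof(test_v2, y) == sizeof(int), "y moved behind x");
```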
change destructor to virtual
If it is the first virtual method it will change the layout of the object and get all of the previous issues plus the addition that deleting through a reference to the base will call the base destructor and not be dispatched to the correct method. In most compilers (with vtable support) it can imply a change in the memory layout of the vtable for the type, and that means that the wrong method can be called and cause havoc.
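What changes when the destructor becomes the first virtual function can be seen with type traits. This is a sketch with hypothetical `ShapeV1` and `ShapeV2` classes. The size comparison reflects how common ABIs store a hidden vtable pointer; the language itself does not mandate it.

```cpp
#include <type_traits>

// Hypothetical version 1: non-virtual destructor, no vtable.
struct ShapeV1 {
    ~ShapeV1() {}
    double area;
};

// Hypothetical version 2: the destructor is now virtual.
struct ShapeV2 {
    virtual ~ShapeV2() {}
    double area;
};

static_assert(!std::is_polymorphic<ShapeV1>::value, "V1 has no virtual functions");
static_assert(std::is_polymorphic<ShapeV2>::value, "V2 is polymorphic");
static_assert(std::has_virtual_destructor<ShapeV2>::value, "V2 destroys via the vtable");

// On mainstream implementations the object grows by a hidden vtable pointer,
// so every member offset known to stale code is wrong.
constexpr bool size_grew = sizeof(ShapeV2) > sizeof(ShapeV1);
```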
add an argument with default value
This is a change in function signature, all code that used the method before will need to be recompiled to adapt to the new signature.

As soon as you change anything in the header file (hpp file), you have to recompile everything that depends on it.
However, if you change only the source file (cpp file), you have to recompile just the library that contains the definitions from this file.
The easy way to break physical dependencies, where all libraries in the upper tier need to be recompiled, is to use the pimpl idiom. Then, as long as you don't touch the header files, you just need to compile the library where the implementation is being modified.
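A minimal sketch of the pimpl idiom follows. It is shown as a single file for brevity, with comments marking what would go into the header and what into the source file. `Widget`, `Impl`, `describe` and `client_demo` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- widget.h: the only part clients compile against ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                               // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    std::string describe() const;

private:
    struct Impl;                             // declared, not defined, in the header
    std::unique_ptr<Impl> impl_;             // Widget is always one pointer in size
};

// ---- widget.cpp: free to change without touching the header ----
struct Widget::Impl {
    explicit Impl(std::string n) : name(std::move(n)) {}
    std::string name;                        // members can be added here later
};

Widget::Widget(std::string name) : impl_(new Impl(std::move(name))) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

std::string Widget::describe() const { return "widget: " + impl_->name; }

// ---- client code: sees only the header part ----
void client_demo() {
    Widget w("gear");
    assert(w.describe() == "widget: gear");
}
```

Adding a data member to `Widget::Impl` changes only the source file, so the size and layout that clients were compiled against stay the same.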

Related

Can I use a slim version of my header to be included with the library?

What I mean is my real header file can look like this:
#include "some_internal_class.h"
class MyLibrary {
Type private_member;
void private_function();
public:
MyLibrary();
void function_to_be_called_by_library_users();
};
Now I want to produce a dynamic library containing all the necessary definitions. and I want to ship with it a single header instead of shipping every single header I have in my library.
So I was thinking I could create a slim version of my header like so:
class MyLibrary {
public:
MyLibrary();
void function_to_be_called_by_library_users();
};
Headers are just declarations anyway right? they're never passed to the compiler. And I've declared what the user will be using.
Is that possible? If not, why not?
This is a One Definition Rule violation. The moment you deviate by a single token.
[basic.def.odr]/6
There can be more than one definition of a class type, [...] in a
program provided that each definition appears in a different
translation unit, and provided the definitions satisfy the following
requirements. Given such an entity named D defined in more than one
translation unit, then
each definition of D shall consist of the same sequence of tokens; and
Your program may easily break if you violate the ODR like that. And your build system isn't at all obligated to even warn you about it.
You cannot define a class twice. It breaks the One Definition Rule (ODR). MyLibrary does that, unfortunately.
they're never passed to the compiler
They will. Members of a class must be known at compile time, so that the compiler can determine the class's size.
Headers are just declarations anyway right? they're never passed to the compiler. And I've declared what the user will be using.
No. Headers are part of source code and are compiled together with source files. They contain the information necessary for a compiler to understand how to work with code (in your case, with class MyLibrary).
As an example, you want library users to be able to create objects of class MyLibrary, so you export the constructor. However, this is not sufficient: the compiler needs to know the size of the object to be created, which is impossible unless you specify all the fields.
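The size mismatch can be made visible by putting both versions of the class side by side. This is only a sketch: the namespaces `full` and `slim` exist purely so that both definitions can coexist in one file, and `double private_member[4]` is an assumed stand-in for `Type private_member`. In the real scenario both classes have the same name, which is exactly the ODR violation.

```cpp
// What the library itself is compiled with.
namespace full {
class MyLibrary {
    double private_member[4];   // assumption: stand-in for `Type private_member`
    void private_function();
public:
    MyLibrary();
    void function_to_be_called_by_library_users();
};
}  // namespace full

// What users would be compiled with if the slim header were shipped.
namespace slim {
class MyLibrary {
public:
    MyLibrary();
    void function_to_be_called_by_library_users();
};
}  // namespace slim

// User code would reserve sizeof(slim::MyLibrary) bytes for each object,
// while the library's constructor writes sizeof(full::MyLibrary) bytes.
static_assert(sizeof(slim::MyLibrary) < sizeof(full::MyLibrary),
              "the two translation units disagree about the object size");
```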
In practice, deciding what to expose to library users and what to hide as implementation details is a hard question, which requires detailed inspection of the library usage and semantics. If you really want to hide the class internals as implementation detail, here are some common options:
The pimpl idiom is a common solution. It enables you to work with the class as it is usually done, but the implementation details are nicely hidden.
Extract the interface into an abstract class with virtual functions, and use pointers (preferably smart pointers) to work with the objects.
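The abstract-interface option could look roughly like this. It is a sketch under assumptions: `MyLibraryInterface`, `MyLibraryImpl` and the factory `make_my_library` are hypothetical names, and the public function returns an `int` here (unlike in the question) only so that its effect can be observed.

```cpp
#include <cassert>
#include <memory>

// ---- public header: no data members, no internal includes ----
class MyLibraryInterface {
public:
    virtual ~MyLibraryInterface() {}
    virtual int function_to_be_called_by_library_users() = 0;
};

// Hypothetical factory; the only way for users to obtain an object.
std::unique_ptr<MyLibraryInterface> make_my_library();

// ---- library source: the concrete class never leaves this file ----
namespace {
class MyLibraryImpl : public MyLibraryInterface {
    int private_member = 41;    // hidden state, free to change
public:
    int function_to_be_called_by_library_users() override {
        return ++private_member;
    }
};
}  // namespace

std::unique_ptr<MyLibraryInterface> make_my_library() {
    return std::unique_ptr<MyLibraryInterface>(new MyLibraryImpl());
}

// ---- user code ----
void interface_demo() {
    std::unique_ptr<MyLibraryInterface> lib = make_my_library();
    assert(lib->function_to_be_called_by_library_users() == 42);
}
```

Users never need the size of the concrete class, because they only ever hold a pointer to the interface.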
Headers are just declarations anyway right? they're never passed to the compiler.
The moment you #include a file, its contents are copied and pasted into your source file exactly as they are.
So even though you don't pass them directly as compiler arguments, they're still part of your code and code in them will be compiled into your translation units.
The solutions by lisyarus are pretty good.
But another option would be doing it the C way. Which is the most elegant in my opinion.
In C you give your users a handle, which will most likely be a pointer.
Your header would look something like this:
typedef struct MyLibrary MyLibrary; /* opaque handle; the layout stays private to the library */
MyLibrary* my_library_init(void);
void my_library_destroy(MyLibrary*);
void my_library_function_to_be_called_by_library_users(MyLibrary*);
A very small and simple interface that does not show your users anything you don't want them to see.
Another nice perk is that your build system will not have to recompile your whole program just because you added a field to the MyLibrary struct.
You have to watch out though, because now you have to call my_library_destroy which will carry the logic of your destructor.
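For completeness, here is a sketch of how the library side of such a handle-based interface might be implemented in C++. The `calls` field and the accessor `my_library_call_count` are hypothetical additions, included only so that the example has something to observe.

```cpp
#include <cassert>

// ---- my_library.h: everything the user ever sees ----
extern "C" {
typedef struct MyLibrary MyLibrary;          // opaque: no fields visible
MyLibrary* my_library_init(void);
void my_library_destroy(MyLibrary*);
void my_library_function_to_be_called_by_library_users(MyLibrary*);
int my_library_call_count(const MyLibrary*); // hypothetical accessor
}

// ---- my_library.cpp: the real definition lives only here ----
struct MyLibrary {
    int calls;                               // fields can be added freely
};

extern "C" MyLibrary* my_library_init(void) { return new MyLibrary{0}; }

// Carries the logic a destructor would otherwise run.
extern "C" void my_library_destroy(MyLibrary* lib) { delete lib; }

extern "C" void my_library_function_to_be_called_by_library_users(MyLibrary* lib) {
    ++lib->calls;
}

extern "C" int my_library_call_count(const MyLibrary* lib) { return lib->calls; }

// ---- user code ----
void handle_demo() {
    MyLibrary* lib = my_library_init();
    my_library_function_to_be_called_by_library_users(lib);
    assert(my_library_call_count(lib) == 1);
    my_library_destroy(lib);                 // must be called explicitly
}
```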

Does 'final' specifier add any overhead?

Does using specifier final on a class or on a function add any memory or cpu overhead, or is it used at compile time only?
And how does std::is_final recognise what is final?
It actually can reduce overhead. And in rare cases, increase it.
If you have a pointer to a final class A, any virtual method calls can be de-virtualized and called directly. Similarly, a call to a virtual final method can be de-virtualized. In addition, the inheritance tree of a final class is fixed, even if it contains virtual parent classes, so you can de-virtualize some parent access.
Each of these de-virtualizations reduces or eliminates the requirement that a run-time structure (the vtable) be queried.
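The cases described above might look as follows. This is a sketch with hypothetical `Shape`, `Triangle` and `Square` classes. Whether a compiler really emits a direct call is an optimization decision, so the comments say "may".

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

// A final class: nothing can derive from Triangle.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// A final method: classes may derive from Square, but not override sides().
struct Square : Shape {
    int sides() const final { return 4; }
};

// The dynamic type is unknown here, so the call goes through the vtable.
int via_base(const Shape& s) { return s.sides(); }

// The dynamic type must be Triangle, so the call may be made directly.
int via_final_class(const Triangle& t) { return t.sides(); }

// The final overrider must be Square::sides, so the call may be made directly.
int via_final_method(const Square& q) { return q.sides(); }

void devirtualization_demo() {
    Triangle t;
    Square q;
    assert(via_base(t) == 3 && via_base(q) == 4);
    assert(via_final_class(t) == 3);
    assert(via_final_method(q) == 4);
}
```

The observable behaviour is identical in all three functions; `final` only gives the compiler more freedom in how it makes the call.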
There can be a slight downside. Some coding techniques rely on vtable access to avoid direct access to a symbol, and then do not export the symbol. Accessing a vtable can be done via convention (without symbols from a library, just the header file for the classes in question), while accessing a method directly involves linking against that symbol.
This breaks one form of dynamic C++ library linking (where you avoid linking more than a dll loading symbol and/or C linkage functions that return pointers, and classes are exported via their vtables).
It is also possible that if you link against a symbol in a dynamic library, the dynamic library symbol load could be more expensive than the vtable lookup. I have not experienced or profiled this, but I have seen it claimed. The benefits should, in general, outweigh such costs. Any such cost is a quality of implementation issue, as the cost is not mandated to occur because the method is final.
Finally, final inhibits the empty base optimization trick on classes, where someone knows your class has no state, and inherits from it to reduce the overhead of "storing" an instance of your class from 1 byte to 0 bytes. If your class is empty and contains no virtual methods/inheritance, don't use final to avoid this being blocked. There is no equivalent for final functions.
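The effect on the empty base optimization can be measured with `sizeof`. In this sketch all type names are hypothetical, and the exact sizes are what common implementations produce rather than something the standard spells out in bytes.

```cpp
struct Empty {};                 // stateless, can be used as a base
struct EmptyFinal final {};      // stateless, but cannot be inherited from

// Inheriting from the empty class costs nothing.
struct ViaBase : Empty {
    int value;
};

// A final class has to be stored as a member, which takes at least
// one byte plus whatever padding the int then requires.
struct ViaMember {
    EmptyFinal tag;
    int value;
};

static_assert(sizeof(ViaBase) == sizeof(int), "empty base adds no storage");
static_assert(sizeof(ViaMember) > sizeof(int), "empty member adds storage");
```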
Other than the EBO optimization issue (which only occurs with empty types), any overhead from final comes from how other code interacts with it, and will be rare. Far more often it will make other code faster, as directly interacting with a method enables both a more direct call of the method, and can lead to knock-on optimizations (because the call can be more fully understood by the compiler).
Marking anything except an empty type as final when it is final is almost certainly harmless at run time. Doing so on classes with virtual functions and inheritance is likely to be beneficial at run time.
std::is_final and similar traits are almost all implemented via compiler built-in magic. A good number of the traits in std require such magic. See How to detect if a class is final in C++11? (thanks to Csq for finding that)
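From the user's side the trait is an ordinary compile-time query, available since C++14. A short sketch with hypothetical types `Open` and `Sealed`:

```cpp
#include <type_traits>

struct Open {};
struct Sealed final {};

// Evaluated entirely by the compiler; nothing is stored in the objects.
static_assert(!std::is_final<Open>::value, "Open can be derived from");
static_assert(std::is_final<Sealed>::value, "Sealed cannot be derived from");

// Marking a class final does not change its size.
static_assert(sizeof(Open) == sizeof(Sealed), "no per-object overhead");
```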
No, it's only used at compile-time
Magic (see here for further info - thanks Csq for link)

Modification of base class needs subclasses recompilation?

Suppose I have a base class and a derived class. I have compiled my program and am running it. Now suppose I want to make some changes to my base class.
My questions are:
Question 1: If I change only the base class file and recompile only the base class, will the changes be reflected in derived class objects that are already instantiated, or do I need to recompile the derived class as well? In other words, does the derived class hold a copy of the base class member functions, or are pointers stored so that changes are reflected automatically?
Question 2: If it is not updated automatically, is there any way to do this?
C++ doesn't have reflection, so you need to recompile the whole thing.
This isn't well-defined by the language (since it doesn't address dynamic linking), but we can lay out some cases that may work, and some that almost certainly won't.
Should work:
changing a non-inlined function body in base.cpp
so long as cross-module/link-time inlining isn't enabled
assuming the derived class depends only on the interface and not on the changed behaviour
adding static or non-virtual methods
beware of changing overload resolution though
Likely to fail horribly:
changing the prototype of any method or constructor used in the derived class
this includes changes that wouldn't normally be visible in calling code (such as adding defaulted arguments) or changing the type of an argument even where there is an implicit conversion, etc.
adding, removing or re-ordering virtual methods
adding, removing or re-ordering data members or base classes
There are several assumptions underlying these:
your base class and derived class are in separate dynamic libs (eg. base.so and derived.so). This isn't clear from your question
the only reasons for runtime compatibility to break are
because the layout in memory changed (ie, you modified the base-class instance size, or the vtable size, or the size or order of members)
because the code that would be generated at a call site changed (ie, because you added arguments with default values, or changed an argument to an implicitly-convertible type)
If the first assumption isn't true, the question seems pointless, since you'll have to rebuild the whole lib anyway.
The second assumption could be broken if you change or upgrade your compiler, or change the compiler flags, or the version of any other library you link against.
With respect to inlining, you can get horribly subtle bugs if different dynamic libs have inlined different versions of the code. You really won't enjoy trying to debug those.
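The warning about overload resolution when adding non-virtual methods can be illustrated with a small sketch. `BaseV1` and `BaseV2` are hypothetical names for the base class before and after the addition, and the functions are `constexpr` only so that the result can be checked at compile time.

```cpp
// Hypothetical base class as originally shipped.
struct BaseV1 {
    constexpr int f(double) const { return 1; }
};

// The same class after a seemingly harmless addition.
struct BaseV2 {
    constexpr int f(double) const { return 1; }
    constexpr int f(int) const { return 2; }     // newly added overload
};

// The identical call expression f(42) binds to different functions.
static_assert(BaseV1{}.f(42) == 1, "int argument converted to double");
static_assert(BaseV2{}.f(42) == 2, "exact match now wins");
```

A derived library that is not rebuilt keeps calling `f(double)`, while one that is rebuilt silently switches to `f(int)`, so two modules can end up disagreeing about what the same source line does.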
C++ is a statically compiled language. That means that all type checking is done at compile time, so if you modify a type, every line of code that depends on the modification must be recompiled. That includes base class modifications and subclasses, as in your case.
Note that this could be a problem: if you are writing an API and you modify the API implementation, the API and all code that uses the code you have modified (the user code) must be recompiled.
The classic technique to reduce recompilation is the PIMPL idiom.
PIMPL hides the implementation of a class behind a pointer to an implementation class, stored as a member of the original class. The original class then acts only as an interface, so if the implementation is modified but the interface is not, users of the class do not need to recompile.

Inheritance in C++ internals

Can someone explain to me how inheritance is implemented in C++?
Does the base class actually get copied to that location, or does it just refer to that location?
What happens if a function in the base class is overridden in the derived class? Does it replace it with the new function, or copy it to another location in the derived class's memory?
First of all you need to understand that C++ is quite different from e.g. Java, because there is no notion of a "Class" retained at runtime. All OO features are compiled down to things which could also be achieved by plain C or assembler.
Having said this, what actually happens is that the compiler generates a kind of struct whenever you use your class definition. And when you invoke a "method" on your object, the compiler actually just encodes a call to a function which resides somewhere in the generated executable.
Now, if your class inherits from another class, the compiler somehow includes the fields of the base class in the struct it uses for the derived class. E.g. it could place these fields at the front and place the fields corresponding to the derived class after that. Please note: you must not make any assumptions regarding the concrete memory layout the C++ compiler uses. If you do so, you're basically on your own and lose any portability.
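The fact that a derived object contains the base class fields, rather than referring to them, can be checked without assuming a particular layout. This is a sketch with hypothetical `Base` and `Derived` types.

```cpp
#include <type_traits>

struct Base {
    int id;
};

struct Derived : Base {
    double weight;
};

static_assert(std::is_base_of<Base, Derived>::value, "Derived inherits from Base");

// Every Derived object must have room for the Base fields and its own.
// Where exactly they sit inside the object is up to the compiler.
static_assert(sizeof(Derived) >= sizeof(Base) + sizeof(double),
              "the base subobject is embedded in the derived object");
```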
How is the inheritance implemented? well, it depends!
if you use a normal function, then the compiler will use the concrete type it has figured out and just encode a jump to the right function.
if you use a virtual function, the compiler will generate a vtable and generate code to look up a function pointer from that vtable, depending on the run time type of the object
This distinction is very important in practice. Note that it is not true that inheritance is always implemented through a vtable in C++ (this is a common gotcha). Only if you mark a certain member function as virtual (or it was marked virtual for the same member function in a base class) will you get a call that is directed at run time to the right function. Because of this, a virtual function call is slower than a non-virtual call: it costs an extra indirection through the vtable and usually cannot be inlined.
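The difference between the two kinds of call shows up as soon as a derived object is accessed through a base reference. This is a minimal sketch; `Animal`, `Dog` and `dispatch_demo` are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct Animal {
    virtual ~Animal() {}
    // Non-virtual: chosen from the static type at compile time.
    std::string name() const { return "animal"; }
    // Virtual: chosen from the dynamic type at run time.
    virtual std::string sound() const { return "..."; }
};

struct Dog : Animal {
    std::string name() const { return "dog"; }            // hides Animal::name
    std::string sound() const override { return "woof"; } // overrides
};

void dispatch_demo() {
    Dog dog;
    const Animal& ref = dog;          // static type Animal, dynamic type Dog

    assert(ref.name() == "animal");   // static dispatch ignores the real type
    assert(ref.sound() == "woof");    // dynamic dispatch finds Dog::sound
    assert(dog.name() == "dog");      // static type is Dog here
}
```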
Inheritance in C++ is often accomplished via the vtable. The linked Wikipedia article is a good starting point for your questions. If I went into more detail in this answer, it would essentially be a regurgitation of it.

question about recompiling the library in C++

Suppose my class depends on another library. Now I need to modify the class for one application. What kind of modification will force me to recompile all libraries? What's the rule for recompiling all libraries?
For example, I only know that case 2) is like this. What about the others?
1) add a constructor
2) add a data member
3) change destructor into virtual
4) add an argument with default value to an existing member function
Do you really mean that the class you are changing depends on the library? You never have to recompile a library because you've changed something that depends on the library. You recompile a library if you change something that the library depends on.
The answer is that in C++, technically all of those things require recompiling anything that uses the class. The one definition rule only permits classes to be defined in multiple translation units if the definitions are exactly the same in all units (I think "exactly" means the same sequence of tokens after preprocessing, in which case even changing the name of a parameter requires recompilation). So if different source files share a header, and a class definition in that header changes, C++ guarantees nothing about whether code compiled from those two source files remains compatible if only one of them is rebuilt.
However, your particular C++ implementation will use a static/dynamic library format which relaxes the rules, and allows some changes to be "binary compatible". Of the things you list, only (1) has much chance of being binary compatible. You'd have to check your documentation, but it's probably fine. In general (2) changes the size and layout of the objects, (3) changes the code required by the caller to destroy objects, and (4) changes the signature of the function (default values are inserted by the calling code, not the callee).
It's often worth avoiding default parameters pretty much for this reason. Just add another overload. So instead of changing:
void foo(int a);
to
void foo(int a, int b = 0);
replace it with:
void foo(int a) { foo(a, 0); }
void foo(int a, int b);
Of course the former change isn't even source-compatible if a user is taking a pointer to the function foo, let alone binary compatible. The latter is source-compatible provided that the ambiguity is resolved over which foo to use. C++ does make some effort to help with this: initializing a function pointer is a rare (only?) case where context affects the value of an expression.
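The way the target type selects an overload can be sketched as follows. The functions return `int` here instead of `void`, and are `constexpr`, purely so that the chosen overload can be verified at compile time. `one_arg` and `two_args` are hypothetical names.

```cpp
// The overload pair from above, with observable results.
constexpr int foo(int a, int b) { return a + b; }
constexpr int foo(int a) { return foo(a, 0); }

// The same expression &foo yields a different function depending on the
// type of the pointer being initialized.
constexpr int (*one_arg)(int) = &foo;
constexpr int (*two_args)(int, int) = &foo;

static_assert(one_arg(5) == 5, "selected foo(int)");
static_assert(two_args(5, 2) == 7, "selected foo(int, int)");

// Without a target type there is nothing to disambiguate with:
// auto ambiguous = &foo;   // error: cannot deduce which overload is meant
```

With the single-function version `void foo(int a, int b = 0)`, a pointer of type `void (*)(int)` could not be initialized at all, since default arguments are not part of the function's type.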
You'll only need to recompile the code that depends on your class (like Nikolai said), i.e. when your class resides in a library that's used by others.
Even then you'll need to recompile the dependent code only if your class:
changes its memory layout:
you add/remove members;
change members types;
you add a virtual method or change an existing one to be virtual (so you add a hidden vpointer member);
changes method signatures (more precisely changes in what the compiler uses for name decoration/mangling):
change their constness;
change their virtualness;
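add parameters, including ones with default values (changing only a default value leaves the mangled name alone, but call sites compiled against the old header keep passing the old value).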
I'm pretty sure I've missed some things but I'll add any other things that come up (either from comments or if my memory starts working better).
If you're only using those libraries, nothing forces you to recompile them ...
except changing of compiler / architecture / OS or changing some #defines of the libraries
The question is a bit confusing, so to get this straight: you only need to recompile code that depends on your class.
Then for things that depend on your class:
Adding a constructor to your class introduces a new function, so if client code doesn't use that constructor - no recompilation is needed,
Adding data member changes memory layout of the class - recompilation is required,
Changing destructor to virtual changes/introduces vtable layout and function dispatch code - recompilation is required,
Adding argument with default value to existing member function changes number of arguments to that function (default arguments are substituted at the call site) - client code recompilation is required.
Changing any source code or classes that the library depends on should force a recompilation of the library. A good build tool, with the dependencies set up correctly, will handle this automagically during the build process.