Question about recompiling the library in C++

Suppose my class depends on another library. Now I need to modify the class for one application. What kind of modification will force me to recompile all libraries? What's the rule for when all libraries must be recompiled?
For example, I only know that case 2) requires it. What about the others?
1) add a constructor
2) add a data member
3) change destructor into virtual
4) add an argument with default value to an existing member function

Do you really mean that the class you are changing depends on the library? You never have to recompile a library because you've changed something that depends on the library. You recompile a library if you change something that the library depends on.
The answer is that in C++, technically all of those things require recompiling anything that uses the class. The one definition rule only permits classes to be defined in multiple translation units if the definitions are exactly the same in all units (I think "exactly" means the same sequence of tokens after preprocessing, in which case even changing the name of a parameter requires recompilation). So if different source files share a header, and a class definition in that header changes, C++ guarantees nothing about whether code compiled from those two source files remains compatible if only one of them is rebuilt.
However, your particular C++ implementation will use a static/dynamic library format which relaxes the rules, and allows some changes to be "binary compatible". Of the things you list, only (1) has much chance of being binary compatible. You'd have to check your documentation, but it's probably fine. In general (2) changes the size and layout of the objects, (3) changes the code required by the caller to destroy objects, and (4) changes the signature of the function (default values are inserted by the calling code, not the callee).
It's often worth avoiding default parameters pretty much for this reason. Just add another overload. So instead of changing:
void foo(int a);
to
void foo(int a, int b = 0);
replace it with:
void foo(int a, int b);
void foo(int a) { foo(a, 0); }
Of course the former change isn't even source-compatible if a user is taking a pointer to the function foo, let alone binary compatible. The latter is source-compatible provided that the ambiguity over which foo to use is resolved. C++ does make some effort to help with this: initializing a function pointer is a rare (only?) case where context affects the value of an expression.
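A minimal sketch of the overload approach, assuming foo is a free function and returns int so that the effect is observable (the return type and the bodies are illustrative assumptions, not part of the original code). It shows the old one-argument symbol being kept, and how the target type of a function pointer picks the overload:

```cpp
// New two-argument entry point.
int foo(int a, int b) { return a + b; }

// The one-argument version stays a real function, so callers built
// against the old declaration still find the symbol they expect.
int foo(int a) { return foo(a, 0); }

// The pointer's target type selects which overload is meant.
int (*one_arg)(int) = foo;
int (*two_args)(int, int) = foo;

// Where no target type is available (e.g. with auto), a cast is needed.
auto explicit_choice = static_cast<int (*)(int)>(foo);
```

Code that previously wrote `auto p = &foo;` would stop compiling once the second overload exists, which is the ambiguity mentioned above.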

You'll only need to recompile the code that depends on your class (like Nikolai said), i.e. when your class resides in a library that's used by others.
Even then you'll need to recompile the dependent code only if your class:
changes its memory layout:
  you add/remove members;
  you change members' types;
  you add a virtual method or change an existing one to be virtual (so you add a hidden vpointer member);
changes method signatures or how they are called (for example, what the compiler uses for name decoration/mangling, or the code generated at the call site):
  you change their constness;
  you change their virtualness;
  you add/change default parameter values.
I'm pretty sure I've missed some things but I'll add any other things that come up (either from comments or if my memory starts working better).

If you're only using those libraries, nothing forces you to recompile them ...
except changing of compiler / architecture / OS or changing some #defines of the libraries

The question is a bit confusing, so to get this straight: you only need to recompile code that depends on your class.
Then for things that depend on your class:
Adding a constructor to your class introduces a new function, so if client code doesn't use that constructor - no recompilation is needed,
Adding data member changes memory layout of the class - recompilation is required,
Changing destructor to virtual changes/introduces vtable layout and function dispatch code - recompilation is required,
Adding argument with default value to existing member function changes number of arguments to that function (default arguments are substituted at the call site) - client code recompilation is required.
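To make the "substituted at the call site" point concrete, here is a small sketch. The function name scale and both default values are made up for illustration. It relies on the rule that a block-scope redeclaration has its own set of default arguments, which stands in for two callers compiled against different versions of a header:

```cpp
// Declaration as most callers see it (think: the current header).
int scale(int value, int factor = 2);

int call_with_namespace_default() {
    return scale(3);   // the compiler emits scale(3, 2) here
}

int call_with_local_default() {
    // A block-scope redeclaration carries its own default argument,
    // like a caller built against a different header.
    int scale(int value, int factor = 10);
    return scale(3);   // the compiler emits scale(3, 10) here
}

// The function itself never sees a default; it always gets two arguments.
int scale(int value, int factor) { return value * factor; }
```

The callee is identical in both cases. Only the caller's view of the declaration decides which value is passed, so changing a default in a header has no effect on callers that are not recompiled.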

Changing any source code or classes that the library depends on should force a recompilation of the library. A good build tool, with the dependencies set up correctly, will handle this automagically during the build process.

Related

Can I use a slim version of my header to be included with the library?

What I mean is my real header file can look like this:
#include "some_internal_class.h"
class MyLibrary {
Type private_member;
void private_function();
public:
MyLibrary();
void function_to_be_called_by_library_users();
};
Now I want to produce a dynamic library containing all the necessary definitions, and I want to ship with it a single header instead of shipping every single header I have in my library.
So I was thinking I could create a slim version of my header like so:
class MyLibrary {
public:
MyLibrary();
void function_to_be_called_by_library_users();
};
Headers are just declarations anyway right? they're never passed to the compiler. And I've declared what the user will be using.
Is that possible? If not, why not?
This is a One Definition Rule violation the moment you deviate by a single token.
[basic.def.odr]/6
There can be more than one definition of a class type, [...] in a
program provided that each definition appears in a different
translation unit, and provided the definitions satisfy the following
requirements. Given such an entity named D defined in more than one
translation unit, then
each definition of D shall consist of the same sequence of tokens; and
Your program may easily break if you violate the ODR like that. And your build system isn't at all obligated to even warn you about it.
You cannot give a class two different definitions. It breaks the One Definition Rule (ODR). Your slim MyLibrary does that, unfortunately.
they're never passed to the compiler
They are. Members of a class must be known at compile time, so that the compiler can determine the class's size.
Headers are just declarations anyway right? they're never passed to the
compiler. And I've declared what the user will be using.
No. Headers are part of source code and are compiled together with source files. They contain the information necessary for a compiler to understand how to work with code (in your case, with class MyLibrary).
As an example, you want library users to be able to create objects of class MyLibrary, so you export the constructor. However, this is not sufficient: the compiler needs to know the size of the object to be created, which is impossible unless you specify all the fields.
In practice, deciding what to expose to library users and what to hide as implementation details is a hard question, which requires detailed inspection of the library usage and semantics. If you really want to hide the class internals as implementation detail, here are some common options:
The pimpl idiom is a common solution. It enables you to work with the class as it is usually done, but the implementation details are nicely hidden.
Extract the interface into an abstract class with virtual functions, and use pointers (preferably smart pointers) to work with the objects.
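As a rough sketch of the pimpl option, assuming the MyLibrary class from the question: the int return type and the calls counter are assumptions added only so the example does something visible. The part marked as the header is all a user would need to see:

```cpp
#include <memory>

// ---- what would go in the shipped header ----
class MyLibrary {
public:
    MyLibrary();
    ~MyLibrary();                              // defined where Impl is complete
    MyLibrary(const MyLibrary&) = delete;
    MyLibrary& operator=(const MyLibrary&) = delete;

    int function_to_be_called_by_library_users();

private:
    struct Impl;                               // declared only, never defined here
    std::unique_ptr<Impl> impl_;               // the only data member users see
};

// ---- what would stay inside the library's .cpp ----
struct MyLibrary::Impl {
    int calls = 0;                             // hypothetical private state
};

MyLibrary::MyLibrary() : impl_(new Impl) {}
MyLibrary::~MyLibrary() = default;

int MyLibrary::function_to_be_called_by_library_users() {
    return ++impl_->calls;
}
```

Members can be added to Impl without touching the header, so the class definition seen by users and the one seen by the library stay token-for-token identical.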
Headers are just declarations anyway right? they're never passed to the compiler.
The moment you #include a file, its contents are copied and pasted into your source file exactly as they are.
So even though you don't pass them directly as compiler arguments, they're still part of your code and code in them will be compiled into your translation units.
The solutions by @lisyarus are pretty good.
But another option would be doing it the C way, which is the most elegant in my opinion.
In C you give your users a handle, which will most likely be a pointer.
Your header would look something like this:
typedef struct MyLibrary MyLibrary; /* opaque handle; this form is valid in both C and C++ */
MyLibrary*
my_library_init();
void
my_library_destroy(MyLibrary*);
void
my_library_function_to_be_called_by_library_users(MyLibrary*);
A very small and simple interface that does not show your users anything you don't want them to see.
Another nice perk is that your build system will not have to recompile your whole program just because you added a field to the MyLibrary struct.
You have to watch out though, because now you have to call my_library_destroy which will carry the logic of your destructor.
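For completeness, a sketch of what could sit behind that header when the library itself is written in C++. The call_count field and the my_library_call_count accessor are hypothetical additions so the example has observable state:

```cpp
// ---- public header, usable from C and C++ ----
#ifdef __cplusplus
extern "C" {
#endif

typedef struct MyLibrary MyLibrary;

MyLibrary* my_library_init(void);
void my_library_destroy(MyLibrary*);
void my_library_function_to_be_called_by_library_users(MyLibrary*);
int my_library_call_count(const MyLibrary*);   /* hypothetical accessor */

#ifdef __cplusplus
}
#endif

// ---- library implementation, never shipped to users ----
struct MyLibrary {
    int call_count = 0;                        // hypothetical private state
};

extern "C" MyLibrary* my_library_init(void) { return new MyLibrary(); }

extern "C" void my_library_destroy(MyLibrary* lib) { delete lib; }

extern "C" void my_library_function_to_be_called_by_library_users(MyLibrary* lib) {
    ++lib->call_count;
}

extern "C" int my_library_call_count(const MyLibrary* lib) {
    return lib->call_count;
}
```

Because both new and delete happen inside the library, the user never needs to know the size of MyLibrary or which allocator it came from.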

Does modification of a base class require recompiling subclasses?

Suppose I have a base class and a derived class. I have compiled my program and it is running. Now suppose I want to make some changes to my base class.
My questions are:
Question 1: If I make the changes in the base class file only and recompile only the base class, will the changes be reflected in derived class objects that are already instantiated, or do I need to recompile the derived class as well? Another way of asking this: is a copy of the base class member functions created, or are pointers stored so that changes are reflected automatically?
Question 2: If they are not updated automatically, is there any way to do this?
C++ doesn't have reflection, so you need to recompile the whole thing.
This isn't well-defined by the language (since it doesn't address dynamic linking), but we can lay out some cases that may work, and some that almost certainly won't.
Should work:
changing a non-inlined function body in base.cpp
so long as cross-module/link-time inlining isn't enabled
assuming the derived class depends only on the interface and not on the changed behaviour
adding static or non-virtual methods
beware of changing overload resolution though
Likely to fail horribly:
changing the prototype of any method or constructor used in the derived class
this includes changes that wouldn't normally be visible in calling code (such as adding defaulted arguments) or changing the type of an argument even where there is an implicit conversion, etc.
adding, removing or re-ordering virtual methods
adding, removing or re-ordering data members or base classes
There are several assumptions underlying these:
your base class and derived class are in separate dynamic libs (eg. base.so and derived.so). This isn't clear from your question
the only reasons for runtime compatibility to break are
because the layout in memory changed (ie, you modified the base-class instance size, or the vtable size, or the size or order of members)
because the code that would be generated at a call site changed (ie, because you added arguments with default values, or changed an argument to an implicitly-convertible type)
If the first assumption isn't true, the question seems pointless, since you'll have to rebuild the whole lib anyway.
The second assumption could be broken if you change or upgrade your compiler, or change the compiler flags, or the version of any other library you link against.
With respect to inlining, you can get horribly subtle bugs if different dynamic libs have inlined different versions of the code. You really won't enjoy trying to debug those.
C++ is a statically compiled language. That means that all type checking is done at compile time, so if you modify a type, every line of code that depends on the modification must be recompiled. That includes base class modifications and subclasses, as in your case.
Note that this can be a problem: if you are writing an API and you modify the API implementation, the API and all code that uses the code you have modified (the user code) must be recompiled.
The classic technique to reduce recompilation is the PIMPL idiom.
PIMPL hides the implementation of a class behind a pointer to an implementation class, stored as a member of the original class. The original class then acts only as an interface. If the implementation is modified but the interface is not, users of the class do not need to recompile.

Is it safe to use strings as private data members in a class used across a DLL boundary?

My understanding is that exposing functions that take or return stl containers (such as std::string) across DLL boundaries can cause problems due to differences in STL implementations of those containers in the 2 binaries. But is it safe to export a class like:
class Customer
{
public:
wchar_t * getName() const;
private:
wstring mName;
};
Without some sort of hack, mName is not going to be usable by the executable, so it won't be able to execute methods on mName, nor construct/destruct this object.
My gut feeling is "don't do this, it's unsafe", but I can't figure out a good reason.
It is not a problem, because it is trumped by a bigger problem: you cannot create an object of that class in code that lives in a module other than the one that contains the code for the class. Code in another module cannot accurately know the required object size, since its implementation of the std::string class may well be different, which, as declared, also affects the size of the Customer object. Even the same compiler cannot guarantee this, for example when mixing optimized and debugging builds of these modules, although that is usually pretty easy to avoid.
So you must create a class factory for Customer objects, a factory that lives in that same module. Which then automatically implies that any code that touches the "mName" member also lives in the same module. And is therefore safe.
The next step then is to not expose Customer at all but to expose a pure abstract base class (aka interface). Now you can prevent the client code from creating an instance of Customer and shooting its leg off. And you'll trivially hide the std::string as well. Interface-based programming techniques are common in module interop scenarios. This is also the approach taken by COM.
As long as the allocator of instances of the class and deallocator are of the same settings, you should be ok, but you are right to avoid this.
Differences between the .exe and .dll as far as debug/release, code generation (Multi-threaded DLL vs. Single threaded) could cause problems in some scenarios.
I would recommend using abstract classes in the DLL interface with creation and deletion done solely inside the DLL.
Interfaces like:
class A {
protected:
virtual ~A() {}
public:
virtual void func() = 0;
};
//exported create/delete functions
A* create_A();
void destroy_A(A*);
DLL Implementation like:
class A_Impl : public A{
public:
~A_Impl() {}
void func() { do_something(); }
};
A* create_A() { return new A_Impl; }
void destroy_A(A* a) {
A_Impl* ai=static_cast<A_Impl*>(a);
delete ai;
}
Should be ok.
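On the client side, the create/destroy pair can be wrapped so that nobody forgets to call destroy_A. This is only a sketch: func() returns int here purely so there is something to observe, and APtr and make_A are names invented for the example. The DLL side is included so the snippet is complete on its own:

```cpp
#include <memory>

// Interface as the client sees it.
class A {
protected:
    virtual ~A() {}
public:
    virtual int func() = 0;    // int return is an assumption for the example
};

// These two would be exported from the DLL.
A* create_A();
void destroy_A(A*);

// Client-side helper: the smart pointer always releases through the
// DLL's own destroy function, never through the client's delete.
using APtr = std::unique_ptr<A, void (*)(A*)>;
inline APtr make_A() { return APtr(create_A(), &destroy_A); }

// ---- DLL side ----
class A_Impl : public A {
public:
    int func() override { return 42; }
};

A* create_A() { return new A_Impl; }
void destroy_A(A* a) { delete static_cast<A_Impl*>(a); }
```

The protected destructor in A keeps client code from writing `delete a;` directly, and the custom deleter gives back the usual RAII convenience.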
Even if your class has no data members, you cannot expect it to be usable from code compiled with a different compiler. There is no common ABI for C++ classes. You can expect differences in name mangling just for starters.
If you are prepared to constrain clients to use the same compiler as you, or provide source to allow clients to compile your code with their compiler, then you can do pretty much anything across your interface. Otherwise you should stick to C style interfaces.
If you want to provide an object oriented interface in a DLL that is truly safe, I would suggest building it on top of the COM object model. That's what it was designed for.
Any other attempt to share classes between code that is compiled by different compilers has the potential to fail. You may be able to get something that seems to work most of the time, but it can't be guaranteed to work.
The chances are that at some point you're going to be relying on undefined behaviour in terms of calling conventions or class structure or memory allocation.
The C++ standard does not say anything about the ABI provided by implementations. Even on a single platform changing the compiler options may change binary layout or function interfaces.
Thus to ensure that standard types can be used across DLL boundaries it is your responsibility to ensure that either:
Resource Acquisition/Release for standard types is done by the same DLL. (Note: you can have multiple crt's in a process but a resource acquired by crt1.DLL must be released by crt1.DLL.)
This is not specific to C++. In C for example malloc/free, fopen/fclose call pairs must each go to a single C runtime.
This can be done by either of the below:
By explicitly exporting acquisition/release functions (Photon's answer). In this case you are forced to use a factory pattern and abstract types. Basically COM or a COM clone.
Forcing a group of DLL's to link against the same dynamic CRT. In this case you can safely export any kind of functions/classes.
There are also two potential bugs (among others) you must watch out for, since they are related to what is "under" the language.
The first is that std::string is an instantiation of a template (std::basic_string), and hence it is instantiated in every translation unit that uses it. If those are all linked into the same module (exe or dll), the linker merges the duplicate instantiations, so the whole module ends up using a single copy of each function.
But if they are linked into different modules (an exe and a dll) there is nothing (compiler or linker) in common. So, depending on how the modules were compiled, you may have different implementations of the same class with different members and memory layout (for example, one may have some debugging or profiling features that the other has not). Accessing an object created on one side with methods compiled on the other side, if you have no other way to guarantee implementation consistency, may end in tears.
The second problem (more subtle) relates to allocation/deallocation of memory: because of the way Windows works, every module can have a distinct heap. But standard C++ does not specify how new and delete keep track of which heap an object comes from. And if the string buffer is allocated in one module, then moved to a string instance in another module, you risk (upon destruction) giving the memory back to the wrong heap (it depends on how new/delete and malloc/free are implemented with respect to HeapAlloc/HeapFree: this merely relates to the level of "awareness" the STL implementation has of the underlying OS. The operation is not itself destructive -the operation just fails- but it leaks the origin's heap).
All that said, it is not impossible to pass a container. It is just up to you to grant a consistent implementation between the sides, since the compiler and linker have no way to cross check.
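One common way to sidestep both problems for a simple getter is to let the caller own the storage, so no string object and no heap block ever crosses the boundary. A minimal sketch, assuming a hypothetical exported function customer_get_name and a Customer type that clients only see through a pointer:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Library-private type; clients never see this definition.
struct Customer {
    std::wstring mName;
};

// Copies the name into caller-owned storage, always null-terminating it,
// and returns the full length (without terminator) so the caller can
// detect truncation and retry with a larger buffer.
extern "C" std::size_t customer_get_name(const Customer* c,
                                         wchar_t* buffer,
                                         std::size_t capacity) {
    const std::size_t needed = c->mName.size();
    if (buffer != nullptr && capacity > 0) {
        const std::size_t n = std::min(needed, capacity - 1);
        std::copy_n(c->mName.data(), n, buffer);
        buffer[n] = L'\0';
    }
    return needed;
}
```

The caller allocates and frees the buffer with its own runtime, and the library only writes characters into it, so the two sides do not need to agree on a std::wstring layout or on a heap.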

Implications of using std::vector in a dll exported function

I have two dll-exported classes A and B. A's declaration contains a function which uses a std::vector in its signature like:
class EXPORT A{
// ...
std::vector<B> myFunction(std::vector<B> const &input);
};
(EXPORT is the usual macro to put in place __declspec(dllexport)/__declspec(dllimport) accordingly.)
Reading about the issues related to using STL classes in a DLL interface, I gather in summary:
Using std::vector in a DLL interface would require all the clients of that DLL to be compiled with the same version of the same compiler because STL containers are not binary compatible. Even worse, depending on the use of that DLL by clients conjointly with other DLLs, the "unstable" DLL API can break these client applications when system updates are installed (e.g. Microsoft KB packages) (really?).
Despite the above, if required, std::vector can be used in a DLL API by exporting std::vector<B> like:
template class EXPORT std::allocator<B>;
template class EXPORT std::vector<B>;
though, this is usually mentioned in the context when one wants to use std::vector as a member of A (http://support.microsoft.com/kb/168958).
The following Microsoft Support Article discusses how to access std::vector objects created in a DLL through a pointer or reference from within the executable (http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q172396). The above solution to use template class EXPORT ... seems to be applicable too. However, the drawback summarized under the first bullet point seems to remain.
To completely get rid of the problem, one would need to wrap std::vector and change the signature of myFunction, PIMPL etc..
My questions are:
Is the above summary correct, or do I miss here something essential?
Why does compilation of my class 'A' not generate warning C4251 (class 'std::vector<_Ty>' needs to have dll-interface to be used by clients of...)? I have no compiler warnings turned off and I don't get any warning on using std::vector in myFunction in exported class A (with VS2005).
What needs to be done to correctly export myFunction in A? Is it viable to just export std::vector<B> and B's allocator?
What are the implications of returning std::vector by value, assuming a client executable which has been compiled with a different compiler (or compiler version)? Does trouble persist when returning by value, where the vector is copied? I guess yes. Similarly for passing std::vector as a constant reference: could access to a std::vector<B> (which might have been constructed by an executable compiled with a different compiler or compiler version) lead to trouble within myFunction? I guess yes again.
Is the last bullet point listed above really the only clean solution?
Many thanks in advance for your feedback.
Unfortunately, your list is very much spot-on. The root cause of this is that DLL-to-DLL or DLL-to-EXE is defined on the level of the operating system, while the interface between functions is defined on the level of a compiler. In a way, your task is similar (although somewhat easier) to that of client-server interaction, when the client and the server lack binary compatibility.
The compiler maps what it can to the way the DLL importing and exporting is done in a particular operating system. Since language specifications give compilers a lot of liberty when it comes to binary layout of user-defined types and sometimes even built-in types (recall that the exact size of int is compiler-dependent, as long as minimal sizing requirements are met), importing and exporting from DLLs needs to be done manually to achieve binary-level compatibility.
When you use the same version of the same compiler, this last issue above does not create a problem. However, as soon as a different compiler enters the picture, all bets are off: you need to go back to the plainly-typed interfaces, and introduce wrappers to maintain nice-looking interfaces inside your code.
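A sketch of what "plainly-typed interface plus wrapper" can look like. It uses int elements instead of B to stay short, and my_function_raw is a made-up name. Only the raw function would be exported. The std::vector wrapper lives in a header and is compiled by each client with its own compiler and its own STL:

```cpp
#include <cstddef>
#include <vector>

// ---- exported boundary: built-in types and pointers only ----
// Writes at most `capacity` results to `out` and returns how many
// results exist in total. Doubling is a placeholder for the real work.
extern "C" std::size_t my_function_raw(const int* in, std::size_t count,
                                       int* out, std::size_t capacity) {
    for (std::size_t i = 0; i < count && i < capacity; ++i) {
        out[i] = in[i] * 2;
    }
    return count;
}

// ---- header-only wrapper, compiled on the client's side ----
inline std::vector<int> my_function(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    const std::size_t produced = my_function_raw(
        input.data(), input.size(), output.data(), output.size());
    if (produced < output.size()) {
        output.resize(produced);
    }
    return output;
}
```

Both vectors are created and destroyed entirely in client code, so the DLL never depends on the client's std::vector layout or allocator.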
I've been having the same problem and discovered a neat solution to it.
Instead of passing std::vector, you can pass a QVector from the Qt library.
The problems you quote are then handled inside the Qt library and you do not need to deal with it at all.
Of course, the cost is having to use the library and accept its slightly worse performance.
In terms of the amount of coding and debugging time it saves you, this solution is well worth it.

C++ -- When recompilation is required

You have a class that many libraries depend on. You need to modify the class for one application. Which of the following changes require recompiling all libraries before it is safe to build the application?
add a constructor
add a data member
change destructor into virtual
add an argument with default value to an existing member function
Classes are defined in the header file. The header file will be compiled into both the library that implements the class and the code that uses the class. I am assuming that you are taking as a given that you will need to recompile the class implementation after changing the class header file and that the question you are asking is whether you will need to recompile any code that references the class.
The problem that you are describing is one of binary compatibility (BC) and generally follows the following rules:
Adding non-virtual functions anywhere in the class does not break BC.
Changing any function signature (e.g. adding parameters) will break BC.
Adding virtual functions anywhere changes the v-table and therefore breaks BC.
Adding data members will break BC.
Changing a parameter from non-default to default will not break BC.
Any changes to inline functions will break BC (inline function should therefore be avoided if BC is important.)
Changing compiler (or sometimes even compiler versions) will probably break BC unless the compilers adhere strictly to the same ABI.
If BC is a major issue for the platform you are implementing it could well be a good idea to separate out the interface and implementation using the Bridge pattern.
As an aside, the C++ language does not deal with the Application Binary Interface (ABI). If binary compatibility is a major issue, you should probably refer to your platform's ABI specification for more details.
Edit: updated adding data members. This will break BC because more memory will now be needed for the class than before.
Strictly speaking, you end up in Undefined Behavior land as soon as you do not recompile for any of those reasons.
That said, in practice you might get away with a few of them:
add a constructor
Might be Ok to use as long as
it's not the first user-defined constructor to the class
it's not the copy constructor
add a data member
This changes the size of instances of the class. Might be Ok for anyone who just uses pointers or references, if you take care to put that data behind all other data, so that the offsets for accessing the other data members do not change. But the exact layout of sub objects in binary is not defined, so you will have to rely on a specific implementation.
change destructor into virtual
This changes the class' virtual table, so it needs recompilation.
add an argument with default value to an existing member function
Since default arguments are inserted at the call site, everyone using this needs to recompile. (However, using overloading instead of default arguments might allow you to get away with that.)
Note that any inlined member function could render any of the above wrong, since the code of those is directly embedded (and optimized) in the clients' code.
However, the safest bet would be to just recompile everything. Why is this an issue?
All of them require recompiling all the libraries that use the class (provided they include the .h file).
sbi's answer is pretty good (and deserves to be voted up to top). However I think I can expand the "maybe ok" into something more concrete.
Add a constructor
If the constructor you've added is the default constructor (or indeed a copy constructor) then you have to be careful. If previously not available then they will have been automatically generated by the compiler (as such a recompilation is required to ensure they are using the actual constructor that has been implemented). For this reason I tend to always hide or define these constructors for classes that form some API.
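What "hide or define" might look like in practice, as a sketch with a made-up ApiHandle class: the default constructor is declared in the header and defined in the library, and copying is switched off explicitly, so no client ever relies on a compiler-generated version that could later be replaced.

```cpp
#include <type_traits>

// ---- header ----
class ApiHandle {
public:
    ApiHandle();                                     // defined in the library
    ApiHandle(const ApiHandle&) = delete;            // hidden: no copying
    ApiHandle& operator=(const ApiHandle&) = delete;
    int value() const;                               // also out of line
private:
    int value_;
};

static_assert(!std::is_copy_constructible<ApiHandle>::value,
              "clients cannot depend on an implicit copy constructor");

// ---- library .cpp ----
ApiHandle::ApiHandle() : value_(7) {}                // 7 is an arbitrary example value
int ApiHandle::value() const { return value_; }
```

Since the special members exist as real library symbols (or do not exist at all) from the first release, later changes to them stay inside the library.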
By using an ordinal-export .def file to maintain the Application Binary Interface, you can avoid client recompilation in many cases:
Add a constructor
Export this constructor function at the end of the export table with the largest ordinal number. Any client code that doesn't call this constructor need not be recompiled.
Add a data member
This is a break if client code manipulates class objects directly, not through a pointer or reference.
Change destructor into virtual
This is probably a break if your class doesn't have any other virtual function, which means your class now has to add a vptr, increasing the object size and changing the memory layout. If your class already has a vptr, moving the destructor to the end of the vtable won't affect object layout in terms of backward compatibility. But if a client class is derived from your class and has defined its own virtual functions, then it breaks. Any client calling the original non-virtual destructor will also break.
Add an argument with default value to an existing member function
This is definitely a break.
I clearly disagree with @sbi's answer: in general you do need to recompile. You may get away without it only under much stricter circumstances than the ones he posted.
add a constructor
If the constructor added is either the default constructor or the copy constructor, any code that used the implicitly defined version of it and does not get recompiled will fail to initialize the object, and that means that invariants required by other methods will not be set at construction, i.e. the code will fail.
add a data member
This modifies the layout of the object. Even code that only used pointers or references need to be recompiled to adapt to the change in layout. If a member is added at the beginning of the object, any code that used any member of the object will be offset and fail.
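#include <iostream>
struct test {
// int x; // added later
int y;
};
// Compiled once against the old layout: reads whatever sits at y's old offset.
void foo( test * t ) {
std::cout << t->y << std::endl;
}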
If foo was not recompiled, then after uncommenting x it would print t->x instead of t->y. If the types did not match it would be even worse. Theoretically, even if the added member is at the end of the object, if there is more than one access specifier the compiler is allowed to reorder members and hit the same issue.
change destructor to virtual
If it is the first virtual method it will change the layout of the object and get all of the previous issues plus the addition that deleting through a reference to the base will call the base destructor and not be dispatched to the correct method. In most compilers (with vtable support) it can imply a change in the memory layout of the vtable for the type, and that means that the wrong method can be called and cause havoc.
add an argument with default value
This is a change in function signature, all code that used the method before will need to be recompiled to adapt to the new signature.
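The layout points can be checked directly by putting the "before" and "after" versions of the class side by side. This is just a sketch: the namespaces v1, v2 and v3 are a device for having several versions of test in one file, not something from the original code.

```cpp
#include <cstddef>
#include <type_traits>

namespace v1 {   // the class as old clients were compiled against it
struct test { int y; };
}
namespace v2 {   // after adding a member in front of y
struct test { int x; int y; };
}
namespace v3 {   // after making the destructor virtual
struct test { int y; virtual ~test() {} };
}

// Offsets that get baked into code compiled against each version.
constexpr std::size_t offset_y_v1 = offsetof(v1::test, y);
constexpr std::size_t offset_y_v2 = offsetof(v2::test, y);

static_assert(offset_y_v1 != offset_y_v2,
              "stale code would read x where it expects y");
static_assert(sizeof(v2::test) > sizeof(v1::test),
              "object size changed");
static_assert(std::is_polymorphic<v3::test>::value,
              "v3 objects carry a hidden vtable pointer");
```

On common ABIs sizeof(v3::test) is also larger than sizeof(v1::test) because of the vtable pointer, but that exact number is implementation-specific.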
As soon as you change anything in the header file (hpp file), you have to recompile everything that depends on it.
However, if you change only the source file (cpp file), you have to recompile just the library that contains the definitions from this file.
The easy way to break physical dependencies, where all libraries in the upper tier need to recompile, is to use the pimpl idiom. Then, as long as you don't touch the header files, you only need to compile the library whose implementation is being modified.