Singleton and ABI stability - C++

Consider the following class
// in library
class A {
public:
    static A* instance() {
        static A self;
        return &self;
    }
    void foo() { }
private:
    A() {}
    int a{0};
};
that is defined in a dynamic library.
Then a new version of the class is defined with an extra field b.
class A {
    // as before
private:
    int b{0};
};
Does the new version of the library break the ABI?
For a regular class, I would say yes, undoubtedly, but for a singleton I think it's /not/ an ABI break (the application only handles pointers to A, which didn't change; nothing public changed). But I'm not sure and I would like a definitive answer.

If the application code ever needs a complete type for A in order to compile, then yes, this would be an ABI break.
If the application merely stores a pointer to A (and calls into the library to obtain an A*) then there is no ABI break.
Comparisons can be made here with the FILE* pointer of C.
The fact that A is a singleton is a red herring.
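To illustrate with a minimal sketch (the wrapper names a_instance and a_foo are hypothetical), the header shipped to applications can keep A incomplete, FILE*-style:

// client-facing header: A is an incomplete type, just like FILE
class A;                 // forward declaration only; the layout is never exposed
A* a_instance();         // exported by the library
void a_foo(A* self);     // exported wrapper around A::foo()

// inside the library, which sees the full definition of A
A* a_instance() { return A::instance(); }
void a_foo(A* self) { self->foo(); }

With this shape, adding the member b changes nothing the application was compiled against.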

Yes, it is.
First, with regard to guarantees by the standard: changing any single token in the class definition requires all translation units (including libraries) to use the new definition; otherwise you break the ODR and get undefined behavior.
Obviously there are situations in practice where you can rely on the implementation-specific behavior to still make it work and not break ABI in the practical sense.
In the case you are showing, at the very least the instance function will be a problem. It is an inline function and must know the exact layout of A. But because it is inline, you cannot control whether the library and the library user use the same version of this function. You can easily get a mismatch, and then the object might be constructed with the wrong layout.
Even if you make instance non-inline, you will have the same issue with the constructor, which is also inline in your code.
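As a sketch of the non-inline variant this implies (file names are illustrative), both instance() and the constructor can be declared in the header but defined only inside the library, so every A is constructed by the library's own code; note that the header still has to match on both sides to satisfy the ODR:

// A.h -- shipped to users of the library
class A {
public:
    static A* instance();   // defined in the library, not inline
    void foo();             // likewise out of line
private:
    A();                    // private and out of line: only the library constructs A
    int a{0};
    int b{0};               // the new field; only library code relies on this layout
};

// A.cpp -- compiled into the dynamic library
A* A::instance() {
    static A self;
    return &self;
}
void A::foo() { }
A::A() {}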

Related

C++ Inheritance and dynamic libraries

The idea is the following. I have a library Version 1 with a class that looks as follows:
class MY_EXPORT MyClass
{
public:
    virtual void myMethod(int p1);
};
In Version 2, the class was modified to this:
class MY_EXPORT MyClass
{
public:
    virtual void myMethod(int p1);
    virtual void myMethod2(int p1, int p2);
};

// implementation of myMethod2 in cpp file
void MyClass::myMethod2(int p1, int p2)
{
    myMethod(p1);
    //...
}
Now imagine a user compiled against Version 1 of the library and extended MyClass by overriding myMethod. Now he updates the library to Version 2, without recompiling. Let's further assume the dynamic linker still successfully finds and loads the library.
The question is: if I call instance->myMethod2(1, 2); somewhere inside the library, will it work, or will the application crash? In both cases the class has no data members and is therefore of the same size.
I don't think there is any point in guessing whether that app will crash or not; the behavior is undefined. The application has to be recompiled, since there was an ABI change in the library.
When the library calls instance->myMethod2(1, 2); it has to go through the virtual table that was created in the application code under the assumption that there is only one virtual method: myMethod. From that point on you get undefined behavior. In short, you have to recompile your application when the library ABI changes.
The KDE C++ ABI guidelines specifically prohibit such a change. Virtual tables of derived classes will not contain addresses for the new methods, so virtual calls of those methods on objects of derived classes will crash.
By changing the definition of the class without recompiling, you've violated the One Definition Rule. The user who did not recompile is using the old definition, while your library is using the new definition. This results in undefined behavior.
To see how this might manifest, consider the typical implementation of virtual functions which uses a VTable to dispatch function calls. The library user has derived a class, and this derived class has only one function in the VTable. If a pointer or reference to this class is passed into the library, and the library tries to call the second function, it will attempt to access a VTable entry that doesn't exist. This will almost always result in a crash, although nothing is guaranteed when it comes to undefined behavior.
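A minimal sketch of that scenario (UserClass and useInstance are hypothetical names):

// Application code, compiled against the Version 1 header:
class UserClass : public MyClass
{
public:
    void myMethod(int p1) override { /* ... */ }
};
// The vtable the application emits for UserClass has a single slot: myMethod.

// Library code, built as Version 2:
void useInstance(MyClass* instance)
{
    instance->myMethod2(1, 2); // reads the second vtable slot, which the
                               // application never emitted: undefined behavior
}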

Binary compatibility when using pass-by-reference instead of pass-by-pointer

This question is intended as a follow up question to this one: What are the differences between a pointer variable and a reference variable in C++?
Having read the answers and some further discussion I found on Stack Overflow, I know that the compiler should treat pass-by-reference the same way it treats pass-by-pointer and that references are nothing more than syntactic sugar. One thing I haven't been able to figure out yet is whether there is any difference when it comes to binary compatibility.
In our (multiplatform) framework we have the requirement to be binary compatible between release and debug builds (and between different releases of the framework). In particular, binaries we build in debug mode must be usable with release builds and vice versa.
To achieve that, we only use pure abstract classes and POD in our interfaces.
Consider the following code:
class IMediaSerializable
{
public:
    virtual tResult Serialize(int flags,
                              ISerializer* pSerializer,
                              IException** __exception_ptr) = 0;
    //[…]
};
ISerializer and IException are also pure abstract classes. ISerializer must point to an existing object, so we always have to perform a NULL-pointer check. IException implements some kind of exception handling where the address the pointer points to must be changed; for this reason we use a pointer to a pointer, which must also be NULL-pointer checked.
To make the code much clearer and get rid of some unnecessary runtime checks, we would like to rewrite this code using pass-by-reference.
class IMediaSerializable
{
public:
    virtual tResult Serialize(int flags,
                              ISerializer& pSerializer,
                              IException*& __exception_ptr) = 0;
    //[…]
};
This seems to work without any flaws. But the question remains for us whether this still satisfies the requirement of binary compatibility.
UPDATE:
To clarify: this question is not about binary compatibility between the pass-by-pointer version of the code and the pass-by-reference version. I know these can't be binary compatible. In fact we have the opportunity to redesign our API, for which we are considering using pass-by-reference instead of pass-by-pointer without caring about binary compatibility (new major release).
The question is just about binary compatibility when only using the pass-by-reference version of the code.
Binary ABI compatibility is determined by whatever compiler you are using. The C++ standard does not cover the issue of binary ABI compatibility.
You will need to check your C++ compiler's documentation, to see what it says about binary compatibility.
Generally references are implemented as pointers under-the-hood, so there will usually be ABI compatibility. You will have to check your particular compiler's documentation and possibly implementation to make sure.
However, your restriction to pure abstract classes and POD types is overzealous in the age of C++11.
C++11 split the concept of POD into multiple pieces. Standard-layout covers most, if not all, of the "memory layout" guarantees of a POD type.
But standard-layout types can have constructors and destructors (among other differences).
So you can make a really friendly interface.
Instead of a manually managed interface pointer, write a simple smart pointer.
template<class T>
struct value_ptr {
    T* raw;
    // ...
};
that ->clone()s on copy, moves the pointer on move, deletes on destroy, and (because you own it) can be guaranteed to be stable across compiler and library revisions (while unique_ptr cannot). This is basically a unique_ptr that supports ->clone(). Also have your own unique_ptr-style type for values that cannot be duplicated.
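A minimal sketch of such a clone-on-copy pointer, assuming the interface type exposes a virtual clone() and a virtual destructor:

#include <utility> // std::swap

template<class T>
struct value_ptr {
    T* raw = nullptr;

    value_ptr() = default;
    explicit value_ptr(T* p) : raw(p) {}
    value_ptr(const value_ptr& o) : raw(o.raw ? o.raw->clone() : nullptr) {} // deep copy
    value_ptr(value_ptr&& o) noexcept : raw(o.raw) { o.raw = nullptr; }      // steal the pointer
    value_ptr& operator=(value_ptr o) noexcept { std::swap(raw, o.raw); return *this; } // copy-and-swap
    ~value_ptr() { delete raw; }

    T* operator->() const { return raw; }
    T& operator*() const { return *raw; }
};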
Now you can replace your pure virtual interfaces with a pair of types. First, the pure virtual interface (with a T* clone() const usually), and second a regular type:
struct my_regular_foo {
    value_ptr< IFoo > ptr;
    bool some_method() const { return ptr->some_method(); } // calls the pure virtual method in IFoo
};
the end result is you have a type that behaves like a regular, everyday type, but it is implemented as a wrapper around a pure virtual interface class. Such types can be taken by value, taken by reference, and returned by value, and can hold arbitrary complex state within them.
These types live in header files that the library exposes.
And interface expansion of IFoo is fine. Just add the new method to IFoo at the end of the type (which under most ABIs is backward compatible (!) -- try it), then add a new method to my_regular_foo that forwards to it. Since we did not change the layout of my_regular_foo, it does not matter that the library and the client code may disagree about which methods it has -- those methods are all compiled inline and never exported -- so clients who know they are using the newer version of your library can use the new method, and clients who do not know are still fine (without rebuilding).
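A sketch of that expansion, with hypothetical method names; the new virtual goes at the very end of IFoo and the wrapper only gains an inline forwarder:

struct IFoo {
    virtual ~IFoo() = default;
    virtual IFoo* clone() const = 0;
    virtual bool some_method() const = 0;
    // v2: appended at the end, so the existing vtable slots keep their offsets
    virtual int extra_method() const = 0;
};

struct my_regular_foo {
    value_ptr<IFoo> ptr;
    bool some_method() const { return ptr->some_method(); }
    int extra_method() const { return ptr->extra_method(); } // new inline forwarder, never exported
};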
There is one careful gotcha: if you add an overload of a method to IFoo (not an override: an overload), the order of the virtual methods changes, and if you add a new virtual parent the layout of the virtual table can change. This only works reliably if all inheritance among the abstract classes in your public API is virtual. (With virtual inheritance, the vtable has pointers to the start of each sub-class's vtable, so each sub-class can grow its vtable without shifting the addresses of the other virtual functions. And if you carefully only append to the end of a sub-class's vtable, code built against the earlier header files can still find the earlier methods.)
This last step -- allowing new methods on your interfaces -- might be a bridge too far, as you'd have to investigate the ABI guarantees (documented or merely de facto) on vtable layout for every supported compiler.
No, it will not work regardless of which compiler you are using.
Consider a class Foo that exports two functions:
class Foo
{
public:
    void f(int*);
    void f(int&);
};
The compiler has to convert (mangle) the names of the two functions f into ABI-specific strings, so that the linker can distinguish between the two.
Now, since the compiler needs to support overload resolution, even if references were implemented exactly like pointers, the two functions would still need different mangled names.
For example GCC mangles these names to:
void Foo::f(int*) => _ZN3Foo1fEPi
void Foo::f(int&) => _ZN3Foo1fERi
Notice P vs R.
So if you change the signature of the function, your application will fail to link.

Is it safe to use strings as private data members in a class used across a DLL boundary?

My understanding is that exposing functions that take or return STL containers (such as std::string) across DLL boundaries can cause problems due to differences in the STL implementations of those containers in the two binaries. But is it safe to export a class like:
class Customer
{
public:
    wchar_t * getName() const;
private:
    std::wstring mName;
};
Without some sort of hack, mName is not going to be usable by the executable, so it won't be able to execute methods on mName, nor construct/destruct this object.
My gut feeling is "don't do this, it's unsafe", but I can't figure out a good reason.
It is not a problem, because it is trumped by a bigger problem: you cannot create an object of that class in code that lives in a module other than the one that contains the code for the class. Code in another module cannot accurately know the required object size; its implementation of the std::string class may well be different, which, as declared, also affects the size of the Customer object. Even the same compiler cannot guarantee this, for example when mixing optimized and debugging builds of these modules. Albeit that this is usually pretty easy to avoid.
So you must create a class factory for Customer objects, a factory that lives in that same module. Which then automatically implies that any code that touches the "mName" member also lives in the same module. And is therefore safe.
The next step then is to not expose Customer at all but to expose a pure abstract base class (aka interface) instead. Now you can prevent the client code from creating an instance of Customer and shooting their leg off. And you'll trivially hide the std::string as well. Interface-based programming techniques are common in module interop scenarios; it is also the approach taken by COM.
As long as the allocator and deallocator of instances of the class are built with the same settings, you should be OK, but you are right to avoid this.
Differences between the .exe and .dll as far as debug/release or code generation (multi-threaded DLL vs. single-threaded) could cause problems in some scenarios.
I would recommend using abstract classes in the DLL interface with creation and deletion done solely inside the DLL.
Interfaces like:
class A {
protected:
    virtual ~A() {}
public:
    virtual void func() = 0;
};

// exported create/delete functions
A* create_A();
void destroy_A(A*);
DLL Implementation like:
class A_Impl : public A {
public:
    ~A_Impl() {}
    void func() { do_something(); }
};

A* create_A() { return new A_Impl; }
void destroy_A(A* a) {
    A_Impl* ai = static_cast<A_Impl*>(a);
    delete ai;
}
Should be ok.
Even if your class has no data members, you cannot expect it to be usable from code compiled with a different compiler. There is no common ABI for C++ classes. You can expect differences in name mangling just for starters.
If you are prepared to constrain clients to use the same compiler as you, or provide source to allow clients to compile your code with their compiler, then you can do pretty much anything across your interface. Otherwise you should stick to C style interfaces.
If you want to provide an object oriented interface in a DLL that is truly safe, I would suggest building it on top of the COM object model. That's what it was designed for.
Any other attempt to share classes between code that is compiled by different compilers has the potential to fail. You may be able to get something that seems to work most of the time, but it can't be guaranteed to work.
The chances are that at some point you're going to be relying on undefined behaviour in terms of calling conventions or class structure or memory allocation.
The C++ standard does not say anything about the ABI provided by implementations. Even on a single platform changing the compiler options may change binary layout or function interfaces.
Thus to ensure that standard types can be used across DLL boundaries it is your responsibility to ensure that either:
Resource acquisition/release for standard types is done by the same DLL. (Note: you can have multiple CRTs in a process, but a resource acquired by crt1.DLL must be released by crt1.DLL.)
This is not specific to C++. In C for example malloc/free, fopen/fclose call pairs must each go to a single C runtime.
This can be done by either of the below:
By explicitly exporting acquisition/release functions (Photon's answer); a sketch follows this list. In this case you are forced to use a factory pattern and abstract types. Basically COM or a COM clone.
Forcing a group of DLL's to link against the same dynamic CRT. In this case you can safely export any kind of functions/classes.
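As a sketch of the first option (the function names are hypothetical and __declspec(dllexport) is Windows/MSVC syntax), the module that acquires a resource also exports the release for it:

#include <cstddef>

// In the DLL: the same module both allocates and frees the resource,
// so both operations go through the same CRT heap.
extern "C" __declspec(dllexport) char* make_buffer(std::size_t n) { return new char[n]; }
extern "C" __declspec(dllexport) void  free_buffer(char* p)       { delete[] p; }

// In the application: never call delete[] on the buffer directly;
// always hand it back to free_buffer().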
There are also two "potential bugs" (among others) you must take care of, since they relate to what is "under" the language.
The first is that std::string is a template, and hence it is instantiated in every translation unit. If they are all linked into the same module (exe or dll), the linker will resolve identical functions to the same code, and inconsistent code (the same function with different bodies) is eventually treated as an error.
But if they are linked into different modules (an exe and a dll), the compiler and linker invocations have nothing in common. So -- depending on how the modules were compiled -- you may have different implementations of the same class, with different members and memory layout (for example, one may have some debugging or profiling features added that the other has not). Accessing an object created on one side with methods compiled on the other side, if you have no other way to guarantee implementation consistency, may end in tears.
The second, more subtle, problem relates to allocation/deallocation of memory: because of the way Windows works, every module can have a distinct heap, but standard C++ does not specify which heap new and delete take an object from. If the string buffer is allocated in one module, then handed to a string instance in another module, you risk (upon destruction) giving the memory back to the wrong heap (it depends on how new/delete and malloc/free are implemented with respect to HeapAlloc/HeapFree; this merely reflects how "aware" the STL implementation is of the underlying OS). The operation is not itself destructive -- it just fails -- but the originating heap's memory is leaked.
All that said, it is not impossible to pass a container. It is just up to you to guarantee a consistent implementation on both sides, since the compiler and linker have no way to cross-check.

Splitting long method maintaining class interface

In my library there's a class like this:
class Foo {
public:
void doSomething();
};
Now, the implementation of doSomething() has grown a lot and I want to split it into two methods:
class Foo {
public:
    void doSomething();
private:
    void doSomething1();
    void doSomething2();
};
Where doSomething() implementation is this:
void Foo::doSomething() {
    this->doSomething1();
    this->doSomething2();
}
But now the class interface has changed. If I compile this library, all existing applications using it won't work; the external linkage has changed.
How can I avoid breaking of binary compatibility?
I guess inlining solves this problem. Is that right? And is it portable? What happens if compiler optimization un-inlines these methods?
class Foo {
public:
    void doSomething();
private:
    inline void doSomething1();
    inline void doSomething2();
};

void Foo::doSomething1() {
    /* some code here */
}

void Foo::doSomething2() {
    /* some code here */
}

void Foo::doSomething() {
    this->doSomething1();
    this->doSomething2();
}
EDIT:
I tested this code before and after the method splitting and it seems to maintain binary compatibility. But I'm not sure this would work on every OS, with every compiler, and with more complex classes (with virtual methods, inheritance...). Sometimes I had binary compatibility break after adding private methods like these, but now I don't remember in which particular situation. Maybe it was due to symbol tables being looked up by index (like Steve Jessop notes in his answer).
Strictly speaking, changing the class definition at all (in either of the ways you show) is a violation of the One Definition Rule and leads to undefined behavior.
In practice, adding non-virtual member functions to a class maintains binary compatibility in every implementation out there, because if it didn't then you'd lose most of the benefits of dynamic libraries. But the C++ standard doesn't say much (anything?) about dynamic libraries or binary compatibility, so it doesn't guarantee what changes you can make.
So in practice, changing the symbol table doesn't matter provided that the dynamic linker looks up entries in the symbol table by name. There are more entries in the symbol table than before, but that's OK because all the old ones still have the same mangled names. It may be that with your implementation, private and/or inline functions (or any functions you specify) aren't dll-exported, but you don't need to rely on that.
I have used one system (Symbian) where entries in the symbol table were not looked up by name, they were looked up by index. On that system, when you added anything to a dynamic library you had to ensure that any new functions were added to the end of the symbol table, which you did by listing the required order in a special config file. You could ensure that binary compatibility wasn't broken, but it was fairly tedious.
So, you could check your C++ ABI or compiler/linker documentation to be absolutely sure, or just take my word for it and go ahead.
There is no problem here. The name mangling of Foo::doSomething() is always the same regardless of its implementation.
I think the ABI of the class won't change if you add non-virtual methods, because non-virtual methods are not stored in the class object but rather as functions with mangled names. You can add as many functions as you like as long as you don't add data members.
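As an aside (a sketch, not something either answer requires): if you want to avoid touching the published class definition at all, the helpers can be file-local free functions in the library's .cpp, so nothing new is added to the header:

// Foo.h stays exactly as in the original version:
class Foo {
public:
    void doSomething();
};

// Foo.cpp
namespace {
    // note: free functions only see Foo's public interface
    void doSomethingPart1(Foo& self) { /* some code here */ }
    void doSomethingPart2(Foo& self) { /* some code here */ }
}

void Foo::doSomething() {
    doSomethingPart1(*this);
    doSomethingPart2(*this);
}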

Objects of a class share same code segment for methods?

For example, we have this code:
class MyClass
{
private:
    int data;
public:
    int getData()
    {
        return data;
    }
};

int main()
{
    MyClass A, B, C;
    return 0;
}
Since A, B and C are objects of MyClass, all have their own memory.
My question is: do all of these objects share the same memory for the methods of the class (getData() in this case), or does each object have a separate code segment of its own?
Thanks in advance.
The C++ Standard has nothing to say on the subject. If your architecture supports multiple code segments, then whether multiple segments are used is down to the implementation of the compiler and linker you are using. It's highly unlikely that any implementation would create separate segments for each class or object, however. Or indeed produce separate code for each object - methods belong to classes, not individual objects.
They usually share the same code segment.
Same.
You could be interested in knowledge of how things in C++ are implemented under the hood.
In general with classes and objects the following is how it works:
A class is a description of data and operations on that data.
An object is a specification of its type (the class it represents) and the values of its attributes. Because the type of the object is known, the compiler knows where to find the methods being called on it. So even though a new copy of the data is created when a new object is made, the methods stay fixed.
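A small sketch of that point: member functions add nothing to the per-object storage, so each object holds only its data members.

#include <iostream>

class MyClass
{
private:
    int data = 0;
public:
    int getData() { return data; }
};

int main()
{
    MyClass A, B, C;
    // All three objects have the same size: only the data member is stored per object.
    std::cout << sizeof(A) << ' ' << sizeof(B) << ' ' << sizeof(C) << '\n'; // typically prints "4 4 4"
    return 0;
}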
In your example, MyClass::getData() is 'inline', so in that case the location of the instructions for each instance may well be in different locations under some circumstances.
Most compilers only truly inline code when optimisation is enabled (and even then may choose not to do so). However, if this inline code is defined in a header file, the compiler will necessarily generate code for each compilation unit in which the class is used, even if it is not inlined within that compilation unit. The linker may or may not then optimise out the multiple instances of the code.
For code not defined inline, all instances will generally share the same code unless the optimiser decides to inline it; which, unless the linker is very smart, will only happen when the class is instantiated and used in the same compilation unit in which it was defined.