What can be changed in .so library without breaking compatibility

What can be changed in .so library without breaking compatibility - c++

For example can I add new function in header file without the need to recompile all programs using that library?

You can add functions and objects to a shared library without breaking existing programs that rely on that library. Under some circumstances you could increase the size of objects (in particular, arrays) in the library.
You can also replace the implementation of a function, provided that the function signature does not change. This will not cause any problems with dynamic linking, but if the new implementation's behavior does not meet existing programs' expectations then you'll see program misbehavior.
You can remove functions and objects that no program links to. If you're concerned only with existing programs then you may be able to catalog what functions and objects those are, but otherwise you can only base such an evaluation on the visibility of the functions / objects within the shared library -- externally-visible functions and objects cannot safely be removed.
There may be other implementation-specific details of a shared library that can be changed without breaking compatibility as well.
Note, however, that none of that has anything directly to do with header files. Compatibility of shared libraries is primarily a run time consideration. Header files are relevant only at compile time.

Another point is that you have to be very careful of any shared structures. If a function in your library accepts or returns a structure or a pointer to a structure, and if you make any changes to that structure (adding, removing, or rearranging members), you're likely to introduce incompatibility.
(Strictly speaking, changes like this do count as changes to the function's signature, as mentioned by others.)
If you're very, very careful, you can arrange to add new members at the end of a structure, but it generally requires explicit cooperation by callers, using mechanisms defined in advance (that is, adhered to since version 0, by all calling code, before any changes get made).

Related

Designing Interfaces in c++

I am developing an interface that can be used as dynamic loading.Also it should be compiler independent.So i wanted to export Interfaces.
I am facing now the following problems..
Problem 1: The interface functions are taking some custom data types (basically classes or structures) as In\Out parameters.I want to initialise members of these classes with default values using constructors.If i do this it is not possible to load my library dynamically and it becomes compiler dependent. How to solve this.
Problem 2: Some interfaces returns lists(or maps) of element to client.I am using std containers for this purpose.But this also once again compiler dependent(and compiler version also some times).
Thanks.

Code compiled differently can only work together if it adopts the same Application Binary Interface (ABI) for the set of types used for parameters and return value. ABI's are significant at a much deeper level - name mangling, virtual dispatch tables etc., but my point's that if there's one your compilers support allowing calling of functions with simple types, you can at least think about hacking together some support for more complex types like compiler-specific implementations of Standard containers and user-defined types.
You'll have to research what ABI support your compilers provide, and infer what you can about what they'll continue to provide.
If you want to support other types beyond what the relevant ABI standardises, options include:
use simpler types to expose internals of more complex types
pass [const] char* and size_t extracted by my_std_string.data() or &my_std_string[0] and my_std_string.size(), similarly for std::vector
serialise the data and deserialise it using the data structures of the receiver (can be slow)
provide a set of function pointers to simple accessor/mutator functions implemented by the object that created the data type
e.g. the way the classic C qsort function accepts a pointer to an element comparison function

As I usually have a multithreading focus, I'm mostly going to bark about your second problem.
You already realized that passing elements of a container over an API seems to be compiler dependent. It's actually worse: it's header file & C++-library dependent, so at least for Linux you're already stuck with two different sets: libstc++ (originating from gcc) and libcxx (originating from clang).
Because part of the containers is header files and part is library code, getting things ABI-independent is close to impossible.
My bigger worry is that you actually thought of passing container elements around. This is a huge threadsafety issue: the STL containers are not threadsafe - by design.
By passing references over the interface, you are passing "pointers to encapsulated knowledge" around - the users of your API could make assumptions of your internal structures and start modifying the data pointed to. That is usually already really bad in a singlethreaded environment, but gets worse in a multithreaded environment.
Secondly, pointers you provided could get stale, not good either.
Make sure to return copies of your inner knowledge to prevent user modification of your structures.
Passing things const is not enough: const can be cast away and you still expose your innards.
So my suggestion: hide the data types, only pass simple types and/or structs that you fully control (i.e. are not dependent on STL or boost).

Designing an API with the widest ABI compatibility is an extremely complex subject, even more so when C++ is involved instead of C.
Yet there are more theoretical-type issues that aren't really quite as bad as they sound in practice. For example, in theory, calling conventions and structure padding/alignment sound like they could be major headaches. In practice they aren't so much, and you can even resolve such issues in hindsight by specifying additional build instructions to third parties or decorating your SDK functions with macros indicating the appropriate calling convention. By "not so bad" here, I mean that they can trip you up but they won't have you going back to the drawing board and redesigning your entire SDK in response.
The "practical" issues I want to focus on are issues that can have you revisiting the drawing board and redoing the entire SDK. My list is also not exhaustive, but are some of the ones I think you should really keep in mind first.
You can also treat your SDK as consisting of two parts: a dynamically-linked part that actually exports functionality whose implementation is hidden from clients, and a statically (internally) linked convenience library part that adds C++ wrappers on top. If you treat your SDK as having these two distinct parts, you're allowed a lot more liberty in the statically-linked library to use a lot more C++ mechanisms.
So, let's get started with those practical headache inducers:
1. The binary layout of a vtable is not necessarily consistent across compilers.
This is, in my opinion, one of the biggest gotchas. We're usually looking at 2 main ways to access functionality from one module to another at runtime: function pointers (including those provided by dylib symbol lookup) and interfaces containing virtual functions. The latter can be so much more convenient in C++ (both for implementor and client using the interface), yet unfortunately using virtual functions in an API that aims to be binary compatible with the widest range of compilers is like playing minesweeper through a land of gotchas.
I would recommend avoiding virtual functions outright for this purpose unless your team consists of minesweeper experts who know all of these gotchas. It's useful to try to fall in love with C again for those public interface parts and start building a fondness for these kinds of interfaces consisting of function pointers:
struct Interface
{
void* opaque_private_data;
void (*func1)(struct Interface* self, ...);
void (*func2)(struct Interface* self, ...);
void (*func3)(struct Interface* self, ...);
};
These present far fewer gotchas and are nowhere near as fragile against changes (ex: you're perfectly allowed to do things like add more function pointers to the bottom of the structure without affecting ABI).
2. Stub libs for dylib symbol lookup are linker-specific (as are all static libs in general).
This might not seem like a big deal until combined with #1. When you toss out virtual functions for the purpose of exporting interfaces, then the next big temptation is to often export whole classes or select methods through a dylib.
Unfortunately doing this with manual symbol lookup can become very unwieldy very quickly, so the temptation is to often do this automatically by simply linking to the appropriate stub.
Yet this too can become unwieldy when your goal is to support as many compilers/linkers as possible. In such a case, you may have to possess many compilers and build and distribute different stubs for each possibility.
So this can kind of push you into a corner where it's no longer very practical export class definitions anymore. At this point you might simply export free-standing functions with C linkage (to avoid C++ name mangling which is another potential source of headaches).
One of the things that should be obvious already is that we're getting nudged more and more towards favoring a C or C-like API if our goal is universal binary compatibility without opening up too many cans of worms.
3. Different modules have 'different heaps'.
If you allocate memory in one module and try to deallocate it in another, then you're trying to free memory from a mismatching heap and will invoke undefined behavior.
Even in plain old C, it's easy to forget this rule and malloc in one exported function only to return a pointer to it with the expectation that the client accessing the memory from a different module will free it when done. This once again invokes undefined behavior, and we have to export a second function to indirectly free the memory from the same module that allocated it.
This can become a much bigger gotcha in C++ where we often have class templates that have internal linkage that implicitly do memory management. For example, even if we roll our own std::vector-like sequence like List<T>, we can run into a scenario where a client creates a list, passes it to our API by reference where we use functions that can allocate/deallocate memory (like push_back or insert) and butt heads with this mismatching heap/free store issue. So even this hand-rolled container should ensure that it allocates and deallocates memory from the same central location if it's going to be passed around across modules, and placement new will become your friend when implementing such containers.
4. Passing/returning C++ standard objects is not ABI-compatible.
This includes C++ standard containers as you have already guessed. There's no really practical way to ensure that one compiler will use a compatible representation of something like std::vector when including <vector> as another. So passing/returning such standard objects whose representation is outside of your control is generally out of the question if you're targeting wide binary compatibility.
These don't even necessarily have compatible representations within two projects built by the same compiler, as their representations can vary in incompatible ways based on build settings.
This might make you think that you should now roll all kinds of containers by hand, but I would suggest a KISS approach here. If you're returning a variable number of elements as a result from a function, then we don't need a wide range of container types. We only need one dynamic array kind of container, and it doesn't even have to be a growable sequence, just something with proper copy, move, and destruction semantics.
It might seem nicer and could save some cycles if you just returned a set or a map in a function that computes one, but I'd suggest forgetting about returning these more sophisticated structures and convert to/from this basic dynamic array kind of representation. It's rarely the bottleneck you might think it would be to transfer to/from contiguous representations, and if you actually do run into a hotspot as a result of this which you actually gained from a legit profiling session of a real world use case, then you can always add more to your SDK in a very discrete and selective fashion.
You can also always wrap those more sophisticated containers like map into a C-like function pointer interface that treats the handle to the map as opaque, hidden away from clients. For heftier data structures like a binary search tree, paying the cost of one level of indirection is generally very negligible (for simpler structures like a random-access contiguous sequence, it generally isn't quite as negligible, especially if your read operations like operator[] involve indirect calls).
Another thing worth noting is that everything I've discussed so far relates to the exported, dynamically-linked side of your SDK. The static convenience library that is internally linked is free to receive and return standard objects to make things convenient for the third party using your library, provided that you're not actually passing/returning them in your exported interfaces. You can even avoid rolling your own containers outright and just take a C-style mindset to your exported interfaces, returning raw pointers to T* that needs to be freed while your convenience library does that automatically and transfers the contents to std::vector<T>, e.g.
5. Throwing exceptions across module boundaries is undefined.
We should generally not be throwing exceptions from one module to be caught in another when we cannot ensure compatible build settings in the two modules, let alone the same compiler. So throwing exceptions from your API to indicate input errors is generally out of the question in this case.
Instead we should catch all possible exceptions at the entry points to our module to avoid leaking them into the outside world, and translate all such exceptions into error codes.
The statically-linked convenience library can still call one of your exported functions, check the error code, and in the case of failure, throw an exception. This is perfectly fine here since that convenience library is internally linked to the module of the third party using this library, so it's effectively throwing the exception from the third party module to be caught by the same third party module.
Conclusion
While this is, by no means, an exhaustive list, these are some caveats that can, when unheeded, cause some of the biggest issues at the broadest level of your API design. These kinds of design-level issues can be exponentially more expensive to fix in hindsight than implementation-type issues, so they should generally have the highest priority.
If you're new to these subjects, you can't go too far wrong favoring a C or very C-like API. You can still use a lot of C++ implementing it and can also build a C++ convenience library back on top (your clients don't even have to use anything but the C++ interfaces provided by that internally-linked convenience library).
With C, you're typically looking at more work at the baseline level, but potentially far fewer of those disastrous design-level gotchas. With C++, you're looking at less work at the baseline level, but far more potentially disastrous surprise scenarios. If you favor the latter route, you generally want to ensure that your team's expertise with ABI issues is higher with a larger coding standards document dedicating large sections to these potential ABI gotchas.
For your specific questions:
Problem 1: The interface functions are taking some custom data types
(basically classes or structures) as In\Out parameters.I want to
initialise members of these classes with default values using
constructors.If i do this it is not possible to load my library
dynamically and it becomes compiler dependent. How to solve this.
This is where that statically-linked convenience library can come in handy. You can statically link all that convenient code like a class with constructors and still pass in its data in a more raw, primitive kind of form to the exported interfaces. Another option is to selectively inline or statically link the constructor so that its code is not exported as with the rest of the class, but you probably don't want to be exporting classes as indicated above if your goal is max binary compatibility and don't want too many gotchas.
Problem 2: Some interfaces returns lists(or maps) of element to
client.I am using std containers for this purpose.But this also once
again compiler dependent(and compiler version also some times).
Here we have to forgo those standard container goodies at least at the exported API level. You can still utilize them at the convenience library level which has internal linkage.

Making objects shared across DLL's

I wrote a modular system with DLL's (or .so's, it's cross-platform). These separate plugins have to hook into the main system's objects.
E.G.
You have a vector of strings in one object, the owner is the main application. Now several (explicit loaded) plugins must access that vector.
These plugins need also a baseclass, a sort of API to hook into the main system.
I tried including all the header files into the DLL, but there are two problems with that: the I include the API each time I build an plugin.
The second is: for static methods, I need a cpp file, but then he doesn't use the cpp file from the main application, but his own. This causes problem with that shared vector... (since there are now multiple vectors, one main, and one per plugin)
Any ideas?

This is a bit problematic as Paul mentioned. When accessing a vector (probably you're talking about std::vector) which was instantiated in another module (exe/dll), you have to be careful. If both modules are compiled with the same compiler, same compiler settings, same STL implementation and you're only reading from it (especially not pushing additional values, as that may reallocate the internal buffer and cause problems when accessing the memory), you should be fine.
However if you cannot guarantee such setup, it would be better to provide access to raw data array together with its size, not via std::vector directly, but rather via the std::vector::data(). If the stored data type (template type T of std::vector) is binary compatible between the two communicating modules, such interface should be safe.
There is a C++ standard proposal to fix such issues: Defining a Portable C++ ABI. Hopefully it will prevent such problems in the future.

The effect of number of functions in a C++ library

Suppose now two C++ libraries are available: one library has all the functions that will be needed by the program ( a C++ application program that will invoke the library), and the other one not only has the necessary functions that will be needed by the program but also has other functions that will not not used by the program. We assume that for the common functions in both libraries they are implemented in the same manner. My question is: when the program uses the library to perform a certain task, what's the effect of the library on the performance of the program?
The reason why I asked this question is because when developing a c++ library I often wrote some additional functions, which may not be invoked by the users of the library but are important for debugging. When the library is finished, I have two choices: one is to keep these auxiliary functions and the other is removing them or using other strategies of keep them (for example, define MACRO to disable these functions). If keeping these auxiliaries functions will not deteriorate the performance, I would like to keep them.

Everything else being the same, there will be no performance difference.
In addition, if the library is a static library, the linker will not include the functions that are not used, and the executables will have the same size.

Well if you have written a static library that I guess you have. Then the only difference it will create is that the static library functionality will be part of you executable no matter if you use it or not.
I don't think it will hurt you in terms of speed but yes it will occupy a lot more space since a copy of lib will be created with you executable.

Do dynamic libraries break C++ standard?

The C++ standard 3.6.3 states
Destructors for initialized objects of static duration are called as a result of returning from main and as a result of calling exit
On windows you have FreeLibrary and linux you have dlclose to unload a dynamically linked library. And you can call these functions before returning from main.
A side effect of unloading a shared library is that all destructors for static objects defined in the library are run.
Does this mean it violates the C++ standard as these destructors have been run prematurely ?

It's a meaningless question. The C++ standard doesn't say what dlclose does or should do.
If the standard were to include a specification for dlclose, it would certainly point out that dlclose is an exception to 3.6.3. So then 3.6.3 wouldn't be violated because it would be a documented exception. But we can't know that, since it doesn't cover it.
What effect dlclose has on the guarantees in the C++ standard is simply outside the scope of that standard. Nothing dlclose can do can violate the C++ standard because the standard says nothing about it.
(If this were to happen without the program doing anything specific to invoke it, then you would have a reasonable argument that the standard is being violated.)

Parapura, it may be helpful to keep in mind that the C++ standard is a language definition that imposes constraints on how the compiler converts source code into object code.
The standard does not impose constraints on the operating system, hardware, or anything else.
If a user powers off his machine, is that a violation of the C++ standard? Of course not. Does the standard need to say "unless the user powers off the device" as an "exception" to every rule? That would be silly.
Similarly, if an operating system kills a process or forces the freeing of some system resources, or even allows a third party program to clobber your data structures -- this is not a violation of the C++ standard. It may well be a bug in the OS, but the C++ language definition remains intact.
The standard is only binding on compilers, and forces the resulting executable code to have certain properties. Nevertheless, it does not bind runtime behavior, which is why we spend so much time on exception handling.

I'm taking this to be a bit of an open-ended question.
I'd say it's like this: The standard only defines what a program is. And a program (a "hosted" one, I should add) is a collection of compiled and linked translation units that has a unique main entry point.
A shared library has no such thing, so it doesn't even constitute a "program" in the sense of the standard. It's just a bunch of linked executable code without any sort of "flow". If you use load-time linking, the library becomes part of the program, and all is as expected. But if you use runtime linking, the situation is different.
Therefore, you may like to view it like this: global variables in the runtime-linked shared object are essentially dynamic objects which are constructed by the dynamic loader, and which are destroyed when the library is unloaded. The fact that those objects are declared like global objects doesn't change that, since the objects aren't part of a "program" at that point.

They are only run prematurely if you go to great effort to do so - the default behavior is standard conforming.

If it does violate the standard, who is the violator? The C++ compiler cannot be considered the violator (since things are being loaded dynamically via a library call); thus it must the the vendor of the dynamic loading functionality, aka the OS vendor. Are OS vendors bound by the C++ standard when designing their systems? That definitely seems to be outside of the scope of the standard.

Or for another perspective, consider the library itself to be a separate program providing some sort of service. When this program is terminated (by whatever means the library is unloaded) then all associated service objects should disappear as well, static or not.

This is just one of the tons and tons of platform-specific "extensions" (for a target compiler, architecture, OS, etc) that are available. All of which "violate" the standard in all sorts of ways. But there is only one expected consequence for deviating from standard C++: you aren't portable anymore. (Unless you do a lot of #ifdef or something, but still, that particular code is locked in to that platform).
Since there is currently no standard/cross-platform notion of libraries, if you want the feature, you have to either not use it or re-implement it per-platform. Since similar things are appearing on most platforms, maybe the standard will one day find a clean way to abstract them so that the standard covers them. The advantage will be a cross-platform solution and it will simplify cross platform code.

How to implement monkey patch in C++?

Is it possible to implement monkey patching in C++?
Or any other similar approach to that?
Thanks.

Not portably so, and due to the dangers for larger projects you better have good reason.
The Preprocessor is probably the best candidate, due to it's ignorance of the language itself. It can be used to rename attributes, methods and other symbol names - but the replacement is global at least for a single #include or sequence of code.
I've used that before to beat "library diamonds" into submission - Library A and B both importing an OS library S, but in different ways so that some symbols of S would be identically named but different. (namespaces were out of the question, for they'd have much more far-reaching consequences).
Similary, you can replace symbol names with compatible-but-superior classes.
e.g. in VC, #import generates an import library that uses _bstr_t as type adapter. In one project I've successfully replaced these _bstr_t uses with a compatible-enough class that interoperated better with other code, just be #define'ing _bstr_t as my replacement class for the #import.
Patching the Virtual Method Table - either replacing the entire VMT or individual methods - is somethign else I've come across. It requires good understanding of how your compiler implements VMTs. I wouldn't do that in a real life project, because it depends on compiler internals, and you don't get any warning when thigns have changed. It's a fun exercise to learn about the implementation details of C++, though. One application would be switching at runtime from an initializer/loader stub to a full - or even data-dependent - implementation.
Generating code on the fly is common in certain scenarios, such as forwarding/filtering COM Interface calls or mapping OS Window Handles to library objects. I'm not sure if this is still "monkey-patching", as it isn't really toying with the language itself.

To add to other answers, consider that any function exposed through a shared object or DLL (depending on platform) can be overridden at run-time. Linux provides the LD_PRELOAD environment variable, which can specify a shared object to load after all others, which can be used to override arbitrary function definitions. It's actually about the best way to provide a "mock object" for unit-testing purposes, since it is not really invasive. However, unlike other forms of monkey-patching, be aware that a change like this is global. You can't specify one particular call to be different, without impacting other calls.

Considering the "guerilla third-party library use" aspect of monkey-patching, C++ offers a number of facilities:
const_cast lets you work around zealous const declarations.
#define private public prior to header inclusion lets you access private members.
subclassing and use Parent::protected_field lets you access protected members.
you can redefine a number of things at link time.
If the third party content you're working around is provided already compiled, though, most of the things feasible in dynamic languages isn't as easy, and often isn't possible at all.

I suppose it depends what you want to do. If you've already linked your program, you're gonna have a hard time replacing anything (short of actually changing the instructions in memory, which might be a stretch as well). However, before this happens, there are options. If you have a dynamically linked program, you can alter the way the linker operates (e.g. LD_LIBRARY_PATH environment variable) and have it link something else than the intended library.
Have a look at valgrind for example, which replaces (among alot of other magic stuff it's dealing with) the standard memory allocation mechanisms.

As monkey patching refers to dynamically changing code, I can't imagine how this could be implemented in C++...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

What can be changed in .so library without breaking compatibility - c++

For example can I add new function in header file without the need to recompile all programs using that library?

Related

Designing Interfaces in c++

Making objects shared across DLL's

The effect of number of functions in a C++ library

Do dynamic libraries break C++ standard?

How to implement monkey patch in C++?

Categories

Resources