Making objects shared across DLLs - C++

I wrote a modular system with DLLs (or .so's, it's cross-platform). These separate plugins have to hook into the main system's objects.
E.g. you have a vector of strings in one object, owned by the main application. Now several (explicitly loaded) plugins must access that vector.
These plugins also need a base class, a sort of API to hook into the main system.
I tried including all the header files into the DLL, but there are two problems with that. The first is that I include the API each time I build a plugin.
The second is: for static methods I need a .cpp file, but then the plugin doesn't use the .cpp file from the main application, but its own. This causes problems with that shared vector, since there are now multiple vectors: one in the main application, and one per plugin.
Any ideas?

This is a bit problematic, as Paul mentioned. When accessing a vector (probably you mean std::vector) which was instantiated in another module (exe/dll), you have to be careful. If both modules are compiled with the same compiler, the same compiler settings and the same STL implementation, and you're only reading from it (especially not pushing additional values, as that may reallocate the internal buffer and cause problems when accessing the memory), you should be fine.
However, if you cannot guarantee such a setup, it would be better to provide access to the raw data array together with its size, not via std::vector directly, but rather via std::vector::data() and std::vector::size(). If the stored data type (the template type T of std::vector) is binary compatible between the two communicating modules, such an interface should be safe.
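As an illustration, here is a minimal sketch of that approach, assuming a binary-compatible element type like int (the function and variable names are hypothetical, not from the question):

// Exporting module (the main application): only a raw pointer and a
// count cross the boundary; the std::vector object itself never does.
#include <cstddef>
#include <vector>

static std::vector<int> g_values = {1, 2, 3};

extern "C" void get_values(const int** data, std::size_t* count)
{
    *data = g_values.data();
    *count = g_values.size();
}

// Consuming module (a plugin):
//   const int* data; std::size_t count;
//   get_values(&data, &count);
//   for (std::size_t i = 0; i < count; ++i) { /* read data[i] */ }

Note that the original question involved strings, which are not binary compatible across standard library implementations; in that case you would expose each string as its own char pointer plus length.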
There is a C++ standard proposal to fix such issues: Defining a Portable C++ ABI. Hopefully it will prevent such problems in the future.

Related

What can be changed in .so library without breaking compatibility

For example, can I add a new function to the header file without the need to recompile all programs using that library?
You can add functions and objects to a shared library without breaking existing programs that rely on that library. Under some circumstances you could increase the size of objects (in particular, arrays) in the library.
You can also replace the implementation of a function, provided that the function signature does not change. This will not cause any problems with dynamic linking, but if the new implementation's behavior does not meet existing programs' expectations then you'll see program misbehavior.
You can remove functions and objects that no program links to. If you're concerned only with existing programs then you may be able to catalog what functions and objects those are, but otherwise you can only base such an evaluation on the visibility of the functions / objects within the shared library -- externally-visible functions and objects cannot safely be removed.
There may be other implementation-specific details of a shared library that can be changed without breaking compatibility as well.
Note, however, that none of that has anything directly to do with header files. Compatibility of shared libraries is primarily a run time consideration. Header files are relevant only at compile time.
Another point is that you have to be very careful of any shared structures. If a function in your library accepts or returns a structure or a pointer to a structure, and if you make any changes to that structure (adding, removing, or rearranging members), you're likely to introduce incompatibility.
(Strictly speaking, changes like this do count as changes to the function's signature, as mentioned by others.)
If you're very, very careful, you can arrange to add new members at the end of a structure, but it generally requires explicit cooperation by callers, using mechanisms defined in advance (that is, adhered to since version 0, by all calling code, before any changes get made).
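One way to define such a mechanism in advance is the size-field convention used by many C APIs, sketched below (struct and function names are hypothetical):

// Version 0 of the API establishes the convention: the caller sets
// struct_size to the sizeof it was compiled against.
#include <cstddef>
#include <cstring>

struct config {
    std::size_t struct_size;  // must be set to sizeof(config) by the caller
    int option_a;
    int option_b;             // appended in a later version of the library
};

void lib_configure(const config* cfg)
{
    config local = {};        // zero is the defined default for new fields
    // Copy only as much as the caller actually provided.
    std::size_t n = cfg->struct_size < sizeof local ? cfg->struct_size
                                                    : sizeof local;
    std::memcpy(&local, cfg, n);
    // local.option_b is zero when called by code built against version 0.
}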

Designing interfaces in C++

I am developing an interface that can be loaded dynamically. It should also be compiler independent, so I wanted to export interfaces.
I am now facing the following problems.
Problem 1: The interface functions take some custom data types (basically classes or structures) as in/out parameters. I want to initialise members of these classes with default values using constructors. If I do this, it is not possible to load my library dynamically, and it becomes compiler dependent. How do I solve this?
Problem 2: Some interfaces return lists (or maps) of elements to the client. I am using std containers for this purpose, but this is also compiler dependent (and sometimes compiler-version dependent).
Thanks.
Code compiled differently can only work together if it adopts the same Application Binary Interface (ABI) for the set of types used as parameters and return values. ABIs matter at a much deeper level too - name mangling, virtual dispatch tables, etc. - but my point is that if your compilers support an ABI that allows calling functions with simple types, you can at least think about hacking together some support for more complex types, like compiler-specific implementations of Standard containers and user-defined types.
You'll have to research what ABI support your compilers provide, and infer what you can about what they'll continue to provide.
If you want to support other types beyond what the relevant ABI standardises, options include:
use simpler types to expose internals of more complex types
pass [const] char* and size_t extracted via my_std_string.data() or &my_std_string[0] and my_std_string.size(), and similarly for std::vector (see the sketch after this list)
serialise the data and deserialise it using the data structures of the receiver (can be slow)
provide a set of function pointers to simple accessor/mutator functions implemented by the object that created the data type
e.g. the way the classic C qsort function accepts a pointer to an element comparison function
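For example, a minimal sketch of that pointer+size option (consume_text is a hypothetical function; the receiving side rebuilds the string in its own library's types):

#include <cstddef>
#include <string>

// Exported with C linkage so only simple types cross the boundary.
extern "C" void consume_text(const char* data, std::size_t size)
{
    // Receiving side: rebuild the string in *this* module's own
    // std::string implementation; the caller's object never crosses.
    std::string copy(data, size);
    (void)copy; // ... use the copy ...
}

void caller()
{
    std::string s = "hello";
    // Only a raw pointer and a length cross the module boundary.
    consume_text(s.data(), s.size());
}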
As I usually have a multithreading focus, I'm mostly going to bark about your second problem.
You already realized that passing elements of a container over an API seems to be compiler dependent. It's actually worse: it's header-file and C++-library dependent, so at least on Linux you're already stuck with two different sets: libstdc++ (originating from gcc) and libc++ (originating from clang).
Because the containers are implemented partly in header files and partly in library code, getting things ABI-independent is close to impossible.
My bigger worry is that you actually thought of passing container elements around. This is a huge thread-safety issue: the STL containers are not thread-safe - by design.
By passing references over the interface, you are passing "pointers to encapsulated knowledge" around - the users of your API could make assumptions about your internal structures and start modifying the data pointed to. That is usually already really bad in a single-threaded environment, and gets worse in a multithreaded one.
Secondly, the pointers you provide could go stale, which is not good either.
Make sure to return copies of your inner knowledge to prevent user modification of your structures.
Passing things const is not enough: const can be cast away and you still expose your innards.
So my suggestion: hide the data types, only pass simple types and/or structs that you fully control (i.e. are not dependent on STL or boost).
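A minimal sketch of that suggestion (names hypothetical): a plain struct the API fully controls, filled in by copy rather than exposed by reference:

struct SampleRecord {      // plain struct, no STL/boost members
    char   name[64];
    double value;
};

extern "C" int get_record(int id, SampleRecord* out)
{
    // Internally we may keep records in any container we like; the
    // caller only ever receives a copy of the data, never a pointer
    // into our internal structures.
    SampleRecord internal = { "example", 42.0 };
    (void)id;
    *out = internal;       // copy, not a reference to our innards
    return 0;              // 0 = success
}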
Designing an API with the widest ABI compatibility is an extremely complex subject, even more so when C++ is involved instead of C.
Yet there are theoretical-sounding issues that aren't really as bad in practice as they seem. For example, in theory, calling conventions and structure padding/alignment sound like they could be major headaches. In practice they aren't so much, and you can even resolve such issues in hindsight by specifying additional build instructions to third parties or decorating your SDK functions with macros indicating the appropriate calling convention. By "not so bad" here, I mean that they can trip you up, but they won't have you going back to the drawing board and redesigning your entire SDK in response.
The "practical" issues I want to focus on are issues that can have you revisiting the drawing board and redoing the entire SDK. My list is also not exhaustive, but are some of the ones I think you should really keep in mind first.
You can also treat your SDK as consisting of two parts: a dynamically-linked part that actually exports functionality whose implementation is hidden from clients, and a statically (internally) linked convenience library part that adds C++ wrappers on top. If you treat your SDK as having these two distinct parts, you're allowed a lot more liberty in the statically-linked library to use a lot more C++ mechanisms.
So, let's get started with those practical headache inducers:
1. The binary layout of a vtable is not necessarily consistent across compilers.
This is, in my opinion, one of the biggest gotchas. We're usually looking at 2 main ways to access functionality from one module to another at runtime: function pointers (including those provided by dylib symbol lookup) and interfaces containing virtual functions. The latter can be so much more convenient in C++ (both for implementor and client using the interface), yet unfortunately using virtual functions in an API that aims to be binary compatible with the widest range of compilers is like playing minesweeper through a land of gotchas.
I would recommend avoiding virtual functions outright for this purpose unless your team consists of minesweeper experts who know all of these gotchas. It's useful to try to fall in love with C again for those public interface parts and start building a fondness for these kinds of interfaces consisting of function pointers:
struct Interface
{
    void* opaque_private_data;
    void (*func1)(struct Interface* self, ...);
    void (*func2)(struct Interface* self, ...);
    void (*func3)(struct Interface* self, ...);
};
These present far fewer gotchas and are nowhere near as fragile against changes (ex: you're perfectly allowed to do things like add more function pointers to the bottom of the structure without affecting ABI).
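For illustration, a concrete variant of that sketch (names and signatures are hypothetical, with the '...' placeholders replaced by real parameters):

struct Counter
{
    void* opaque_private_data;
    void (*increment)(struct Counter* self);
    int  (*value)(const struct Counter* self);
};

// Implementation side: only this module ever touches the private data.
struct CounterImpl { int count; };

static void counter_increment(struct Counter* self)
{
    static_cast<CounterImpl*>(self->opaque_private_data)->count += 1;
}

static int counter_value(const struct Counter* self)
{
    return static_cast<const CounterImpl*>(self->opaque_private_data)->count;
}

// Client side: calls go through the function pointers, never through a
// compiler-generated vtable:
//   counter->increment(counter);
//   int n = counter->value(counter);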
2. Stub libs for dylib symbol lookup are linker-specific (as are all static libs in general).
This might not seem like a big deal until combined with #1. When you toss out virtual functions for the purpose of exporting interfaces, the next big temptation is often to export whole classes or select methods through a dylib.
Unfortunately, doing this with manual symbol lookup can become very unwieldy very quickly, so the temptation is often to do it automatically by simply linking to the appropriate stub.
Yet this too can become unwieldy when your goal is to support as many compilers/linkers as possible. In such a case, you may have to possess many compilers and build and distribute different stubs for each possibility.
So this can kind of push you into a corner where it's no longer very practical to export class definitions anymore. At this point you might simply export free-standing functions with C linkage (to avoid C++ name mangling, which is another potential source of headaches).
One of the things that should be obvious already is that we're getting nudged more and more towards favoring a C or C-like API if our goal is universal binary compatibility without opening up too many cans of worms.
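A minimal sketch of such a free-standing C-linkage export (the macro spelling is a common convention, not from any particular SDK):

#if defined(_WIN32)
  #define SDK_API extern "C" __declspec(dllexport)
#else
  #define SDK_API extern "C" __attribute__((visibility("default")))
#endif

SDK_API int sdk_do_work(int input)
{
    // No name mangling, no classes, no exceptions across the boundary.
    return input * 2;
}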
3. Different modules have 'different heaps'.
If you allocate memory in one module and try to deallocate it in another, then you're trying to free memory from a mismatching heap and will invoke undefined behavior.
Even in plain old C, it's easy to forget this rule and malloc in one exported function only to return a pointer to it with the expectation that the client accessing the memory from a different module will free it when done. This once again invokes undefined behavior, and we have to export a second function to indirectly free the memory from the same module that allocated it.
This can become a much bigger gotcha in C++ where we often have class templates that have internal linkage that implicitly do memory management. For example, even if we roll our own std::vector-like sequence like List<T>, we can run into a scenario where a client creates a list, passes it to our API by reference where we use functions that can allocate/deallocate memory (like push_back or insert) and butt heads with this mismatching heap/free store issue. So even this hand-rolled container should ensure that it allocates and deallocates memory from the same central location if it's going to be passed around across modules, and placement new will become your friend when implementing such containers.
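The usual remedy is to export matched allocate/free pairs so memory always returns to the heap that produced it. A sketch, with hypothetical names:

#include <cstddef>

extern "C" char* sdk_alloc_buffer(std::size_t size)
{
    return new char[size];   // allocated on this module's heap
}

extern "C" void sdk_free_buffer(char* p)
{
    delete[] p;              // freed on the same heap it came from
}

// Clients must call sdk_free_buffer, never free() or delete[] directly.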
4. Passing/returning C++ standard objects is not ABI-compatible.
This includes C++ standard containers, as you have already guessed. There's no really practical way to ensure that one compiler will use a compatible representation of something like std::vector when including <vector> as another compiler does. So passing/returning such standard objects, whose representation is outside of your control, is generally out of the question if you're targeting wide binary compatibility.
These don't even necessarily have compatible representations within two projects built by the same compiler, as their representations can vary in incompatible ways based on build settings.
This might make you think that you should now roll all kinds of containers by hand, but I would suggest a KISS approach here. If you're returning a variable number of elements as a result from a function, then we don't need a wide range of container types. We only need one dynamic array kind of container, and it doesn't even have to be a growable sequence, just something with proper copy, move, and destruction semantics.
It might seem nicer and could save some cycles if you just returned a set or a map from a function that computes one, but I'd suggest forgetting about returning these more sophisticated structures and converting to/from this basic dynamic-array representation. Transferring to/from contiguous representations is rarely the bottleneck you might think it would be, and if you do run into a hotspot as a result - found in a legit profiling session of a real-world use case - you can always add more to your SDK in a very discrete and selective fashion.
You can also always wrap those more sophisticated containers like map into a C-like function pointer interface that treats the handle to the map as opaque, hidden away from clients. For heftier data structures like a binary search tree, paying the cost of one level of indirection is generally very negligible (for simpler structures like a random-access contiguous sequence, it generally isn't quite as negligible, especially if your read operations like operator[] involve indirect calls).
Another thing worth noting is that everything I've discussed so far relates to the exported, dynamically-linked side of your SDK. The static convenience library that is internally linked is free to receive and return standard objects to make things convenient for the third party using your library, provided that you're not actually passing/returning them in your exported interfaces. You can even avoid rolling your own containers outright and just take a C-style mindset in your exported interfaces, e.g. returning a raw T* that needs to be freed, while your convenience library does that automatically and transfers the contents to a std::vector<T>.
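For example, the split might look like this sketch (names hypothetical): a raw C-style export on the dynamically-linked side, and an internally linked wrapper that builds the client's own std::vector:

#include <cstddef>
#include <stdexcept>
#include <vector>

// Exported side (dynamically linked): raw pointer + count, with a
// matching free function as in #3.
extern "C" int sdk_compute_values(double** out_data, std::size_t* out_count)
{
    *out_count = 3;
    *out_data = new double[3] { 1.0, 2.0, 3.0 };
    return 0;
}

extern "C" void sdk_free_values(double* data)
{
    delete[] data;   // freed by the module that allocated it
}

// Convenience side (statically linked into the client): builds the
// client's own std::vector from the raw result.
inline std::vector<double> compute_values()
{
    double* data = nullptr;
    std::size_t count = 0;
    if (sdk_compute_values(&data, &count) != 0)
        throw std::runtime_error("sdk_compute_values failed");
    std::vector<double> result(data, data + count);
    sdk_free_values(data);
    return result;
}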
5. Throwing exceptions across module boundaries is undefined.
We should generally not be throwing exceptions from one module to be caught in another when we cannot ensure compatible build settings in the two modules, let alone the same compiler. So throwing exceptions from your API to indicate input errors is generally out of the question in this case.
Instead we should catch all possible exceptions at the entry points to our module to avoid leaking them into the outside world, and translate all such exceptions into error codes.
The statically-linked convenience library can still call one of your exported functions, check the error code, and in the case of failure, throw an exception. This is perfectly fine here since that convenience library is internally linked to the module of the third party using this library, so it's effectively throwing the exception from the third party module to be caught by the same third party module.
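A sketch of that translation on both sides of the boundary (function names and error codes are hypothetical):

#include <new>
#include <stdexcept>

// Module entry point: translate all exceptions to error codes so none
// leak across the boundary.
extern "C" int sdk_parse(const char* text)
{
    try {
        if (!text) throw std::invalid_argument("null input");
        // ... real parsing that may throw ...
        return 0;                    // success
    } catch (const std::bad_alloc&) {
        return 1;                    // out of memory
    } catch (...) {
        return 2;                    // any other failure
    }
}

// Statically-linked convenience wrapper: free to throw, because the
// exception never crosses a module boundary.
inline void parse(const char* text)
{
    if (sdk_parse(text) != 0)
        throw std::runtime_error("parse failed");
}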
Conclusion
While this is, by no means, an exhaustive list, these are some caveats that can, when unheeded, cause some of the biggest issues at the broadest level of your API design. These kinds of design-level issues can be exponentially more expensive to fix in hindsight than implementation-type issues, so they should generally have the highest priority.
If you're new to these subjects, you can't go too far wrong favoring a C or very C-like API. You can still use a lot of C++ implementing it and can also build a C++ convenience library back on top (your clients don't even have to use anything but the C++ interfaces provided by that internally-linked convenience library).
With C, you're typically looking at more work at the baseline level, but potentially far fewer of those disastrous design-level gotchas. With C++, you're looking at less work at the baseline level, but far more potentially disastrous surprise scenarios. If you favor the latter route, you generally want your team's expertise with ABI issues to be higher, with a coding standards document dedicating large sections to these potential ABI gotchas.
For your specific questions:
Problem 1: The interface functions take some custom data types (basically classes or structures) as in/out parameters. I want to initialise members of these classes with default values using constructors. If I do this, it is not possible to load my library dynamically, and it becomes compiler dependent. How do I solve this?
This is where that statically-linked convenience library can come in handy. You can statically link all that convenient code like a class with constructors and still pass in its data in a more raw, primitive kind of form to the exported interfaces. Another option is to selectively inline or statically link the constructor so that its code is not exported as with the rest of the class, but you probably don't want to be exporting classes as indicated above if your goal is max binary compatibility and don't want too many gotchas.
Problem 2: Some interfaces return lists (or maps) of elements to the client. I am using std containers for this purpose, but this is also compiler dependent (and sometimes compiler-version dependent).
Here we have to forgo those standard container goodies at least at the exported API level. You can still utilize them at the convenience library level which has internal linkage.

Loading COM in a process which is compiled using different visual studio compiler version

We have an executable compiled with Visual Studio 2008. Due to a third-party dependency we must compile this executable with Visual Studio 2008.
We also have another component which is compiled with Visual Studio 2010. Now we need a COM component DLL from that component (compiled with the 2010 compiler) to be accessed by the executable (compiled with the 2008 compiler).
My question here is: would this work fine? Would there be conflicts between the runtime used by the executable (the 2008 runtime library) and the runtime used by the COM component (the 2010 runtime)?
We actually tried to load this COM DLL in the executable, and it worked fine. But I am concerned that later on, due to the multiple runtimes, it may crash or fail.
Please let me know how the multiple runtimes would be handled here. Is it safe to load a different runtime in a single executable? Would there be any conflicts later in execution due to the different runtimes being present?
One solution we are considering is to make the COM component an out-of-proc server, which will work in any case. But that will involve a lot of work.
Please let me know.
Many thanks
You should have no problem mixing COM objects that are linked with different runtime libraries, since the memory allocation and deallocation of each object will be done behind the DLL boundary.
You need to be careful that all your methods have proper COM signatures, i.e. all pointers should be COM pointers.
COM is designed for binary interop. By design the framework is implementation agnostic. The intent is that COM servers can be implemented in one language/runtime, and consumed by a COM client implemented with a different language/runtime.
There are absolutely no constraints over the languages and runtimes that are used by different parties.
This has been answered a few times in several contexts.
As long as you don't handle and/or pass around C runtime (CRT) data structures between modules, you're fine. If you do any of the following between modules that depend on different CRTs, you'll have trouble, and in this specific case, you're not implementing COM objects properly:
malloc memory in one module and realloc or free in another
fopen a FILE* in one module and fread, fwrite, fclose, etc. in another
setjmp in one module and longjmp in another
Note that there are things you can do:
Use memory malloced by another module, keeping the responsibility of reallocating and freeing on the originating module
Use some interface that interacts with files fopened by another module, keeping the responsibility of its use on the originating module
Don't use setjmp/longjmp across unrelated or loosely coupled modules; define callbacks, aborting error codes, whatever, but don't rely on unwinding techniques, even if provided by the OS
You can see a pattern here. You can use resources from another module for as long as you delegate managing those resources to that module.
With COM, you shouldn't ever have this kind of trouble; everything should be encapsulated in objects through their implemented interfaces. Although you can pass malloc'd memory as top-level pointer arguments, you're only supposed to access that memory in the callee, never reallocate or free it. For inner pointers, you must use CoTaskMemAlloc and its cousins, as this is the common memory manager in COM. The same applies to handling files (e.g. encapsulate them in an IStream, an IPipeByte, an IEnumByte or something similar), and don't unwind across COM calls.
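For illustration, the CoTaskMemAlloc convention for an [out] string parameter looks roughly like this (GetName is a hypothetical method):

#include <windows.h>
#include <objbase.h>
#include <cstring>

// Callee: allocates with the COM task allocator, never with its own CRT.
HRESULT GetName(LPWSTR* out_name)
{
    const wchar_t src[] = L"example";
    *out_name = static_cast<LPWSTR>(CoTaskMemAlloc(sizeof src));
    if (!*out_name)
        return E_OUTOFMEMORY;
    std::memcpy(*out_name, src, sizeof src);
    return S_OK;
}

// Caller (possibly built with a different compiler/runtime):
//   LPWSTR name = nullptr;
//   if (SUCCEEDED(GetName(&name))) { /* use name */ CoTaskMemFree(name); }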

multiple dlls with multiple classes and global variables in dll

Good time of day, everyone!
I have some questions about DLL programming in C++; it's rather new to me.
1) If I want to create a DLL with multiple classes, but I still want an abstract interface for each class, should I create one header file for all the interfaces, or separate headers for each abstract class? And what should I do with the .cpp implementations of the factory functions?
2) If I create an object through a factory function and get a pointer to the instance, can I just call "delete" in the program when I want to free that memory? I think the object is placed in the DLL's pages, and there may be some problems. What should I do to properly free memory in this case?
3) I read that if more than one process loads a DLL, the DLL gets separate individual instances of its global variables for each process. Is that right? If so, I have two more questions:
3.1) What happens with static members in a DLL? What if I want to create a singleton manager - can I place it in a DLL?
3.2) Say I have Core.dll, Graphics.dll, Sound.dll and Physics.dll. Core.dll has a global variable (or a singleton manager, in my real case). Will the other DLLs work with one instance of the singleton, or with different ones? (Each DLL uses Core.dll.)
I apologize for my weak English and many questions in one topic :)
Thank you for your attention and answers.
1: Mostly this is up to you and depends on the scale of the project. On something small it matters little, so keep it simple and have a single header. On larger projects it is best to reduce unnecessary interdependencies as much as possible - so put them in separate files. You can always create an "all.h" which just includes the others.
2: Yes, if the DLL and the EXE are both linked to the multithreaded DLL CRT. Unless you know what you are doing, always use this; it is the safest option and will do what you expect: the EXE and DLL(s) can share the heap as if they were a single executable. You can "new Object()" in the DLL and "delete obj" in the EXE freely.
NOTE: Mixing different versions of your EXE and your DLL can introduce incredibly subtle bugs (if, say, a class/struct definition changes), so don't do that.
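If you cannot guarantee that both sides link against the same shared CRT, a common defensive pattern is to pair the factory with a destroy function exported from the same DLL, so allocation and deallocation never cross the boundary. A sketch with hypothetical names (assuming, as in this answer, that EXE and DLL are built with the same toolchain):

class IWidget {
public:
    virtual void Render() = 0;
    virtual ~IWidget() {}
};

// DLL side:
class WidgetImpl : public IWidget {
public:
    void Render() override {}
};

extern "C" IWidget* CreateWidget()        { return new WidgetImpl(); } // new in DLL
extern "C" void DestroyWidget(IWidget* w) { delete w; }                // delete in DLL

// EXE side:
//   IWidget* w = CreateWidget();
//   w->Render();
//   DestroyWidget(w);   // instead of 'delete w'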
3: Every process has its own independent memory space (unless you specifically do certain things to set up shared memory). Processes are not allowed to access the memory of other processes.
3.1: I strongly recommend you avoid global state. (Global static-const is OK.) Global variables lead to many unexpected and difficult problems, and globals in Windows DLLs have a number of additional complexities. It is far better in the long run for you to have explicit "Initialize/Deinitialize" functions in the DLL that the EXE must call.
But global statics in a DLL are not much different than in an executable... they get initialized in pretty much the same way when the DLL is loaded. (Things get more complicated when you dynamically load DLLs, but let's ignore that here.)
3.2: Yes, they would all work with the single instance - but don't do it anyway; you will eventually regret it. It is much better to make the initialization explicit, because you cannot control the order in which global variables are constructed, and this can quickly lead to very difficult initialization problems.
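A sketch of that explicit-initialization approach (names hypothetical; Core.dll exports these, and the other DLLs only use the accessor):

class Manager {
public:
    void DoWork() {}
};

static Manager* g_manager = nullptr;   // lives inside Core.dll only

extern "C" void CoreInitialize()
{
    if (!g_manager)
        g_manager = new Manager();
}

extern "C" void CoreShutdown()
{
    delete g_manager;
    g_manager = nullptr;
}

extern "C" Manager* CoreGetManager()
{
    return g_manager;   // the same instance for every DLL in the process
}

// The EXE calls CoreInitialize() before loading Graphics/Sound/Physics
// and CoreShutdown() at the end, so the ordering is explicit.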

How to implement monkey patch in C++?

Is it possible to implement monkey patching in C++?
Or is there any similar approach to it?
Thanks.
Not portably, no - and given the dangers for larger projects, you'd better have a good reason.
The preprocessor is probably the best candidate, due to its ignorance of the language itself. It can be used to rename attributes, methods and other symbol names - but the replacement is global, at least within a single #include or sequence of code.
I've used that before to beat "library diamonds" into submission - libraries A and B both importing an OS library S, but in different ways, so that some symbols of S would be identically named but different. (Namespaces were out of the question, as they'd have had much more far-reaching consequences.)
Similarly, you can replace symbol names with compatible-but-superior classes.
E.g. in VC, #import generates an import library that uses _bstr_t as a type adapter. In one project I successfully replaced these _bstr_t uses with a compatible-enough class that interoperated better with other code, just by #define'ing _bstr_t as my replacement class for the #import.
Patching the virtual method table - either replacing the entire VMT or individual methods - is something else I've come across. It requires a good understanding of how your compiler implements VMTs. I wouldn't do that in a real-life project, because it depends on compiler internals and you get no warning when things have changed. It's a fun exercise for learning about the implementation details of C++, though. One application would be switching at runtime from an initializer/loader stub to a full - or even data-dependent - implementation.
Generating code on the fly is common in certain scenarios, such as forwarding/filtering COM Interface calls or mapping OS Window Handles to library objects. I'm not sure if this is still "monkey-patching", as it isn't really toying with the language itself.
To add to other answers, consider that any function exposed through a shared object or DLL (depending on platform) can be overridden at run time. Linux provides the LD_PRELOAD environment variable, which can specify a shared object to load before all others and which can be used to override arbitrary function definitions. It's actually about the best way to provide a "mock object" for unit-testing purposes, since it is not really invasive. However, unlike other forms of monkey-patching, be aware that a change like this is global: you can't make one particular call behave differently without impacting all the other calls.
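As a minimal illustration (assuming Linux and a dynamically linked target), a preloaded stub that overrides rand() for a test run:

// mock_rand.cpp - every dynamically resolved call to rand() in the
// target program now hits this stub instead of libc's version.
extern "C" int rand(void)
{
    return 4;   // deterministic value for the test run
}

// Build and run:
//   g++ -shared -fPIC -o mock_rand.so mock_rand.cpp
//   LD_PRELOAD=./mock_rand.so ./program_under_test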
Considering the "guerilla third-party library use" aspect of monkey-patching, C++ offers a number of facilities:
const_cast lets you work around zealous const declarations.
#define private public prior to header inclusion lets you access private members.
subclassing and using Parent::protected_field lets you access protected members.
you can redefine a number of things at link time.
If the third-party content you're working around is provided already compiled, though, most of the things feasible in dynamic languages aren't as easy, and often aren't possible at all.
I suppose it depends on what you want to do. If you've already linked your program, you're going to have a hard time replacing anything (short of actually changing the instructions in memory, which might be a stretch as well). However, before this happens, there are options. If you have a dynamically linked program, you can alter the way the linker operates (e.g. the LD_LIBRARY_PATH environment variable) and have it link something other than the intended library.
Have a look at valgrind, for example, which replaces (among a lot of other magic stuff it deals with) the standard memory allocation mechanisms.
As monkey patching refers to dynamically changing code, I can't imagine how this could be implemented in C++...