How to implement monkey patch in C++? - c++

Is it possible to implement monkey patching in C++?
Or any other similar approach to that?
Thanks.

Not portably so, and due to the dangers for larger projects you better have good reason.
The Preprocessor is probably the best candidate, due to it's ignorance of the language itself. It can be used to rename attributes, methods and other symbol names - but the replacement is global at least for a single #include or sequence of code.
I've used that before to beat "library diamonds" into submission - Library A and B both importing an OS library S, but in different ways so that some symbols of S would be identically named but different. (namespaces were out of the question, for they'd have much more far-reaching consequences).
Similary, you can replace symbol names with compatible-but-superior classes.
e.g. in VC, #import generates an import library that uses _bstr_t as type adapter. In one project I've successfully replaced these _bstr_t uses with a compatible-enough class that interoperated better with other code, just be #define'ing _bstr_t as my replacement class for the #import.
Patching the Virtual Method Table - either replacing the entire VMT or individual methods - is somethign else I've come across. It requires good understanding of how your compiler implements VMTs. I wouldn't do that in a real life project, because it depends on compiler internals, and you don't get any warning when thigns have changed. It's a fun exercise to learn about the implementation details of C++, though. One application would be switching at runtime from an initializer/loader stub to a full - or even data-dependent - implementation.
Generating code on the fly is common in certain scenarios, such as forwarding/filtering COM Interface calls or mapping OS Window Handles to library objects. I'm not sure if this is still "monkey-patching", as it isn't really toying with the language itself.

To add to other answers, consider that any function exposed through a shared object or DLL (depending on platform) can be overridden at run-time. Linux provides the LD_PRELOAD environment variable, which can specify a shared object to load after all others, which can be used to override arbitrary function definitions. It's actually about the best way to provide a "mock object" for unit-testing purposes, since it is not really invasive. However, unlike other forms of monkey-patching, be aware that a change like this is global. You can't specify one particular call to be different, without impacting other calls.

Considering the "guerilla third-party library use" aspect of monkey-patching, C++ offers a number of facilities:
const_cast lets you work around zealous const declarations.
#define private public prior to header inclusion lets you access private members.
subclassing and use Parent::protected_field lets you access protected members.
you can redefine a number of things at link time.
If the third party content you're working around is provided already compiled, though, most of the things feasible in dynamic languages isn't as easy, and often isn't possible at all.

I suppose it depends what you want to do. If you've already linked your program, you're gonna have a hard time replacing anything (short of actually changing the instructions in memory, which might be a stretch as well). However, before this happens, there are options. If you have a dynamically linked program, you can alter the way the linker operates (e.g. LD_LIBRARY_PATH environment variable) and have it link something else than the intended library.
Have a look at valgrind for example, which replaces (among alot of other magic stuff it's dealing with) the standard memory allocation mechanisms.

As monkey patching refers to dynamically changing code, I can't imagine how this could be implemented in C++...

Related

How to include classes of a DLL?

I'm new to using DLL's in C++ and I am wondering how to sucessfully load and use classes contained in a DLL without getting a "corrupted stack", a "null pointer" or any application crash ^^. This is a bit confusing to me for now.
Even if it might be safer I don't want to use interface class because it seems too complicated to use. I know that I should have the same include in both DLL and application to prevent crashes. That means that my include file should mention members name.
Here is the problem :
I want to distribute my software (DLL + include file) and I don't want
my customers to see the architecture of my project. But the only class
I want to export from my DLL has as member, objects from other
classes. So my class is dependent of other classes and so on in
cascade.
Is it possible to provide only one include file of one class with just useful methods without having risks of crashes ? If no, what solutions might fit my needs ?
Thanks
FIrst of all, if you use the same include in DLL and in the application using it, if both were compiled with the same compiler and compatible settings, and both use the same runtime library (DLL one, not statically linked), you should have none of the problem you fear.
If you want to hide details of your DLL and provide an alterered, simplified header to the DLL users, you work on your own risk. For example, if there are data members missing (even private ones) of if there are virtual functions missing (even private ones), then it might probably crash, or worse, corrupt something without being noticed. So I'd strongly advise you not to consider this approach unless you're very very experienced with the assembler generated by your compiler.
If you want to hide implementation details of your class, the best approach would probably to design your library around the PIMPL idiom.

Designing Interfaces in c++

I am developing an interface that can be used as dynamic loading.Also it should be compiler independent.So i wanted to export Interfaces.
I am facing now the following problems..
Problem 1: The interface functions are taking some custom data types (basically classes or structures) as In\Out parameters.I want to initialise members of these classes with default values using constructors.If i do this it is not possible to load my library dynamically and it becomes compiler dependent. How to solve this.
Problem 2: Some interfaces returns lists(or maps) of element to client.I am using std containers for this purpose.But this also once again compiler dependent(and compiler version also some times).
Thanks.
Code compiled differently can only work together if it adopts the same Application Binary Interface (ABI) for the set of types used for parameters and return value. ABI's are significant at a much deeper level - name mangling, virtual dispatch tables etc., but my point's that if there's one your compilers support allowing calling of functions with simple types, you can at least think about hacking together some support for more complex types like compiler-specific implementations of Standard containers and user-defined types.
You'll have to research what ABI support your compilers provide, and infer what you can about what they'll continue to provide.
If you want to support other types beyond what the relevant ABI standardises, options include:
use simpler types to expose internals of more complex types
pass [const] char* and size_t extracted by my_std_string.data() or &my_std_string[0] and my_std_string.size(), similarly for std::vector
serialise the data and deserialise it using the data structures of the receiver (can be slow)
provide a set of function pointers to simple accessor/mutator functions implemented by the object that created the data type
e.g. the way the classic C qsort function accepts a pointer to an element comparison function
As I usually have a multithreading focus, I'm mostly going to bark about your second problem.
You already realized that passing elements of a container over an API seems to be compiler dependent. It's actually worse: it's header file & C++-library dependent, so at least for Linux you're already stuck with two different sets: libstc++ (originating from gcc) and libcxx (originating from clang).
Because part of the containers is header files and part is library code, getting things ABI-independent is close to impossible.
My bigger worry is that you actually thought of passing container elements around. This is a huge threadsafety issue: the STL containers are not threadsafe - by design.
By passing references over the interface, you are passing "pointers to encapsulated knowledge" around - the users of your API could make assumptions of your internal structures and start modifying the data pointed to. That is usually already really bad in a singlethreaded environment, but gets worse in a multithreaded environment.
Secondly, pointers you provided could get stale, not good either.
Make sure to return copies of your inner knowledge to prevent user modification of your structures.
Passing things const is not enough: const can be cast away and you still expose your innards.
So my suggestion: hide the data types, only pass simple types and/or structs that you fully control (i.e. are not dependent on STL or boost).
Designing an API with the widest ABI compatibility is an extremely complex subject, even more so when C++ is involved instead of C.
Yet there are more theoretical-type issues that aren't really quite as bad as they sound in practice. For example, in theory, calling conventions and structure padding/alignment sound like they could be major headaches. In practice they aren't so much, and you can even resolve such issues in hindsight by specifying additional build instructions to third parties or decorating your SDK functions with macros indicating the appropriate calling convention. By "not so bad" here, I mean that they can trip you up but they won't have you going back to the drawing board and redesigning your entire SDK in response.
The "practical" issues I want to focus on are issues that can have you revisiting the drawing board and redoing the entire SDK. My list is also not exhaustive, but are some of the ones I think you should really keep in mind first.
You can also treat your SDK as consisting of two parts: a dynamically-linked part that actually exports functionality whose implementation is hidden from clients, and a statically (internally) linked convenience library part that adds C++ wrappers on top. If you treat your SDK as having these two distinct parts, you're allowed a lot more liberty in the statically-linked library to use a lot more C++ mechanisms.
So, let's get started with those practical headache inducers:
1. The binary layout of a vtable is not necessarily consistent across compilers.
This is, in my opinion, one of the biggest gotchas. We're usually looking at 2 main ways to access functionality from one module to another at runtime: function pointers (including those provided by dylib symbol lookup) and interfaces containing virtual functions. The latter can be so much more convenient in C++ (both for implementor and client using the interface), yet unfortunately using virtual functions in an API that aims to be binary compatible with the widest range of compilers is like playing minesweeper through a land of gotchas.
I would recommend avoiding virtual functions outright for this purpose unless your team consists of minesweeper experts who know all of these gotchas. It's useful to try to fall in love with C again for those public interface parts and start building a fondness for these kinds of interfaces consisting of function pointers:
struct Interface
{
void* opaque_private_data;
void (*func1)(struct Interface* self, ...);
void (*func2)(struct Interface* self, ...);
void (*func3)(struct Interface* self, ...);
};
These present far fewer gotchas and are nowhere near as fragile against changes (ex: you're perfectly allowed to do things like add more function pointers to the bottom of the structure without affecting ABI).
2. Stub libs for dylib symbol lookup are linker-specific (as are all static libs in general).
This might not seem like a big deal until combined with #1. When you toss out virtual functions for the purpose of exporting interfaces, then the next big temptation is to often export whole classes or select methods through a dylib.
Unfortunately doing this with manual symbol lookup can become very unwieldy very quickly, so the temptation is to often do this automatically by simply linking to the appropriate stub.
Yet this too can become unwieldy when your goal is to support as many compilers/linkers as possible. In such a case, you may have to possess many compilers and build and distribute different stubs for each possibility.
So this can kind of push you into a corner where it's no longer very practical export class definitions anymore. At this point you might simply export free-standing functions with C linkage (to avoid C++ name mangling which is another potential source of headaches).
One of the things that should be obvious already is that we're getting nudged more and more towards favoring a C or C-like API if our goal is universal binary compatibility without opening up too many cans of worms.
3. Different modules have 'different heaps'.
If you allocate memory in one module and try to deallocate it in another, then you're trying to free memory from a mismatching heap and will invoke undefined behavior.
Even in plain old C, it's easy to forget this rule and malloc in one exported function only to return a pointer to it with the expectation that the client accessing the memory from a different module will free it when done. This once again invokes undefined behavior, and we have to export a second function to indirectly free the memory from the same module that allocated it.
This can become a much bigger gotcha in C++ where we often have class templates that have internal linkage that implicitly do memory management. For example, even if we roll our own std::vector-like sequence like List<T>, we can run into a scenario where a client creates a list, passes it to our API by reference where we use functions that can allocate/deallocate memory (like push_back or insert) and butt heads with this mismatching heap/free store issue. So even this hand-rolled container should ensure that it allocates and deallocates memory from the same central location if it's going to be passed around across modules, and placement new will become your friend when implementing such containers.
4. Passing/returning C++ standard objects is not ABI-compatible.
This includes C++ standard containers as you have already guessed. There's no really practical way to ensure that one compiler will use a compatible representation of something like std::vector when including <vector> as another. So passing/returning such standard objects whose representation is outside of your control is generally out of the question if you're targeting wide binary compatibility.
These don't even necessarily have compatible representations within two projects built by the same compiler, as their representations can vary in incompatible ways based on build settings.
This might make you think that you should now roll all kinds of containers by hand, but I would suggest a KISS approach here. If you're returning a variable number of elements as a result from a function, then we don't need a wide range of container types. We only need one dynamic array kind of container, and it doesn't even have to be a growable sequence, just something with proper copy, move, and destruction semantics.
It might seem nicer and could save some cycles if you just returned a set or a map in a function that computes one, but I'd suggest forgetting about returning these more sophisticated structures and convert to/from this basic dynamic array kind of representation. It's rarely the bottleneck you might think it would be to transfer to/from contiguous representations, and if you actually do run into a hotspot as a result of this which you actually gained from a legit profiling session of a real world use case, then you can always add more to your SDK in a very discrete and selective fashion.
You can also always wrap those more sophisticated containers like map into a C-like function pointer interface that treats the handle to the map as opaque, hidden away from clients. For heftier data structures like a binary search tree, paying the cost of one level of indirection is generally very negligible (for simpler structures like a random-access contiguous sequence, it generally isn't quite as negligible, especially if your read operations like operator[] involve indirect calls).
Another thing worth noting is that everything I've discussed so far relates to the exported, dynamically-linked side of your SDK. The static convenience library that is internally linked is free to receive and return standard objects to make things convenient for the third party using your library, provided that you're not actually passing/returning them in your exported interfaces. You can even avoid rolling your own containers outright and just take a C-style mindset to your exported interfaces, returning raw pointers to T* that needs to be freed while your convenience library does that automatically and transfers the contents to std::vector<T>, e.g.
5. Throwing exceptions across module boundaries is undefined.
We should generally not be throwing exceptions from one module to be caught in another when we cannot ensure compatible build settings in the two modules, let alone the same compiler. So throwing exceptions from your API to indicate input errors is generally out of the question in this case.
Instead we should catch all possible exceptions at the entry points to our module to avoid leaking them into the outside world, and translate all such exceptions into error codes.
The statically-linked convenience library can still call one of your exported functions, check the error code, and in the case of failure, throw an exception. This is perfectly fine here since that convenience library is internally linked to the module of the third party using this library, so it's effectively throwing the exception from the third party module to be caught by the same third party module.
Conclusion
While this is, by no means, an exhaustive list, these are some caveats that can, when unheeded, cause some of the biggest issues at the broadest level of your API design. These kinds of design-level issues can be exponentially more expensive to fix in hindsight than implementation-type issues, so they should generally have the highest priority.
If you're new to these subjects, you can't go too far wrong favoring a C or very C-like API. You can still use a lot of C++ implementing it and can also build a C++ convenience library back on top (your clients don't even have to use anything but the C++ interfaces provided by that internally-linked convenience library).
With C, you're typically looking at more work at the baseline level, but potentially far fewer of those disastrous design-level gotchas. With C++, you're looking at less work at the baseline level, but far more potentially disastrous surprise scenarios. If you favor the latter route, you generally want to ensure that your team's expertise with ABI issues is higher with a larger coding standards document dedicating large sections to these potential ABI gotchas.
For your specific questions:
Problem 1: The interface functions are taking some custom data types
(basically classes or structures) as In\Out parameters.I want to
initialise members of these classes with default values using
constructors.If i do this it is not possible to load my library
dynamically and it becomes compiler dependent. How to solve this.
This is where that statically-linked convenience library can come in handy. You can statically link all that convenient code like a class with constructors and still pass in its data in a more raw, primitive kind of form to the exported interfaces. Another option is to selectively inline or statically link the constructor so that its code is not exported as with the rest of the class, but you probably don't want to be exporting classes as indicated above if your goal is max binary compatibility and don't want too many gotchas.
Problem 2: Some interfaces returns lists(or maps) of element to
client.I am using std containers for this purpose.But this also once
again compiler dependent(and compiler version also some times).
Here we have to forgo those standard container goodies at least at the exported API level. You can still utilize them at the convenience library level which has internal linkage.

Change the logic of application at run-time

I was wondering if it is possible to change the logic of an application at runtime? Meybe we could replace the implementation of an abstract class with another implementation? Or maybe we could replace a shared library at runtime...
update: Suppose that I've got two implementations of function foo(x, y) and can use any of them based on strategy pattern. Now I want to know if it's possible to add a third implementation of foo(x, y) without restarting the application.
You can use a plugin (a library that you will load at runtime) that expose a new foo function.
I remember we implemented something similar at school, a calculator in which we could add new operations at runtime, without having to restart the program. See dlsym and dlopen.
Addenda
Be very careful when dlclose-ing a plugin that it is not still used in some active call stack frame. On Linux you can call many thousands of times dlopen (so you could accept not dlclose-ing plugins, with some address space leak).
Exactly, as you said "replace the implementation of an abstract class with another implementation" if by it you mean, you can use runtime polymorphism and change the instances of concrete classes with instances of another set of concrete classes.
More specifically, there is a well-known pattern called Strategy pattern exactly for this purpose. Have a look at the wiki page, as it explains this very nicely, even with a code example along with diagram.
C++ mechanism of virtual functions does not allow you to change the implementation at run-time.
However, you can implement whatever implementation change at runtime with function pointers.
Here is an article on self-modifying code that I read recently: http://mainisusuallyafunction.blogspot.com/2011/11/self-modifying-code-for-debug-tracing.html

Getting Loki Singleton to work in DLLs in VS 2008 C++

I'm pretty sure this problem isn't new, and pretty sure it's hard to solve. Hopefully I'm wrong about the latter.
I'm trying to use the Loki::Singleton from Modern C++ Design in a program of mine.
However, I can't seem to get it to work across DLLs. I think I know why this is happening: the templated code gets instantiated in every source module, so instead of there being one global variable, each module has its own.
Obviously, this makes the Singleton very much non-single.
Is there any way to get around this behavior?
I see in the Loki source directory that they have a specific SingletonDLL directory under test, looks like they use an exported, explicitly instantiated template (which would work). Hopefully that contains the code you want.
Note this is not going to address the question. An explicitly instantiated and exported singleton should do the trick...
-Rick
Check out #pragma data_seg here basically, you need to declare an instance of the singleton in a shared section of your code. By default statics are scoped to the dll.
It may get tricky with templates, but this is the path to success here that doesn't involve passing / copying static data around.
You are probably correct that each DLL has its own instance of the singleton. I'm not that familiar with the Loki implementation and the source code isn't a lot of fun to figure out.
Possible solutions are:
Not using singletons. This is actually my usual preference because you can avoid whole classes of issues by changing your design to not need them. For a long rant on why Singletons may be harmful see this Yegge post. I'm not THAT vehemently against them but 95% of the time Singletons cause many more problems than they solve (if they actually solve any)
Copying static members across DLL boundaries. I've also done this as a hack, where the DLL gets a pointer from an application or another DLL, and resets its own copy of the static class member to the pointer passed from outside. It's evil, it's dirty, you can't clean up after it, but it does work.

Is it correct to export data members? (C++)

As the title suggests, is it correct or valid to import/export static data from within a C++ class?
I found out my problem - the author of the class I was looking at was trying to export writable static data which isn't supported on this platform.
Many thanks for the responses however.
An exported C++ class means that the DLL clients have to use the same compiler as the DLL because of name mangling and other issues. This is actually a pretty big problem, I once had to write C wrappers to a bunch of C++ classes because the client programs had switched to MSVC9, while the DLL itself was using MSVC71. [There were some other complications with switching the DLL to MSVC90]. Since then I've been quite skeptical about this business of exporting classes, and prefer to write a C wrapper for everything.
Now, if you're willing to pay the price of exporting classes, I'd say that exporting static data doesn't make the problem any worse. Arguably, among the kinds of things you could export, it is safest to export static constants. Even so, I'd rather not do it, because like Timo says, you're now locked into this implementation.
One of the frameworks that I worked on required its clients to provide a set of error code constants. Over time, we found that using a simple bunch of constants was too brittle, and we switched to an OO design. We had a default implementation that would return the common error codes, but each of the error codes was accessed using a virtual function which could be overridden by individual clients - and they used it from some advanced device specific error handling. This solution proved far more scalable than the one based on exporting constants.
I'd suggest that you think long and hard about how you expect the component to evolve before you exporting static variables.
Is it correct inasmuch as it'll work and do what you expect it to? Assuming that you are talking about using _declspec(dllexport/dllimport) on a class or class member, yes, you can do that and it should give you the expected result - the static data would be accessible outside your dll and other C++ code could access it provided that that C++ access specification (public/protected/private) wouldn't block outside access in the first place.
Is it a good idea? Personally I don't think so as you would be exposing class internals not only within your library but to the outside world, which means that it will be pretty much impossible to change what is an implementation detail at the end of the day. Ask yourself if you are 100% certain if the interface of this class and large parts of its implementation will never, ever change...
dllexport (or import) on a class's (non-static) data member does nothing. Exported "things" are either functions or global data (though this is a questionable design choice). dllexport on a class is just a shortcut for saying "export all of these functions".