Writing dynamically loadable components in C++

I'm currently working on a program which should perform calculations on a home-brewed data structure.
I want to build it in a way that makes it easy to add supported calculations (say, as source files that conform to a predetermined structure).
The problem is that I don't want to load all calculations in advance, because there might be a lot of them.
The only mechanism I have found that supports dynamic loading of functionality is dlopen, which expects .so files, so in this context using dlopen means compiling a separate .so file for every group of computations.
While I don't see any inherent problem with this design, my spider senses tell me I should verify with the all-knowing web that it's not utterly stupid. If there are other suggested ways to do this, I'd be glad to hear them.

Using dlopen() is the most widely used way to load executable code dynamically in an application on POSIX-compatible operating systems. It allows using a modular architecture where optional or rarely used code is only loaded on-demand, which sounds pretty much like what you need.
I would certainly use this method - if after some time you find that the shared object compilation step is becoming a hurdle, you can build additional dynamically loaded modules to support e.g. an interpreted language such as Lua or Python. This would allow you to keep your existing codebase without losing extensibility.

Seems like a good approach.
A good way to do this is to declare an abstract (pure virtual) class in C++, say Calculator, with all the methods and accessors you need to perform a calculation. Then have each of your separate dynamic libraries (.so files) implement a global function Calculator *create_calculator() that creates an instance of a class deriving from Calculator. Finally, you'll have to devise a registration mechanism so that your main program can determine the name of the dynamic library to load, based on some kind of identifier like a string, enum, or UUID. This would typically be kept in an easily editable configuration file.
#include <dlfcn.h>

void *handle;
Calculator *(*create_calculator)();

/* open the needed object file */
char *libName = get_lib_name_from_config(identifier);
handle = dlopen(libName, RTLD_LOCAL | RTLD_LAZY);

/* find the address of the create_calculator function */
create_calculator = (Calculator *(*)()) dlsym(handle, "create_calculator");
Calculator *calc = create_calculator();
This scheme can be made more flexible (and complex) by allowing the create_calculator method name to vary, at the cost of having to obtain that from the config file as well.
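To flesh out the plugin side, here is a minimal sketch of what the shared interface header and one loadable module might look like. The names (Calculator, compute, SumCalculator, Data) are illustrative only; the one essential detail is that the factory function is declared extern "C", because dlsym() looks symbols up by their unmangled C name.

/* calculator.h - shared by the main program and every plugin */
struct Data;                       /* your home-brewed data structure */

class Calculator {
public:
    virtual ~Calculator() {}
    virtual double compute(const Data &input) = 0;
};

/* sum_calculator.cpp - built into its own shared object */
#include "calculator.h"

class SumCalculator : public Calculator {
public:
    double compute(const Data &input) override { /* ... */ return 0.0; }
};

/* extern "C" prevents name mangling so dlsym("create_calculator") can find it */
extern "C" Calculator *create_calculator()
{
    return new SumCalculator();
}

Each such file can then be compiled with something like g++ -shared -fPIC sum_calculator.cpp -o libsum_calculator.so, and the configuration file maps the identifier to that library name.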

Opening shared libraries using dlopen() is certainly the first thing that comes to my mind; it's a fine plan.

Related

Writing functions to binary files?

Something that made me fairly curious: since it's possible in C++ to pass a function as an argument under the right circumstances, that suggests that whatever internal code implements that function can be pointed to, and perhaps read from and written into a binary file as executable code.
This is obviously coming from someone who, while having a fairly strong background in C++, isn't familiar with the intricate internals of how memory is managed on the heap, and especially how executable machine code fits into the picture.
I assume that since it's possible to pass around a reference to a function, it's possible to get the data it points to and write it somewhere. I don't know.
Anyone want to tell me if this is possible? If so, can you give an example? If not, please tell me why! I love learning more in-depth about how C++ actually works internally.
Twenty years ago your suggestion could have been fresh and usable. People saved memory by loading code from a file on demand, calling it, then unloading it. That was called overlays. To a certain level it IS still usable, but in a form that is standardized by the platform and managed by the platform's API.
The mechanism behind shared libraries (.so on POSIX systems, .dll on Windows) is that the library file contains labels saying where certain functions are and what their names are, as well as data about how the stack and data segment should be initialized. This can be done automatically by the system when the program is loaded. Otherwise you can load the library manually and obtain a pointer to a function - on Windows with LoadLibrary() and GetProcAddress(), on Linux with dlopen() and dlsym().
The reason it isn't possible today in a high-level language: security, and protection from malicious code in the data segment. The run-time library usually handles that.
It is still possible using assembler, but you will be challenging antivirus and system security measures. With careful programming you might write your own "linker" and be able to create and load your own library format, I suppose.
No, you can't really do this. There are a whole lot of reasons, but here's a simple and intuitive one: functions may call other functions. If you were able to write a function to disk, and restore it, this would not account for its dependencies (functions it calls, global variables it updates, etc.). It won't work.
If you want to read functions from disk, it is better to express them in a scripting language like Lua. This is a proven solution which is used in many commercial products such as video games and Adobe Lightroom.
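To make the scripting suggestion concrete, here is a minimal sketch of embedding the Lua interpreter and calling a function defined as text; the function name compute and the script body are made up for the example.

#include <cstdio>
extern "C" {
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
}

int main()
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);

    /* the "function on disk" is just text that Lua compiles when loaded */
    if (luaL_dostring(L, "function compute(x) return x * x + 1 end") != 0) {
        std::fprintf(stderr, "load error: %s\n", lua_tostring(L, -1));
        return 1;
    }

    lua_getglobal(L, "compute");      /* push the function */
    lua_pushnumber(L, 3.0);           /* push its argument */
    if (lua_pcall(L, 1, 1, 0) != 0) { /* 1 argument, 1 result */
        std::fprintf(stderr, "call error: %s\n", lua_tostring(L, -1));
        return 1;
    }
    std::printf("compute(3) = %g\n", lua_tonumber(L, -1));

    lua_close(L);
    return 0;
}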
While a function pointer is the entry point of a function, and that memory can be read and therefore copied, the first problem is that there is no reliable means of determining the length of that code, so you cannot determine for certain how much to copy to get the entire function and only that.
The other issue is to what practical end? Depending on the platform the code may not be relocatable and will have links to other code. The binary contains no symbolic information; the best you can do is disassemble it, but out of the context of the entire linked executable it may not be very useful to do so.
If your aim is to separate functions from the primary executable, and to be able to later load and run them, then that is what DLLs and shared libraries are for.
If you just want to observe the binary relating to a function, then that is best done in a debugger - it will have a disassembly view mode that will show the raw binary (in hexadecimal), assembly code with symbolic links and the corresponding C source. This makes a lot more sense if your aim is merely to investigate how source code relates to binary machine code.
Below is how you could possibly do what you are asking - even if there is no practical reason for doing it. It makes assumptions about the behaviour of the compiler that may not be valid in some cases. It assumes, for example, that the compiler will place adjacent functions contiguously in memory and in increasing address order, so that function2 is immediately after function1 in memory. Here function2 serves only as an end marker for function1 and may be a dummy.
#include <stdio.h>
#include <stddef.h>

int function1()
{
    /* ... the code to be written out ... */
    return 0;
}

void function2()   /* end marker only - assumed to be placed right after function1 */
{
}

int main()
{
    ptrdiff_t function1_length = (char*)function2 - (char*)function1;

    FILE* fp = fopen("function1.bin", "wb");
    fwrite((void*)function1, function1_length, 1, fp);
    fclose(fp);
    return 0;
}

Precompile script into objects inside C++ application

I need to provide my users the ability to write mathematical computations into the program. I plan to have a simple text interface with a few buttons including those to validate the script grammar, save etc.
Here's where it gets interesting. These functions the user is writing need to execute at multi-megabyte line speeds in a communications application. So I need the speed of a compiled language, but the usage of a script. A fully interpreted language just won't cut it.
My idea is to precompile the saved user modules into objects at initialization of the C++ application. I could then use these objects to execute the code when called upon. Here are the workflows I have in mind:
1) Testing (initial writing) of script: Write code in editor, save, compile into object (testing grammar), run with test I/O, edit code
2) Use of Code (Normal operation of application): Load script from file, compile script into object, Run object code, Run object code, Run object code, etc.
I've looked into several off-the-shelf interpreters, but can't find what I'm looking for. I considered Java, as it is pretty fast, but I would need to load the Java virtual machine, which means passing objects between C and the virtual machine... The interface is the bottleneck here. I really need to create a native C++ object running C++ code if possible. I also need to be able to run the code on multiple processors effectively in a controlled manner.
I'm not looking for the whole explanation on how to pull this off, as I can do my own research. I've been stalled for a couple days here now, however, and I really need a place to start looking.
As a last resort, I will create my own scripting language to fulfill the need, but that seems a waste with all the great interpreters out there. I've also considered taking an existing open-source compiler and slicing it up for the functionality I need... just not saving the compiled results to disk... I don't know. I would prefer to use a mainline language if possible... but that's not required.
Any help would be appreciated. I know this is not your run of the mill idea I have here, but someone has to have done it before.
Thanks!
P.S.
One thought that just occurred to me while writing this was this: what about using a true C compiler to create object code, save it to disk as a dll library, then reload and run it inside "my" code? Can you do that with MS Visual Studio? I need to look at the licensing of the compiler... how to reload the library dynamically while the main application continues to run... hmmmmm I could then just group the "functions" created by the user into library groups. Ok that's enough of this particular brain dump...
A possible solution could be to use gcc (MinGW, since you are on Windows) and build a DLL out of your user-defined code. The DLL should export just one function. You can use the Win32 API to handle the DLL (LoadLibrary/GetProcAddress etc.). At the end of this job you have a C-style function pointer. The problem now is the arguments: if your computation has just one parameter you can cast to double (*funct)(double), but if you have many parameters you need to match them.
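A rough sketch of that flow, assuming the user's code has been compiled into user_calc.dll exporting a single function double user_calc(double); both names are made up, and the DLL would be built with something like gcc -shared -o user_calc.dll user_calc.c:

#include <windows.h>
#include <stdio.h>

typedef double (*calc_fn)(double);

int main()
{
    HMODULE lib = LoadLibraryA("user_calc.dll");
    if (!lib) { fprintf(stderr, "could not load user_calc.dll\n"); return 1; }

    /* C-style function pointer to the single exported entry point */
    calc_fn user_calc = (calc_fn)GetProcAddress(lib, "user_calc");
    if (!user_calc) { fprintf(stderr, "user_calc not exported\n"); FreeLibrary(lib); return 1; }

    printf("user_calc(2.5) = %f\n", user_calc(2.5));

    FreeLibrary(lib);   /* unload so the DLL file can be rebuilt and reloaded later */
    return 0;
}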
I think I've found a way to do this using standard C.
1) Standard C needs to be used because when it is compiled into a dll, the resulting interface is cross-compatible with multiple compilers. I plan to do my primary development with MS Visual Studio and compile objects in my application using gcc (Windows version).
2) I will expose certain variables to the user (inputs and outputs) and standardize them across units. This allows multiple units to be developed with the same interface.
3) The user will only create the inside of the function using standard C syntax and grammar. I will then wrap that function with text to fully define the function and its environment (remember those variables I intend to expose?) - see the sketch after this list. I can also group multiple functions under a single executable unit (dll) using name parameters.
4) When the user wishes to test their function, I dump the dll from memory, compile their code with my wrappers in gcc, and then reload the dll into memory and run it. I would let them define inputs and outputs for testing.
5) Once the test/create step was complete, I have a compiled library created which can be loaded at run time and handled via pointers. The inputs and outputs would be standardized, so I would always know what my I/O was.
6) The only problem with standardized I/O is that some of the inputs and outputs are likely to not be used. I need to see if I can put default values in or something.
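For illustration, here is what the generated wrapper around the user's text might look like under this plan; the standardized variable names (in_A, in_B, in_C, out_X, out_Y, out_Z), the function name, and the build command are all hypothetical:

/* unit_group1.c - generated by the application, then built with e.g.
   gcc -shared -o unit_group1.dll unit_group1.c                        */
#include <math.h>

__declspec(dllexport) void user_func_1(double in_A, double in_B, double in_C,
                                       double *out_X, double *out_Y, double *out_Z)
{
    /* ---- begin text pasted from the user's editor ---- */
    *out_X = in_A + in_B;
    *out_Y = sqrt(in_A * in_C);
    *out_Z = 0.0;          /* unused outputs can simply be given default values */
    /* ---- end user text ---- */
}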
So, to sum up:
Think of an app with a text box and a few buttons. You are told that your inputs are named A, B, and C and that your outputs are X, Y, and Z of specified types. You then write a function using standard C code, and with functions from the specified libraries (I'm thinking math etc.)
So now you're done... you see a few boxes below to define your input. You fill them in and hit the TEST button. This would wrap your code in a function context, dump the existing dll from memory (if it exists) and compile your code along with any other functions in the same group (another parameter you could define, basically just a name to the user). It then runs the function through a function pointer, using the inputs defined in the UI. The outputs are sent to the user so they can determine whether their function works. If there are any compilation errors, those are also shown to the user.
Now it's time to run for real. Of course I kept track of which functions are where, so I dynamically open the dll and load all the functions into memory via function pointers. I start shoving data into one side and the functions give me the answers I need. There would be some overhead to track I/O and to make sure the functions are called in the right order, but the execution would be at compiled machine-code speeds... which is my primary requirement.
Now... I have explained what I think will work in two different ways. Can you think of anything that would keep this from working, or perhaps any advice/gotchas/lessons learned that would help me out? Anything from the type of interface to tips on dynamically loading dll's in this manner to using the gcc compiler this way... etc would be most helpful.
Thanks!

Virtual Files for dynamic linking

my problem is pretty complicated and potentially impossible but here we go:
Using C++,
I'm currently working on a universal server engine for a game project of mine. Universal, because every part of the engine will be loaded dynamically after startup. Game objects will also inherit from a base object and have overridden "Simulate" functions. That way, every object has its own specific behavior and I can do something I call "C++ scripting", which is a lot faster than interpreted Lua script files. It's also more dynamic.
(Please no solutions which would kill the c++ "scripting" part, like "forget the dynamic linking, that's insane". This performance boost is totally necessary, since I'm working with large voxel maps)
My Problem:
Those are indeed a lot of .dll/.so files, and I wanted to pack them into a simple archive so I can use zlib on said source code and maybe pack everything together with textures and sounds into little "object packages".
Now the Windows DLL API and the Linux SO API won't allow me to load a dll/so file from a memory address, which is a shame. (Am I right there, or can I bypass that? :) ) I don't want to unzip and temporarily save those files on the filesystem, because there are hundreds to thousands of them and that would increase the loading time a lot.
Also I'm not interested in more external dependencies like boost.
So here are my Questions:
Is there a cross-platform method to create virtual files IN memory with a real path?
That way I could bypass the slow IO speeds of HDDs.
Or is it really not such a big deal to use temp files, because the file buffers of modern operating systems are fast/intelligent enough NOT to write all those files to disc?
(Actually Linux supports virtual file systems, but Windows does not...)
I hope you guys can help me there :)
Not with the WinAPI, that's for sure, but you can do it manually. You can load the DLL image into memory, fill in its import table and call the exported functions (after you have called DllMain). I saw a program where someone actually created a new process with that method... See the PE documentation for details, but it works.
It's also relatively easy to do, since you only need to find the PE import tables and do what the dynamic linker does: fill them in with jumps and addresses. If the DLL can be loaded at its preferred base address no relocation is needed; otherwise you also have to apply the fixups from the .reloc section.
It should be the same on Linux (only using the ELF structure), but if you have a better solution with virtual file systems, you should use that.
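Not cross-platform, but on Linux there is a fairly clean way to get a "virtual file in memory with a real path": memfd_create() gives you an anonymous in-memory file, and its /proc/self/fd/<n> path can be passed straight to dlopen(). A sketch, assuming the decompressed .so bytes are already sitting in a buffer (requires Linux 3.17+/glibc 2.27+; the function and parameter names are illustrative):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <dlfcn.h>
#include <stdio.h>

void *load_so_from_memory(const void *so_data, size_t so_size)
{
    /* anonymous in-memory file, never touches the disk */
    int fd = memfd_create("plugin", MFD_CLOEXEC);
    if (fd < 0) return NULL;

    if (write(fd, so_data, so_size) != (ssize_t)so_size) { close(fd); return NULL; }

    /* the memfd is visible under /proc/self/fd/<fd>, which dlopen accepts */
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    void *handle = dlopen(path, RTLD_LOCAL | RTLD_NOW);

    close(fd);          /* the loaded mapping keeps the object alive */
    return handle;
}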

Benefits of exporting a class from a dll vs. static library

I have a C++ class I'm writing now that will be used all over a project I'm working on. I have the option to put it in a static library, or export the class from a dll. What are the benefits/penalties of each approach? The only one I can think of is compiled code size, which I don't really care about. Thanks!
Advantages of a DLL:
You can have multiple different exe's that access this functionality, so you will have a smaller project size overall.
You can dynamically update your component without replacing the whole exe. If you do this though be careful that the interface remains the same.
Sometimes like in the case of LGPL you are forced into using a DLL.
You could have some components as C#, Python or other languages that tie into your DLL.
You can build programs that consume your DLL that work with different versions of the DLL. For example, you could check whether a function exists in a certain operating system DLL and only call it if it exists, and otherwise do some other processing (see the sketch after these lists).
Advantages of Static library:
You cannot have DLL versioning problems that way.
Less to distribute, you aren't forced into a full installer if you only have a small application.
You don't have to worry about anyone else tying into your code that would have been accessible if it was a DLL.
Easier to develop a static library as you don't need to worry about exports and imports.
Memory management is easier.
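Regarding the point above about only calling an operating-system function if it exists, the usual pattern looks roughly like this (GetTickCount64 is just an example of an API that is missing on older Windows versions):

#include <windows.h>
#include <stdio.h>

typedef ULONGLONG (WINAPI *GetTickCount64_t)(void);

int main()
{
    /* GetTickCount64 only exists on Vista and later, so look it up at run time */
    HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
    GetTickCount64_t pGetTickCount64 =
        (GetTickCount64_t)GetProcAddress(kernel32, "GetTickCount64");

    if (pGetTickCount64)
        printf("uptime: %llu ms\n", (unsigned long long)pGetTickCount64());
    else
        printf("GetTickCount64 not available, falling back to GetTickCount\n");
    return 0;
}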
One of the most significant and often unnoted features of dynamic libraries on Windows is that a DLL can end up with its own heap (for instance when it links against a different copy of the C runtime), so memory allocated inside the DLL should also be freed inside the DLL. This can be an advantage or a disadvantage depending on your point of view, but you need to be aware of it. Likewise, a global variable in a DLL is shared by every module in the process that attaches to that library (though not across processes, unless you explicitly set up a shared data segment), which can be a useful form of de facto communication between modules or the source of an obscure run-time error.
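For reference, a minimal sketch of the usual way a class is exported from a Windows DLL: a macro switches between dllexport when building the DLL and dllimport when consuming it. MYLIB_EXPORTS, MYLIB_API and Widget are made-up names; when the same header is used for a static library build, MYLIB_API can simply be defined as empty.

/* mylib.h - shared between the DLL project and its consumers */
#ifdef MYLIB_EXPORTS               /* defined only when building the DLL */
#  define MYLIB_API __declspec(dllexport)
#else
#  define MYLIB_API __declspec(dllimport)
#endif

class MYLIB_API Widget {
public:
    Widget() : value_(0) {}
    int value() const { return value_; }
private:
    int value_;
};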

How to implement monkey patch in C++?

Is it possible to implement monkey patching in C++?
Or any other similar approach to that?
Thanks.
Not portably, and given the dangers for larger projects you had better have a good reason.
The preprocessor is probably the best candidate, due to its ignorance of the language itself. It can be used to rename attributes, methods and other symbol names - but the replacement is global, at least for a single #include or sequence of code.
I've used that before to beat "library diamonds" into submission - library A and library B both importing an OS library S, but in different ways, so that some symbols of S would be identically named but different. (Namespaces were out of the question, as they would have had much more far-reaching consequences.)
Similarly, you can replace symbol names with compatible-but-superior classes.
e.g. in VC, #import generates an import library that uses _bstr_t as a type adapter. In one project I successfully replaced these _bstr_t uses with a compatible-enough class that interoperated better with other code, just by #define'ing _bstr_t as my replacement class for the #import.
Patching the virtual method table - either replacing the entire VMT or individual methods - is something else I've come across. It requires a good understanding of how your compiler implements VMTs. I wouldn't do that in a real-life project, because it depends on compiler internals and you get no warning when things have changed. It's a fun exercise for learning about the implementation details of C++, though. One application would be switching at runtime from an initializer/loader stub to a full - or even data-dependent - implementation.
Generating code on the fly is common in certain scenarios, such as forwarding/filtering COM Interface calls or mapping OS Window Handles to library objects. I'm not sure if this is still "monkey-patching", as it isn't really toying with the language itself.
To add to other answers, consider that any function exposed through a shared object or DLL (depending on platform) can be overridden at run-time. Linux provides the LD_PRELOAD environment variable, which can specify a shared object to load after all others, which can be used to override arbitrary function definitions. It's actually about the best way to provide a "mock object" for unit-testing purposes, since it is not really invasive. However, unlike other forms of monkey-patching, be aware that a change like this is global. You can't specify one particular call to be different, without impacting other calls.
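A minimal sketch of the LD_PRELOAD trick, overriding a libc function for testing purposes (rand is just an arbitrary example):

/* fake_rand.c - build: gcc -shared -fPIC -o fake_rand.so fake_rand.c -ldl
   run:           LD_PRELOAD=./fake_rand.so ./your_program               */
#define _GNU_SOURCE
#include <dlfcn.h>

/* Every call to rand() in the target program now resolves to this definition. */
int rand(void)
{
    return 4;
}

/* If the replacement still needs the original, RTLD_NEXT locates the next
   definition of the symbol in the library search order. */
int twice_the_real_rand(void)
{
    int (*real_rand)(void) = (int (*)(void))dlsym(RTLD_NEXT, "rand");
    return 2 * real_rand();
}

As the answer notes, the override is global for the whole process - every caller of rand() gets the replacement.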
Considering the "guerilla third-party library use" aspect of monkey-patching, C++ offers a number of facilities:
const_cast lets you work around zealous const declarations.
#define private public prior to header inclusion lets you access private members.
subclassing and use Parent::protected_field lets you access protected members.
you can redefine a number of things at link time.
If the third-party content you're working around is provided already compiled, though, most of the things feasible in dynamic languages aren't as easy, and often aren't possible at all.
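As a toy illustration of the "#define private public" item above - formally undefined behaviour, since it redefines a keyword, but a popular quick hack for test code. The Sealed class stands in for a third-party header you cannot edit:

#include <cstdio>

#define private public      /* would normally be placed just before the #include */
/* pretend everything from here to the #undef lives in a header you cannot modify */
class Sealed {
public:
    Sealed() : secret_(42) {}
private:
    int secret_;
};
#undef private

int main()
{
    Sealed s;
    std::printf("%d\n", s.secret_);   /* compiles, because "private" expanded to "public" */
    return 0;
}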
I suppose it depends what you want to do. If you've already linked your program, you're gonna have a hard time replacing anything (short of actually changing the instructions in memory, which might be a stretch as well). However, before this happens, there are options. If you have a dynamically linked program, you can alter the way the linker operates (e.g. LD_LIBRARY_PATH environment variable) and have it link something else than the intended library.
Have a look at valgrind, for example, which replaces (among a lot of other magic it does) the standard memory allocation mechanisms.
As monkey patching refers to dynamically changing code, I can't imagine how this could be implemented in C++...