Virtual Files for dynamic linking - c++

My problem is pretty complicated and potentially impossible, but here we go.
Using C++, I'm currently working on a universal server engine for a game project of mine. Universal, because every part of the engine will be loaded dynamically after startup. Game objects will also inherit from a base object and override a virtual "Simulate" function. That way, every object has its specific behavior, and I can do something I call "C++ scripting", which is a lot faster than interpreted Lua script files. It's also more dynamic.
(Please, no solutions that would kill the C++ "scripting" part, like "forget the dynamic linking, that's insane". This performance boost is really necessary, since I'm working with large voxel maps.)
My problem:
There are indeed a lot of .dll/.so files, and I wanted to pack them into a simple archive so I can compress them with zlib and maybe bundle everything together with textures and sounds into little "object packages".
Now, the Windows DLL API and the Linux .so API won't let me load a .dll/.so file from a memory address, which is a shame. (Am I right there, or can I bypass that? :) ) I don't want to unzip the files and save them temporarily on the filesystem, because there are hundreds to thousands of them and that would increase the loading time a lot.
Also, I'm not interested in more external dependencies like Boost.
So here are my questions:
Is there a cross-platform method to create virtual files in memory with a real path?
That way I could bypass the slow I/O of HDDs.
Or is using temp files really not such a big deal, because the file caches of modern operating systems are fast/intelligent enough NOT to write all those files to disk?
(Actually, Linux supports virtual file systems, but Windows does not...)
I hope you guys can help me there :)

Not with the WinAPI, that's for sure, but you can do it manually. You can load the image into memory, fill its import table, and call the exported functions (after you have called DllMain). I saw a program where someone actually created a new process with that method... See the PE documentation for details, but it works.
It's also relatively easy to do, since you only need to find the PE import tables and do what the dynamic linker does: fill them with the addresses of the imported functions. Note, however, that Windows DLLs are not position-independent code: each one is linked for a preferred base address, and if it cannot be placed there you also have to apply the base relocations from its .reloc section.
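To make "fill the import table" concrete, here is a minimal sketch of that step for a 64-bit PE image whose headers and sections have already been copied to their RVAs in memory. `ImportDescriptor` mirrors IMAGE_IMPORT_DESCRIPTOR from winnt.h and is redeclared so the sketch compiles anywhere; the `Resolver` callback is a hypothetical stand-in for LoadLibrary/GetProcAddress. Bound, delay-loaded and forwarded imports are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Mirrors IMAGE_IMPORT_DESCRIPTOR from winnt.h (redeclared so this compiles anywhere).
struct ImportDescriptor {
    uint32_t OriginalFirstThunk;  // RVA of the lookup table (names or ordinals)
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;                // RVA of the DLL name, e.g. "KERNEL32.dll"
    uint32_t FirstThunk;          // RVA of the IAT that gets overwritten
};
static_assert(sizeof(ImportDescriptor) == 20, "must match the PE layout");

constexpr uint64_t kOrdinalFlag64 = 0x8000000000000000ULL;  // IMAGE_ORDINAL_FLAG64

// Hypothetical resolver (in a real loader: LoadLibrary + GetProcAddress).
// `name` is empty when importing by ordinal. Returns 0 if not found.
using Resolver = std::function<uint64_t(const std::string& dll,
                                        const std::string& name,
                                        uint16_t ordinal)>;

// Fills the IAT of a 64-bit image already mapped at `image`.
// `import_dir_rva` comes from DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].
bool resolve_imports(uint8_t* image, uint32_t import_dir_rva, const Resolver& resolve) {
    for (uint32_t d = import_dir_rva;; d += sizeof(ImportDescriptor)) {
        ImportDescriptor desc;
        std::memcpy(&desc, image + d, sizeof desc);
        if (desc.Name == 0) break;  // an all-zero descriptor ends the list
        const std::string dll = reinterpret_cast<const char*>(image + desc.Name);
        // Some linkers leave OriginalFirstThunk at 0; the IAT then doubles as the lookup table.
        uint32_t lookup = desc.OriginalFirstThunk ? desc.OriginalFirstThunk : desc.FirstThunk;
        uint32_t iat = desc.FirstThunk;
        for (;; lookup += 8, iat += 8) {
            uint64_t entry;
            std::memcpy(&entry, image + lookup, sizeof entry);
            if (entry == 0) break;  // end of this DLL's imports
            uint64_t addr;
            if (entry & kOrdinalFlag64) {
                addr = resolve(dll, "", static_cast<uint16_t>(entry & 0xFFFF));
            } else {
                // IMAGE_IMPORT_BY_NAME: a 2-byte hint followed by the name.
                const char* fn = reinterpret_cast<const char*>(image + static_cast<uint32_t>(entry) + 2);
                addr = resolve(dll, fn, 0);
            }
            if (addr == 0) return false;
            std::memcpy(image + iat, &addr, sizeof addr);  // patch the IAT slot
        }
    }
    return true;
}
```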
It should be the same on Linux (using the ELF structures instead), but if you have a better solution with virtual file systems, you should use that.
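On Linux, one concrete "virtual file with a real path" is memfd_create(2) (kernel 3.17+, glibc wrapper since 2.27): it creates an anonymous RAM-backed file, and the path /proc/self/fd/N refers to it, so the decompressed bytes of a .so can be handed to dlopen() without touching the disk. A minimal sketch follows; `make_memory_file` is a hypothetical helper name, this is Linux-only (LoadLibrary on Windows only accepts a path to a real file), and hardened systems may restrict executing code mapped from such files.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/mman.h>  // memfd_create, MFD_CLOEXEC (glibc 2.27+; g++ defines _GNU_SOURCE)
#include <unistd.h>    // write, close

// Hypothetical helper: copies `size` bytes into an anonymous RAM-backed file
// and returns a path like "/proc/self/fd/5" that can be passed to dlopen().
// Returns "" on failure. The descriptor is returned in `fd_out` and must stay
// open until dlopen() has mapped the file.
std::string make_memory_file(const char* debug_name, const void* data,
                             std::size_t size, int& fd_out) {
    fd_out = memfd_create(debug_name, MFD_CLOEXEC);
    if (fd_out < 0) return std::string();
    const char* p = static_cast<const char*>(data);
    while (size > 0) {
        ssize_t n = write(fd_out, p, size);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry the write
            close(fd_out);
            fd_out = -1;
            return std::string();
        }
        p += n;
        size -= static_cast<std::size_t>(n);
    }
    return "/proc/self/fd/" + std::to_string(fd_out);
}
```

A caller would then do something like `void* lib = dlopen(path.c_str(), RTLD_NOW);`. Once dlopen() has mapped the file, the mapping keeps the contents alive, so the descriptor can be closed.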

Related

Override c library file functions?

I am working on a game, and one of the requirements per the licence agreement of the sound assets I am using is that they be distributed in a way that makes them inaccessible to the end user. So, I am thinking about aggregating them into a flat file, encrypting them, or some such. The problem is that the sound library I am using (Hekkus Sound System) only accepts a 'char*' file path and handles file reading internally. So, if I am to continue to use it, I will have to override the C stdio file functions to handle encryption or whatever I decide to do. This seems doable, but it worries me. Looking on the web, I see people running into strange, frustrating problems doing this on the platforms I am concerned with (Win32, Android and iOS).
Does there happen to be a cross-platform library out there that takes care of this? Is there a better approach entirely you would recommend?
Do you have the option of using a named pipe instead of an ordinary file? If so, you can present the pipe to the sound library as the file to read from, and you can decrypt your data and write it to the pipe, no problem. (See Beej's Guide for an explanation of named pipes.)
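A minimal POSIX sketch of the named-pipe idea: mkfifo() creates the pipe, a forked child writes the decrypted bytes into it, and the library is given the pipe's path as if it were a normal file. `serve_through_fifo` and `finish_fifo` are hypothetical names. One important limitation: a pipe cannot be seeked, so this only works if the library reads the file front to back without fseek(). Win32 named pipes use a different API (CreateNamedPipe and \\.\pipe\ names).

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>     // open
#include <sys/stat.h>  // mkfifo
#include <sys/types.h>
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // fork, write, read, close, unlink, _exit

// Hypothetical helper: creates a FIFO at `path` and forks a child that writes
// `data` into it. Returns the child's pid, or -1 on error. The caller passes
// `path` to the library, which opens and reads it like an ordinary file.
pid_t serve_through_fifo(const std::string& path, const std::string& data) {
    if (mkfifo(path.c_str(), 0600) != 0 && errno != EEXIST) return -1;
    pid_t pid = fork();
    if (pid != 0) return pid;  // parent (pid > 0) or fork failure (-1)

    // Child: open() blocks until the reader opens the other end.
    int fd = open(path.c_str(), O_WRONLY);
    if (fd < 0) _exit(1);
    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;
            _exit(1);
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    close(fd);  // the reader now sees end-of-file
    _exit(0);
}

// Waits for the writer and removes the FIFO; true if the child succeeded.
bool finish_fifo(pid_t writer, const std::string& path) {
    int status = 0;
    bool ok = waitpid(writer, &status, 0) == writer && WIFEXITED(status) &&
              WEXITSTATUS(status) == 0;
    unlink(path.c_str());
    return ok;
}
```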
Overriding stdio so that a library whose internals you don't know behaves in a way its developer never intended does not look like the right approach to me, as it isn't easy to get right. Implementing a RAM drive takes so much effort that I recommend looking for another audio library.
The Hekkus Sound System I found was built by a single person and last updated in 2012. I wouldn't rely on a library with only one person working on it who doesn't share the sources.
My advice: invest your time in finding a proper sound library instead of a fishy workaround for this one.
One possibility is to use an encrypted loopback filesystem (google for additional resources).
The way this works is that you put your assets on an encrypted filesystem that actually lives in an ordinary file. This filesystem gets mounted somewhere as a loopback device. The password must be supplied at attach/mount time. Once it is mounted, all files are available to your software as regular files; otherwise, the files are encrypted and inaccessible.
It's compiler-dependent and not a guaranteed feature, but many allow you to embed files/resources directly into the exe and read them in your code as if from disk. You could embed your sound files that way. It will significantly increase the size of your exe however.
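If the toolchain (or a build step) turns each asset into a byte array, reading it "as if from disk" usually means wrapping those bytes in a stream. A minimal sketch, assuming the array was generated at build time, for example with `xxd -i` or the C23 `#embed` directive where supported; `kEmbeddedSound` and its contents are a hypothetical stand-in. This only helps code that accepts a stream or buffer; a library that insists on a file path still needs one of the other approaches.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical embedded asset (normally generated at build time).
static const unsigned char kEmbeddedSound[] = {'R', 'I', 'F', 'F', 0x24, 0x00, 0x00, 0x00};

// Read-only streambuf over a constant buffer: the whole buffer is the get
// area, so std::istream can read and seek in it without copying.
class MemoryBuf : public std::streambuf {
public:
    MemoryBuf(const void* data, std::size_t size) {
        // setg() wants char*, but nothing here ever writes through it.
        char* p = const_cast<char*>(static_cast<const char*>(data));
        setg(p, p, p + size);
    }

protected:
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        if (!(which & std::ios_base::in)) return pos_type(off_type(-1));
        const off_type size = egptr() - eback();
        const off_type now = gptr() - eback();
        const off_type target = dir == std::ios_base::beg ? off
                              : dir == std::ios_base::cur ? now + off
                              : size + off;
        if (target < 0 || target > size) return pos_type(off_type(-1));
        setg(eback(), eback() + target, egptr());
        return pos_type(target);
    }
    pos_type seekpos(pos_type pos, std::ios_base::openmode which) override {
        return seekoff(off_type(pos), std::ios_base::beg, which);
    }
};
```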
Another UNIX-based approach:
The environment variable LD_PRELOAD can be used to override any shared library an executable has been linked against. All symbols exported by a library mentioned in LD_PRELOAD are resolved to that library, including calls to libc functions like open, read, and close. Using libdl (dlsym() with RTLD_NEXT), it is also possible for the wrapping library to call through to the original implementation.
So, all you need to do is to start the process which uses the Hekkus Sound System in an environment that has LD_PRELOAD set appropriately, and you can do anything you like to the file that it reads.
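A minimal sketch of such a wrapper, assuming the library opens its assets with fopen() (it may use open() or other calls instead; strace or ltrace will tell you). The `assets/` prefix and `decrypt_asset` are hypothetical placeholders for your own archive lookup and decryption. Matching read-only opens are served from memory with fmemopen(), so no decrypted file ever touches the disk; everything else is forwarded to the real fopen() found via dlsym(RTLD_NEXT, ...). This targets glibc-based Linux.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <dlfcn.h>  // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)

// Hypothetical: look up `path` in your own archive and decrypt it into `out`.
static bool decrypt_asset(const char* path, std::string& out) {
    if (std::strncmp(path, "assets/", 7) != 0) return false;
    out = "RIFF-demo-bytes";  // placeholder for real archive lookup + decryption
    return true;
}

// Overrides libc's fopen. Built into a shared library listed in LD_PRELOAD,
// this definition is found before the one in libc.
extern "C" FILE* fopen(const char* path, const char* mode) {
    using fopen_fn = FILE* (*)(const char*, const char*);
    static fopen_fn real_fopen =
        reinterpret_cast<fopen_fn>(dlsym(RTLD_NEXT, "fopen"));
    // Decrypted buffers must outlive every FILE* pointing into them, so they
    // are kept for the life of the process (not thread-safe as written).
    static auto* cache = new std::map<std::string, std::string>;

    if (mode[0] == 'r' && std::strchr(mode, '+') == nullptr) {
        auto it = cache->find(path);
        if (it == cache->end()) {
            std::string plain;
            if (decrypt_asset(path, plain) && !plain.empty())
                it = cache->emplace(path, std::move(plain)).first;
        }
        if (it != cache->end())
            return fmemopen(&it->second[0], it->second.size(), "r");
    }
    return real_fopen ? real_fopen(path, mode) : nullptr;
}
```

Built as a shared library, e.g. `g++ -shared -fPIC -o libassetfs.so assetfs.cpp -ldl`, it would be activated with `LD_PRELOAD=./libassetfs.so ./game`.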
Note, however, that there is absolutely no way to keep the data inaccessible to the user: the very fact that he has to be able to hear it means he has to have access to it. Even if all the software in the chain used encryption and your user was not willing to hack hardware, it would not exactly be difficult to connect the audio output jack to an audio input jack, would it? And you can't forbid your user to use earphones, can you? And, of course, the kernel sees all audio output unencrypted and can send a copy somewhere else...
The solution to your problem would be a RAM disk:
http://en.wikipedia.org/wiki/RAM_drive
It uses a piece of RAM as if it were a disk.
There is software available for this too; caching databases in RAM is becoming popular.
It also keeps the file off the disk, where it would be easily accessible to the user.

Writing dynamically loadable components in c++

I'm currently working on a program which should perform calculations on a home-brewed data structure.
I want to build it in a way that it would be easy to add supported calculations (say, as source files which conform to a predetermined structure).
The problem is that I don't want to load all calculations in advance, because there might be a lot of them.
The only mechanism I found which supports dynamic loading of functionality is dlopen, which expects .so files, so in this context, using dlopen means compiling a separate so file for every group of computations.
While I don't see any inherent problem with this design, my spider senses tell me I should verify with the all-knowing-web that it's not utterly stupid. If there are any other suggested ways to do so I'd be glad to hear.
Using dlopen() is the most widely used way to load executable code dynamically in an application on POSIX-compatible operating systems. It allows using a modular architecture where optional or rarely used code is only loaded on-demand, which sounds pretty much like what you need.
I would certainly use this method. If after some time you find that the shared-object compilation step is becoming a hurdle, you can build additional dynamically loaded modules to support e.g. an interpreted language such as Lua or Python. This would allow you to keep your existing codebase without losing extensibility.
Seems like a good approach.
A good way to do this is to declare an abstract (pure virtual) class in C++, say Calculator, with all the methods and accessors you need to perform a calculation. Then, have each of your separate dynamic libraries (.so files) implement a global extern "C" function Calculator* create_calculator() that creates an instance of a class derived from Calculator (extern "C" prevents name mangling, so dlsym() can find the function by its plain name). Finally, you'll have to devise a registration mechanism so that your main program can determine the name of the dynamic library to load, based on some kind of identifier like a string, enum, or UUID. This would typically be stored in an easily editable configuration file.
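```cpp
#include <dlfcn.h>

// Type of the factory; the plugin must declare it extern "C" so that
// dlsym() can find it under its unmangled name.
typedef Calculator* (*create_calculator_t)();

/* open the needed object file */
const char* libName = get_lib_name_from_config(identifier);
void* handle = dlopen(libName, RTLD_LOCAL | RTLD_LAZY);
if (handle == nullptr) { /* dlerror() describes what went wrong */ }
/* find the address of the create_calculator function */
create_calculator_t create_calculator =
    reinterpret_cast<create_calculator_t>(dlsym(handle, "create_calculator"));
Calculator* calc = create_calculator();
```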
This scheme can be made more flexible (and more complex) by allowing the name of the create_calculator function to vary, at the cost of having to read that name from the config file as well.
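Both halves of the scheme described above can be sketched in one self-contained file. The class and function names are illustrative; two details matter more than the names: the factory is declared `extern "C"` so dlsym() finds it under its plain name, and each plugin also exports a matching `destroy_calculator` so objects are deleted by the same module (and allocator) that created them. The plugin sits in the same file here only so the sketch compiles on its own; in practice it would be built into its own .so.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <memory>
#include <string>

// ---- Shared interface header, compiled into the host and every plugin ----
class Calculator {
public:
    virtual ~Calculator() = default;
    virtual const char* name() const = 0;
    virtual double compute(double a, double b) const = 0;
};
using create_calculator_t = Calculator* (*)();
using destroy_calculator_t = void (*)(Calculator*);

// ---- One plugin; normally this part lives in its own .so ----
class AddCalculator : public Calculator {
public:
    const char* name() const override { return "add"; }
    double compute(double a, double b) const override { return a + b; }
};
// extern "C" keeps the exported names unmangled so dlsym() can find them.
extern "C" Calculator* create_calculator() { return new AddCalculator; }
extern "C" void destroy_calculator(Calculator* c) { delete c; }

// ---- Host side ----
// Owns the library handle and the object created from it; the object is
// destroyed by the plugin's own destroy function before the library unloads.
struct LoadedCalculator {
    void* handle = nullptr;
    Calculator* calc = nullptr;
    destroy_calculator_t destroy = nullptr;

    LoadedCalculator() = default;
    LoadedCalculator(const LoadedCalculator&) = delete;
    LoadedCalculator& operator=(const LoadedCalculator&) = delete;
    ~LoadedCalculator() {
        if (calc && destroy) destroy(calc);
        if (handle) dlclose(handle);
    }
};

// Returns an object with calc == nullptr if anything fails (see dlerror()).
std::unique_ptr<LoadedCalculator> load_calculator(const std::string& lib_path) {
    std::unique_ptr<LoadedCalculator> out(new LoadedCalculator);
    out->handle = dlopen(lib_path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!out->handle) return out;
    auto create = reinterpret_cast<create_calculator_t>(dlsym(out->handle, "create_calculator"));
    out->destroy = reinterpret_cast<destroy_calculator_t>(dlsym(out->handle, "destroy_calculator"));
    if (create && out->destroy) out->calc = create();
    return out;
}
```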
Opening shared libraries using dlopen() is certainly the first thing that comes to my mind; it's a fine plan.

Distributing DLLs Inside an EXE (C++)

How can I include my program's dependency DLLs inside the EXE file (so I only have to distribute that one file)? I am using C++, so I can't use ILMerge like I usually do for C#, but is there an easier way to do this automatically in Visual Studio?
I know this is possible (that's why installers work); I just need to be pointed to the best way to do this.
Thank you for your time.
There are many problems with this approach. For one example, see this post from REAL Software. Their “REALbasic” product used to do this and had problems including:
When writing the DLLs out at run-time, it would trigger anti-virus warnings.
Problems with machines where the user doesn’t have write permissions or is low on disk space.
Their attempt to fix the problem caused more problems, including crashes. Eventually they relented and now distribute DLLs side-by-side with apps.
If you really need a single-EXE deployment, and can’t use an installer for some reason, the reliable way is to static-link all dependencies. This assumes that you have the correct .libs (and not just .libs that link in the DLL).
There exist two options, both of which are far from ideal:
write a temporary file somewhere
load the DLL to memory "by hand", i.e. create a memory block, put DLL image to memory, then process relocations and external references.
The downside of the first approach is described above by Nate. Second approach is possible, but is complicated (requires deep knowledge of certain low-level things) and doesn't allow the DLL code to access DLL resources (this is obvious - there's no image of the DLL so the OS doesn't know where to take resources).
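The "process relocations" part of the second approach is usually the easier half. A minimal sketch for 64-bit images, assuming the image is already mapped and `reloc_rva`/`reloc_size` were taken from DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC]: each block is an IMAGE_BASE_RELOCATION header (page RVA plus block size) followed by 16-bit entries whose top 4 bits are the type and low 12 bits the offset within that page, and every IMAGE_REL_BASED_DIR64 slot is shifted by the difference between the actual and the preferred base. The constants are redeclared from winnt.h; other relocation types are treated as unsupported.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint16_t kRelAbsolute = 0;  // IMAGE_REL_BASED_ABSOLUTE (padding entry)
constexpr uint16_t kRelDir64 = 10;    // IMAGE_REL_BASED_DIR64

// Applies base relocations to an image mapped at `actual_base` whose headers
// say it was linked for `preferred_base`. Returns false on an unsupported
// relocation type or a malformed block.
bool apply_relocations(uint8_t* image, uint32_t reloc_rva, uint32_t reloc_size,
                       uint64_t preferred_base, uint64_t actual_base) {
    const uint64_t delta = actual_base - preferred_base;  // unsigned wrap-around is fine
    uint32_t pos = reloc_rva;
    const uint32_t end = reloc_rva + reloc_size;
    while (pos + 8 <= end) {
        uint32_t page_rva, block_size;  // IMAGE_BASE_RELOCATION header
        std::memcpy(&page_rva, image + pos, 4);
        std::memcpy(&block_size, image + pos + 4, 4);
        if (block_size < 8 || pos + block_size > end) return false;
        for (uint32_t e = pos + 8; e + 2 <= pos + block_size; e += 2) {
            uint16_t entry;
            std::memcpy(&entry, image + e, 2);
            const uint16_t type = static_cast<uint16_t>(entry >> 12);
            const uint16_t offset = static_cast<uint16_t>(entry & 0x0FFF);
            if (type == kRelAbsolute) continue;
            if (type != kRelDir64) return false;
            uint64_t value;
            std::memcpy(&value, image + page_rva + offset, 8);
            value += delta;
            std::memcpy(image + page_rva + offset, &value, 8);
        }
        pos += block_size;
    }
    return true;
}
```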
One more option usable in some scenarios: create a virtual disk whose contents are stored in your EXE file resources, and load the DLL from there. This is possible using our SolFS product (OS edition), but creation of the virtual disk itself requires use of kernel-mode drivers which must be written to disk before use.
Most installers use a zip file (or something similar) to hold whatever files are needed. When you run the installer, it decompresses the data and puts the individual files where needed (and typically adds registry entries, registers any COM controls it installed, etc.)

Profiling DLL/LIB Bloat

I've inherited a fairly large C++ project in VS2005 which compiles to a DLL of about 5MB. I'd like to cut down the size of the library so it loads faster over the network for clients who use it from a slow network share.
I know how to do this by analyzing the code, includes, and project settings, but I'm wondering if there are any tools available which could make it easier to pinpoint what parts of the code are consuming the most space. Is there any way to generate a "profile" of the DLL layout? A report of what is consuming space in the library image and how much?
When you build your DLL, you can pass /MAP to the linker to have it generate a map file containing the addresses of all symbols in the resulting image. You will probably have to do some scripting to calculate the size of each symbol.
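The "some scripting" can be as small as the following sketch. It assumes the "Publics by Value" section of an MSVC map file, where each symbol line starts with a `section:offset` pair followed by the symbol name; check your own map file, since the exact columns may vary between linker versions. Sizes are estimated as the gap to the next symbol in the same section, so the last symbol of each section gets no estimate and padding is attributed to the preceding symbol.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct MapSymbol {
    unsigned section = 0;
    uint64_t offset = 0;
    std::string name;
    uint64_t estimated_size = 0;  // 0 = unknown (last symbol in its section)
};

static bool parse_hex(const std::string& s, uint64_t& out) {
    if (s.empty() || s.size() > 16 ||
        s.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos)
        return false;
    out = std::stoull(s, nullptr, 16);
    return true;
}

// Reads the "Publics by Value" part of a map file, where symbol lines look
// like " 0001:00000010  ?foo@@YAXXZ  10001010 f  foo.obj" (assumed layout).
std::vector<MapSymbol> parse_map_symbols(std::istream& in) {
    std::vector<MapSymbol> syms;
    std::string line;
    bool in_publics = false;
    while (std::getline(in, line)) {
        if (line.find("Publics by Value") != std::string::npos) { in_publics = true; continue; }
        if (!in_publics) continue;  // skip the segment table and other headers
        std::istringstream fields(line);
        std::string addr;
        MapSymbol s;
        if (!(fields >> addr >> s.name)) continue;
        const auto colon = addr.find(':');
        if (colon == std::string::npos) continue;
        uint64_t section = 0;
        if (!parse_hex(addr.substr(0, colon), section) ||
            !parse_hex(addr.substr(colon + 1), s.offset)) continue;
        s.section = static_cast<unsigned>(section);
        syms.push_back(s);
    }
    // Estimate each size as the distance to the next symbol in the same section.
    std::sort(syms.begin(), syms.end(), [](const MapSymbol& a, const MapSymbol& b) {
        return a.section != b.section ? a.section < b.section : a.offset < b.offset;
    });
    for (std::size_t i = 0; i + 1 < syms.size(); ++i)
        if (syms[i].section == syms[i + 1].section)
            syms[i].estimated_size = syms[i + 1].offset - syms[i].offset;
    return syms;
}
```

Sorting the result by `estimated_size` then gives a rough profile of what occupies the image.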
Using a "strings" utility to scan your DLL might reveal unexpected or unused printable strings (e.g. resources, RCS IDs, __FILE__ macros, debugging messages, assertions, etc.).
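If no `strings` utility is at hand (GNU binutils ships one, and Sysinternals offers one for Windows), its core is easy to reproduce: report every run of at least N printable ASCII bytes. A minimal sketch; the real tools can also find UTF-16 strings, which matter for Windows resources, and this one does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns every run of at least `min_len` printable ASCII characters
// (0x20..0x7E, plus tab) found in `data`, in file order.
std::vector<std::string> printable_strings(const std::vector<unsigned char>& data,
                                           std::size_t min_len = 4) {
    std::vector<std::string> out;
    std::string run;
    for (unsigned char c : data) {
        if ((c >= 0x20 && c <= 0x7E) || c == '\t') {
            run.push_back(static_cast<char>(c));
            continue;
        }
        if (run.size() >= min_len) out.push_back(run);  // a non-printable byte ends the run
        run.clear();
    }
    if (run.size() >= min_len) out.push_back(run);  // run that reaches end of data
    return out;
}
```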
Also, if you're not already compiling with /Os enabled, it's worth a try.
If your end goal is only to trim the size of the DLL, then after tweaking compiler settings, you'll probably get the quickest results by running your DLL through UPX. UPX is an excellent compression utility for DLLs and EXEs; it's also open source, and its license (the GPL plus a special exception for the programs it compresses) allows you to use it on commercial/closed-source products.
I've only had it turn up a virus warning on the highest compression setting (the brute-force option), so you'll probably be fine if you use a lower setting than that.
While I don't know of any binary size profilers, you could alternatively look at which object files (.obj) are the biggest; that at least gives you an idea of where your problem spots are.
Of course this requires a sufficiently modularized project.
You can also try linking statically instead of using a DLL. When the library is linked statically, the linker can drop unused functions from the final EXE. Sometimes the final EXE is only slightly bigger, and you no longer have a DLL to ship.
If your DLL is this big because it's exporting C++ function with exceptionally long mangled names, an alternative is to use a .DEF file to export the functions by ordinal, without name (using NONAME in the .DEF file). Somewhat brittle, but it reduces the DLL size, EXE size and load times.
See e.g. http://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm
Given that all your .obj files are about the same size, assuming that you're using precompiled headers, try creating an empty obj file and see how large it is. That will give you an idea of the proportion of each .obj that's due to the PCH compilation. The linker will be able to remove all the duplicates there, incidentally. Alternatively you could try disabling PCH so that the obj files will give you a better indication of where the main culprits are.
All good suggestions. What I do is get the map file and then just eyeball it. The kind of thing I've found in the past is that a large part of the space is taken by one or more class libraries brought in by the fact that some variable somewhere was declared as having a type that sounded like it would save some coding effort but wasn't really necessary.
In MFC (remember that?), for example, there is a wrapper class around everything Win32 provides, like controls, fonts, etc. Those take a ton of space, and you don't always need them.
Another thing that can take a ton of space is collection classes you could manage without. Another is cout I/O routines you don't use.
I would recommend one of the following:
coverage - you can run a coverage tool in the hope of detecting some dead code
caching - cache the DLL on the client side on the initial activation
splitting - split the dll into several smaller dlls, start the application with the bootstrap dll and download the other dlls after the application starts
compilation and linking - use a smaller runtime library, compile with size optimization, etc. See this link for more suggestions.
compression - if you have data or large resources within the dll, you can compress them and decompress only after the download or at runtime.
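For the caching suggestion, a minimal C++17 sketch using std::filesystem: copy the DLL from the network share into a local cache directory only when the cached copy is missing, a different size, or older, then load the local copy. The paths and this freshness rule are assumptions; a version number or hash in the file name would be more robust than comparing modification times across machines.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Returns the path of an up-to-date local copy of `remote` inside
// `cache_dir`, copying over the network only when needed. If the copy
// cannot be made, it falls back to returning `remote` itself.
fs::path cached_copy(const fs::path& remote, const fs::path& cache_dir) {
    std::error_code ec;
    fs::create_directories(cache_dir, ec);
    if (ec) return remote;
    const fs::path local = cache_dir / remote.filename();

    bool stale = true;
    if (fs::exists(local, ec) && !ec) {
        std::error_code e1, e2, e3, e4;
        const auto local_size = fs::file_size(local, e1);
        const auto remote_size = fs::file_size(remote, e2);
        const auto local_time = fs::last_write_time(local, e3);
        const auto remote_time = fs::last_write_time(remote, e4);
        if (!e1 && !e2 && !e3 && !e4)
            stale = local_size != remote_size || local_time < remote_time;
    }
    if (stale) {
        fs::copy_file(remote, local, fs::copy_options::overwrite_existing, ec);
        if (ec) return remote;
    }
    return local;  // the path you would pass to LoadLibrary()
}
```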

Loading DLL from a location in memory

As the question says, I want to load a DLL from a location in memory instead of a file, similarly to LoadLibrary(Ex). I'm no expert in WinAPI, so googled a little and found this article together with MemoryModule library that pretty much meets my needs.
On the other hand the info there is quite old and the library hasn't been updated for a while too. So I wanted to know if there are different, newer and better ways to do it. Also if somebody has used the library mentioned in the article, could they provide insight on what I might be facing when using it?
Just for the curious ones, I'm exploring the concept of encrypting some plug-ins for applications without storing the decrypted version on disk.
Implementing your own DLL loader can get really hairy really fast. When reading this article, it's easy to miss what kind of crazy edge cases you can get yourself into. I strongly recommend against it.
Just for a taste, consider that you can't use any conventional debugging tools for the code in the DLL you're loading, since the code you're executing does not lie in the region of any DLL known to the OS.
Another serious issue is dealing with DEP in windows.
Well, you can create a RAM drive according to these instructions, then copy the DLL you have in memory to a file there, and then use LoadLibrary().
Of course, this is not very practical if you plan to deploy this as some kind of product, because people are going to notice a driver being installed, a reboot after the installation, and a new drive letter under My Computer. Also, this does nothing to actually hide the DLL, since it's just sitting there on the RAM drive for everybody to see.
I'm also curious why you actually want to do this. Perhaps your end result can be achieved by means other than loading the DLL from memory. For instance, when using a binary packer such as UPX, the DLL that you have on disk is different from the one that is eventually executed. Immediately after the DLL is loaded normally with LoadLibrary, the unpacker kicks in and rewrites the memory the DLL is loaded into with the uncompressed binary (the DLL header makes sure that enough space is allocated).
A similar question was raised here:
Load native C++ .dll from RAM in debugger friendly manner
One of the answers proposes the dllloader sample application on GitHub:
https://github.com/tapika/dllloader
It supports debugging the .dll out of the box.