Can I load a library from a memory stream? - c++

Can I load a library from a memory stream? For example, my library is encoded in a file. I check some conditions and decrypt the file into a memory stream. Now I need to load the decrypted library from that stream into my application and use its functions etc.

In Windows, a DLL can only be loaded from a file - as the links suggest, you can create a ramdisk and mount it as a drive, but there is no way around the DLL being loaded from a file that exists in a filesystem. Part of the reason for this is that the DLL is "demand loaded": the system does not load the entire file into memory at once, it loads the parts that are actually being used, typically 4 KB at a time. It is also not swapped out to the swap area; if the system runs short of memory, the pages are simply discarded and later re-loaded from the DLL file.
Linux works in a very similar way (I know it uses the same kind of demand loading by default, but I'm not sure if there is a way around it), so I don't believe there is any other way there either, though I haven't looked into it in depth.
Of course, if all you want is a piece of code that you can use in your application, and you want to store it encrypted/compressed/whatever in your executable file, you can allocate some executable memory (in Windows, VirtualAlloc can allocate executable memory) and copy the code there. However, you then need to relocate any absolute memory addresses in the code, so you will need to store the relocation information in your executable.
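On POSIX systems, the closest analogue of VirtualAlloc is mmap followed by mprotect with an executable protection. A minimal sketch, assuming x86-64 Linux: the byte sequence below is hand-written, relocation-free machine code for `mov eax, 42; ret`, so the relocation problem mentioned above doesn't arise here, and `run_from_memory` is a name invented for this example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* x86-64 machine code for: mov eax, 42 ; ret */
static const unsigned char kCode[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

/* Copy a small, relocation-free code snippet into memory we made
 * executable ourselves, run it, and return its result (-1 on error). */
int run_from_memory(void)
{
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, kCode, sizeof kCode);
    /* Flip the page to read+execute before calling into it (W^X). */
    if (mprotect(buf, 4096, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, 4096);
        return -1;
    }
    int result = ((int (*)(void))buf)();
    munmap(buf, 4096);
    return result;
}
```

A real loader would copy decrypted code into the buffer and apply its relocation records before the mprotect call.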
Clearly, the easy solution is to unpack your content into a file in the filesystem, and load from there.

Related

Providing data directly instead of file path

I am dealing with a closed source library which needs some data to be passed to it in order to work. This data is around 150 MB. I have this data loaded in memory at the moment of initializing the main class of this lib which has the following constructor:
Foo::Foo(const std::string path_to_data_file);
Foo accepts the data as a file, and there is no other overload that accepts the data directly (as a string or byte array, for example).
The only possible way to call the library in my case is to write the data I have to disk and then pass the file's path to the library, which is a very bad idea.
Is there is any technique to pass some kind of virtual path to the library that will result in reading the data from memory directly instead of the disk?
In other words, I am looking for a technique (if one exists or is even possible) that creates a virtual file pointing to a memory address rather than a physical location on disk.
I know that the right solution is to edit the library and isolate the data layer from the processing layer. However, this is not possible, for now at least.
Edit:
The solution should be cross-platform. However, I can guess that these problems are usually OS-dependent, so I am looking for Linux and Windows solutions.
The library does some computer vision work, and the data is a kind of trained model.
It is probably operating system specific. But you could put the data into some RAM or virtual memory based filesystem like tmpfs.
Then you don't need to change the library, you just pass it some file in a tmpfs file system.
BTW, on some OSes, if you have recently written a file, it sits in the page cache (so is in RAM).
Notice also that reading 150 MB should not take long. If you can't put it on a tmpfs or RAM disk, at least try to use an SSD.
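On Linux specifically, one way to get such a "virtual path" without touching the disk is memfd_create(2) (kernel 3.17+, glibc 2.27+): the data lives in an anonymous RAM-backed file, and /proc/self/fd/<fd> provides a path you can hand to the library. A sketch, where make_memory_file is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Put `len` bytes into an anonymous RAM-backed file and produce a
 * /proc path that other code can open like a regular file.  Returns
 * the fd (keep it open for as long as the path must stay valid),
 * or -1 on error. */
int make_memory_file(const void *data, size_t len, char *path, size_t pathlen)
{
    int fd = memfd_create("model-data", 0);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    snprintf(path, pathlen, "/proc/self/fd/%d", fd);
    return fd;
}
```

You could then call the constructor from the question as `Foo foo(path);`; the path stays valid only while the descriptor remains open, and the library reads straight from RAM.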

Execute a process from memory within another process?

I would like to have a small "application loader" program that receives other binary application files over TCP from an external server and runs them.
I could do this by saving the transmitted file to the hard disk and using the system() call to run it. However, I am wondering if it would be possible to launch the new application from memory without it ever touching the hard drive.
The state of the loader application does not matter after loading a new application. I prefer to stick to C, but C++ solutions are welcome as well. I would also like to stick to standard Linux C functions and not use any external libraries, if possible.
Short answer: no.
Long answer: it's possible, but rather tricky, to do this without writing the binary out to disk. You can theoretically write your own ELF loader that reads the binary, maps some memory, handles the dynamic linking as required, and then transfers control, but that's an awful lot of work that's hardly ever worth the effort.
The next best solution is to write it to disk and call unlink ASAP. The disk doesn't even have to be "real" disk, it can be tmpfs or similar.
The alternative I've been using recently is to not pass complete compiled binaries around, but to pass LLVM bytecode instead, which can then be JIT'd/interpreted/saved as fit. This also has the advantage of making your application work in heterogeneous environments.
It may be tempting to try a combination of fmemopen, fileno and fexecve, but this won't work for two reasons:
From fexecve() manpage:
"The file descriptor fd must be opened read-only, and the caller must have permission to execute the file that it refers to"
I.e. it needs to be a fd that refers to a file.
From fmemopen() manpage:
"There is no file descriptor associated with the file stream returned by these functions (i.e., fileno(3) will return an error if called on the returned stream)"
Much easier than doing this in C would be to set up a tmpfs file system. You'd have all the advantages of the interface of a hard disk: from your program / server / whatever you could just do an exec. These virtual filesystems are quite efficient nowadays; there would really be just one copy of the executable, in the page cache.
As Andy points out, for such a scheme to be efficient you'd have to ensure that you don't use buffered writes to the file, but that you "write" (in a broader sense) directly in place.
you'd have to know how large your executable will be
create a file on your tmpfs
scale it to that size with ftruncate
"map" that file into memory with mmap to obtain the addr of a buffer
pass that address directly to the recv call to write the data in place
munmap the file
call exec with the file
rm the file. This can be done even while the executable is still running.
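The middle steps might be sketched like this (write_in_place is a hypothetical name; error handling is minimal, and a plain memcpy stands in for the recv into the mapping):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create the file, size it with ftruncate, map it, and copy the
 * payload straight into the mapping so no buffered writes happen.
 * In the real loader the memcpy would be a recv() into `dst`. */
int write_in_place(const char *path, const void *payload, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (dst == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(dst, payload, len);   /* <- recv(sock, dst, len, MSG_WAITALL) here */
    munmap(dst, len);
    close(fd);
    return 0;   /* now execve(path, ...) in a child, then unlink(path) */
}
```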
You might want to look at and reuse UPX, which decompresses the executable to memory, and then transfers control to ld-linux to start it.

Store data in executable

I've been curious about this for a long time.
Is it possible for an application to store some changeable data (like configurations and options) inside its own executable?
For example: is it possible to design a single executable such that if a user runs it, sets some configuration, and copies it to another PC, the application then runs with its last-set configuration on the new PC?
Is this possible by any means?
Update: it seems that it is possible. Then how?
Yes and no -
Yes, there's plenty of space in an executable image where you can put data. You can add a pre-initialised data segment for this, say, and write the data in there; or a resource; or you can abuse some of the segment padding space to store values in. You control the linker settings, so you can guarantee there will be space.
No, you probably can't do this at run-time:
Windows' caching mechanism will lock the file on disk of any loaded executable. This is so that it doesn't need to write segment data out to the swap file if it ever needs to unload a segment - it can guarantee that it can get the same data back from the same location on disk. You may be able to get around this by running with one of the .exe load copy-to-temp flags (run from CD, run from network), if the OS actually respects them, or you can write out a helper exe to temp, transfer control to it, let the original unload, and then modify the unloaded file. (This is much easier on Linux and the like, where inodes are effectively reference-counted: even if they had the same default locking strategy, you could copy your executable, edit the settings into the copy, and then move it over the original while it is still executing.)
Virus checkers will almost certainly jump on you for this.
In general I think it's a much better idea to just write settings to the registry or somewhere and provide and import / export settings option if you think it'd be needed.
Expanding on the 'how' part -
In order to know where to write the data into your file you've got two or three options really:
Use a magic string, e.g. declare a global static variable with a known sequence at the start, e.g. "---my data here---", followed by enough empty space to store your settings in. Open the file on disk, scan it for that sequence (taking care that the scanning code doesn't actually contain the string in one piece, i.e. so you don't find the scanning code instead) - then you've found your buffer to write to. When the modified copy is executed it'll have the data already in your global static.
Understand and parse the executable header data in your binary to find the location you've used. One way would be to add a named section to your binary in the linker, e.g. a 4K section called 'mySettings' flagged as initialised data. You can (although this is beyond my knowledge) wire this up as an external buffer you can refer to by name in your code to read from. To write, find the section table in the executable headers, find the one called 'mySettings', and you'll have the offset in the binary that you need to modify.
Hard-code the offset of the buffer that you need to read / write. Build the file once, find the offset in a hex editor and then hard-code it into your program. Since program segments are usually rounded up to 4K you'll probably get away with the same hard-coded value through minor changes, though it may well just change underneath you.
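The first option (magic string plus scan) might be sketched as follows, assuming Linux where /proc/self/exe names the running binary; `g_settings` and `find_settings_offset` are names invented for this example, and the marker is reassembled at run time so the scanner's own string literal never produces a second match:

```c
#include <stdio.h>
#include <string.h>

/* A marked settings buffer in the initialised data segment.  The
 * marker bytes appear in the executable exactly once: here. */
char g_settings[64] = "--my!data!here--";

/* Scan the file at `path` for the marker and return its file
 * offset, or -1 if not found. */
long find_settings_offset(const char *path)
{
    char magic[32];
    /* rebuild the marker at run time so this function's literal
     * ("--my%cdata%chere--") differs from the marker itself */
    size_t mlen = (size_t)snprintf(magic, sizeof magic,
                                   "--my%cdata%chere--", '!', '!');
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    char buf[4096];
    size_t keep = 0, got;
    long base = 0;          /* file offset of buf[0] */
    while ((got = fread(buf + keep, 1, sizeof buf - keep, f)) > 0) {
        size_t total = keep + got;
        for (size_t i = 0; i + mlen <= total; i++) {
            if (memcmp(buf + i, magic, mlen) == 0) {
                fclose(f);
                return base + (long)i;
            }
        }
        /* keep a marker-sized tail so matches can span chunk edges */
        keep = mlen - 1;
        if (keep > total)
            keep = total;
        memmove(buf, buf + total - keep, keep);
        base += (long)(total - keep);
    }
    fclose(f);
    return -1;
}
```

Writing the settings would then mean seeking to the returned offset in a copy of the executable and overwriting the buffer's bytes there.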
Ya, you can do it.
It's risky.
You could screw up and make the app unrunnable.
Modifying executables is something that virus and trojans tend to do.
It is likely that a virus scanner will notice, stop it, and brand you as an evildoer.
I know a little bit about evil :)
In the case of Windows PE files, you can write data at the end of the file. You need to know the EXE's size before writing your own data, so that from the second write onwards you know at which position in the EXE file to start writing.
Also, you can't modify the file while it's running. Your main program needs to extract and run a temporary EXE somewhere, so that when the main program finishes, the temp EXE can write the configuration into the main EXE file.
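The tail-append idea can be sketched generically. The footer layout here - payload, then length, then magic - is an illustration of the technique, not a PE-specific format, and the function names are invented; the writes would be done against the temporary copy as described:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static const char kMagic[8] = "CFGv0001";

/* Append a settings blob, its length, and a magic footer to `path`
 * so a later run can find the blob by reading the file's tail. */
int append_settings(const char *path, const void *data, uint32_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    int ok = write(fd, data, len) == (ssize_t)len
          && write(fd, &len, sizeof len) == (ssize_t)sizeof len
          && write(fd, kMagic, sizeof kMagic) == (ssize_t)sizeof kMagic;
    close(fd);
    return ok ? 0 : -1;
}

/* Read the settings back: seek to the tail, verify the magic, read
 * the length, then read the payload.  Returns length or -1. */
int read_settings(const char *path, void *out, uint32_t maxlen)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char magic[8];
    uint32_t len;
    if (lseek(fd, -(off_t)(sizeof magic + sizeof len), SEEK_END) < 0 ||
        read(fd, &len, sizeof len) != sizeof len ||
        read(fd, magic, sizeof magic) != sizeof magic ||
        memcmp(magic, kMagic, sizeof magic) != 0 || len > maxlen ||
        lseek(fd, -(off_t)(sizeof magic + sizeof len + len), SEEK_END) < 0 ||
        read(fd, out, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    close(fd);
    return (int)len;
}
```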
Yes, it's possible. You probably shouldn't do it.
Mac OS X does have the concept of "bundles", which combine an executable and its resources into one "package" (a directory ending in .app), but I'm not sure it's typical for applications to modify their own bundles, and most other operating systems don't work that way either, as far as I know. It's more of a facility for storing images, audio and so forth along with the code, as opposed to storing configuration data that is going to be modified when the program runs.
Modifying the executable file while it's running is a pain. The task is further complicated by any optimizations your compiler may apply, since they change the structure of the program and might not leave you an "empty space" to write into.
Difficult. Difficult. Difficult.
But in order to do this you basically have to read the file into a buffer, or into another file; you can use fstream directly, but make sure you use the ios::binary flag. Appending the buffer or the file is the horribly simple part; the problem lies in the program appending to itself.
Here's what I'd do:
First write a program that packs programs into other programs; you probably possess the knowledge already. Once you have that, have it pack itself into another program, making sure you've arranged for outside messaging or argument passing. Then your main program can simply unpack that helper program, pass it a link to a (temporary) file containing the data you'd like appended to yourself, and kill the current program. Let the slave append the data and call your program again.
blam appended executable.

Is a DLL loaded entirely or only some functions?

When a program uses a dynamic shared library, does it load the DLL entirely (so you could almost erase the DLL from disk while the application is running), or does it load only the parts of the DLL it needs at any given time during the runtime life of the application?
The DLL gets loaded entirely. DLLs are the same as EXEs in almost every aspect; the only big difference between them is that a DLL is not directly executable - it doesn't have a main() function, the start of a program.
I don't know how the details work in Windows (in Linux I know the responsible kernel code quite well), but at least in *nix systems, deleting a filesystem entry leaves the file contents intact as long as there are file descriptors/handles open on it; only after the last file descriptor/handle is closed may the blocks on the storage device be overwritten. Windows, by contrast, locks the file of a loaded DLL, so it cannot be deleted while it is in use.
DLLs are not loaded into preallocated memory; they're memory-mapped. The effect is somewhat the reverse of swap memory: instead of RAM being swapped out to disk, the contents of the file are mapped into the process address space and end up in RAM through the disk/file cache. The same goes for shared objects in *nix operating systems, though there are significant differences in how Windows and *nix systems deal with relocations, symbol exports and so on.
It's being loaded entirely, as was pointed out. The special part is not that you can't run the DLL, it's that the memory pages of a DLL are usually shared across process boundaries.
Should a process attempt to write into a page, a copy of that page is taken and the copy is only visible to this process (it's called copy-on-write).
DLLs are PE files (i.e. the same as NT drivers or Win32 programs). They are loaded similarly to .exe files into Memory Mapped Files (MMFs, or "sections" in kernel mode parlance). This means that the DLL file is backing the MMF that represents the loaded DLL. This is the same as when passing a valid file handle (not INVALID_HANDLE_VALUE) to CreateFileMapping and it's also (part of) the reason why you can't delete the DLL while it is in use.
Also, there are some DLLs that contain no code at all. Such a DLL can then also be loaded into a process that was not built for the same processor; for example, an x86 resource DLL loads fine into an x64 application.

Using dlopen, how can I cope with changes to the library file I have loaded?

I have a program written in C++ which uses dlopen to load a dynamic library (Linux, i386, .so). When the library file is subsequently modified, my program tends to crash. This is understandable, since presumably the file is simply mapped into memory.
My question is: other than simply creating myself a copy of the file and dlopening that, is there way for me to load a shared object which is safe against subsequent modifications, or any way to recover from modifications to a shared object that I have loaded?
Clarification: The question is not "how can I install a new library without crashing the program", it is "if someone who I don't control is copying libraries around, is it possible for me to defend against that?"
If you rm the library prior to installing the new one, I think your system will keep the inode allocated, the file open, and your program running. (And when your program finally exits, then the mostly-hidden-but-still-there file resources are released.)
Update: Ok, post-clarification. The dynamic linker actually completely "solves" this problem by passing the MAP_COPY flag, if available, to mmap(2). However, MAP_COPY does not exist on Linux and is not a planned future feature. Second best is MAP_DENYWRITE, which I believe the loader does use, and which is in the Linux API, and which Linux used to do. It errors-out writes while a region is mapped. It should still allow an rm and replace. The problem here is that anyone with read-access to a file can map it and block writes, which opens a local DoS hole. (Consider /etc/utmp. There is a proposal to use the execute permission bit to fix this.)
You aren't going to like this, but there is a trivial kernel patch that will restore MAP_DENYWRITE functionality. Linux still has the feature, it just clears the bit in the case of mmap(2). You have to patch it in code that is duplicated per-architecture, for ia32 I believe the file is arch/x86/ia32/sys_ia32.c.
asmlinkage long sys32_mmap2(unsigned long addr, unsigned long len,
                            unsigned long prot, unsigned long flags,
                            unsigned long fd, unsigned long pgoff)
{
        struct mm_struct *mm = current->mm;
        unsigned long error;
        struct file *file = NULL;

        flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE); /* fix this line to not clear MAP_DENYWRITE */
This should be OK as long as you don't have any malicious local users with credentials. It's not a remote DoS, just a local one.
If you install a new version of the library, the correct procedure is to create a new file in the same directory, then rename it over the old one. The old file will remain while it's open, and continue to be used.
Package managers like RPM do this automatically - so you can update shared libraries and executables while they're running - but the old versions keep running.
When you need to pick up the new version, restart the process or reload the library - restarting the process sounds better. Your program can exec itself; even init can do this.
It is not possible to defend against someone overwriting your library if they have file write permission.
Because dlopen memory maps the library file, all changes to the file are visible in every process that has it open.
The dlopen function uses memory mapping because it is the most memory efficient way to use shared libraries. A private copy would waste memory.
As others have said, the proper way to replace a shared library in a Unix is to use unlink or rename, not to overwrite the library with a new copy. The install command will do this properly.
This is an intriguing question. I hate finding holes like this in Linux, and love finding ways to fix them.
My suggestion is inspired by the #Paul Tomblin answer to this question about temporary files on Linux. Some of the other answers here have suggested the existence of this mechanism, but have not described a method of exploiting it from the client application as you requested.
I have not tested this, so I have no idea how well it will work. Also, there may be minor security concerns associated with a race condition related to the brief period of time between when the temporary file is created and when it is unlinked. Also, you already noted the possibility of creating a copy of the library, which is what I am proposing. My twist on this is that your temporary copy exists as an entry in the file system for only an instant, regardless of how long you actually hold the library open.
When you want to load a library follow these steps:
copy the file to a temporary location, probably starting with mkstemp()
load the temporary copy of the library using dlopen()
unlink() the temporary file
proceed as normal, the file's resources will be automatically removed when you dlclose()
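The four steps might look like this on Linux (copy_to_temp and dlopen_private are hypothetical names; a production version would check errno and pick the temporary directory more carefully):

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Step 1: copy `src` into a fresh mkstemp() file.  `tmpl` must be a
 * writable "...XXXXXX" template and receives the real name.
 * Returns the open fd, or -1 on error. */
int copy_to_temp(const char *src, char *tmpl)
{
    int out = mkstemp(tmpl);
    if (out < 0)
        return -1;
    int in = open(src, O_RDONLY);
    if (in < 0) {
        close(out);
        unlink(tmpl);
        return -1;
    }
    char buf[8192];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
    }
    close(in);
    if (n < 0) {
        close(out);
        unlink(tmpl);
        return -1;
    }
    return out;
}

/* Steps 2-3: dlopen the private copy, then unlink it immediately.
 * The name disappears, but the inode stays mapped until dlclose(). */
void *dlopen_private(const char *src)
{
    char tmpl[] = "/tmp/libcopy-XXXXXX";
    int fd = copy_to_temp(src, tmpl);
    if (fd < 0)
        return NULL;
    void *handle = dlopen(tmpl, RTLD_NOW);
    unlink(tmpl);
    close(fd);
    return handle;
}
```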
It would be nice if there were a really easy way to achieve the "copy the file" step without requiring you to actually copy the file. Hard-linking comes to mind, but I don't think that it would work for these purposes. It would be ideal if Linux had a copy-on-write mechanism which was as easy to use as link(), but I am not aware of such a facility.
Edit: The #Zan Lynx answer points out that creating custom copies of dynamic libraries can be wasteful if they are replicated into multiple processes. So my suggestion probably only makes sense if it is applied judiciously -- to only those libraries which are at risk of being stomped (presumably a small subset of all libraries which does not include files in /lib or /usr/lib).
If you can figure out where your library is mapped into memory, then you might be able to mprotect it writeable and do a trivial write to each page (e.g. read and write back the first byte of each page). That should get you a private copy of every page.
If 'mprotect' doesn't work (it may not, the original file was probably opened read-only), then you can copy the region out to another location, remap the region (using mmap) to a private, writeable region, then copy the region back.
I wish the OS had a "transform this read-only region to a copy-on-write region". I don't think something like that exists, though.
In any of these scenarios, there is still a window of vulnerability - someone can modify the library while dlopen is calling initializers or before your remap call happens. You're not really safe unless you can fix the dynamic linker like #DigitalRoss describes.
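The private-copy trick described above - mprotect the mapping writable and touch each page - can be demonstrated with a plain MAP_PRIVATE file mapping; Linux permits PROT_WRITE on a private mapping of a read-only descriptor, and map_private_copy is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `path` MAP_PRIVATE, then force private copies of its pages by
 * making the mapping writable and dirtying one byte per page: the
 * write fault triggers copy-on-write, detaching the page from the
 * file.  Returns the mapping, or MAP_FAILED on error. */
void *map_private_copy(const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, len);
        return MAP_FAILED;
    }
    long pagesz = sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += (size_t)pagesz) {
        volatile char *b = (volatile char *)p + off;
        *b = *b;   /* write the same value back: content unchanged, page now private */
    }
    return p;
}
```

After this, overwriting the file on disk no longer changes the bytes seen through the mapping, which is exactly the defence the answer sketches (modulo the initializer-time window it mentions).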
Who is editing your libraries out from under you, anyway? Find that person and hit him over the head with a frying pan.