Store data in executable - C++

I've been curious about this for a long time.
Is it possible for an application to store some changeable data (like configurations and options) inside its own executable?
For example: is it possible to design a single executable such that if a user runs it, sets some configuration, and copies it to another PC, the application starts up on the new PC with the last configuration that was set?
Is this possible by any means?
Update: it seems that it's possible. Then how?

Yes and no -
Yes, there's plenty of space in an executable image where you can put data. You can add a pre-initialised data segment for this, say, and write the data into it; or use a resource; or you can abuse some of the segment padding space to store values. You control the linker settings, so you can guarantee there will be space.
No, you probably can't do this at run-time:
Windows' caching mechanism locks the on-disk file of any loaded executable. This is so it never has to write executable pages back to the cache when it unloads a segment - it can guarantee that it will get the same data back from the same location on disk. You may be able to get around this by running with one of the .exe copy-to-temp load flags (from CD, from network), if the OS actually respects them, or you can write a helper exe out to temp, transfer control to it, let the original unload, and then modify the unloaded file. (This is much easier on Linux and similar systems, where an inode is effectively reference-counted: even if they have the same default locking strategy, you can copy your executable, edit the settings into the copy, and then move it over the original while it is still executing.)
Virus checkers will almost certainly jump on you for this.
In general I think it's a much better idea to just write settings to the registry or somewhere similar, and provide an import / export settings option if you think it would be needed.
Expanding on the 'how' part -
In order to know where to write the data into your file you've got two or three options really:
Use a magic string. Declare a global static variable initialised with a known sequence, e.g. "---my data here---", followed by enough empty space to store your settings. Open the file on disk and scan it for that sequence, taking care that the scanning code doesn't itself contain the string in one piece (so that you don't find the scanner instead) - then you've found the buffer to write to. When the modified copy is executed, the data will already be sitting in your global static. (There's a sketch of this approach after these options.)
Understand and parse the executable header data in your binary to find the location you've used. One way would be to add a named section to your binary in the linker, e.g. a 4K section called 'mySettings', flagged as initialised data. You can (although this is beyond my knowledge) wire this up as an external buffer that you can refer to by name in your code when reading. To write, find the section table in the executable headers, locate the one called 'mySettings', and you'll have the offset in the binary that you need to modify.
Hard-code the offset of the buffer that you need to read / write. Build the file once, find the offset in a hex editor, and then hard-code it into your program. Since program segments are usually rounded up to 4K, you'll probably get away with the same hard-coded value through minor changes, though it may well just shift underneath you.
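A minimal sketch of the magic-string option, assuming an 18-byte marker and 64 bytes of settings space (marker text, sizes, and the .patched output name are all illustrative). It patches a copy rather than the running file, since replacing the original runs into the locking issues described above:

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// 18-byte marker followed by 64 bytes of settings space, all living in the
// initialised data segment of the binary.
static char g_settings[18 + 64] = "==MY*SETTINGS*V1==";

int main(int argc, char** argv) {
    std::cout << "current settings: " << (g_settings + 18) << "\n";

    // Build the needle at run time from two halves so the full marker
    // appears in the binary only once - inside g_settings' initialiser.
    std::string needle = std::string("==MY*SETT") + "INGS*V1==";

    // Read our own image. argv[0] is good enough for a sketch; a real
    // program should resolve its executable path properly.
    std::ifstream in(argv[0], std::ios::binary);
    std::vector<char> image((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    in.close();

    std::vector<char>::iterator it =
        std::search(image.begin(), image.end(), needle.begin(), needle.end());
    if (it == image.end() || argc < 2) return 1;

    // Patch the settings area that follows the marker.
    std::size_t offset = (it - image.begin()) + needle.size();
    std::snprintf(&image[offset], 64, "%s", argv[1]);

    // Write a patched copy. Replacing the running binary itself is the hard
    // part (see the locking discussion above); on Linux you could rename()
    // the copy over the original.
    std::ofstream out(std::string(argv[0]) + ".patched", std::ios::binary);
    out.write(&image[0], (std::streamsize)image.size());
    return 0;
}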

Yes, you can do it.
It's risky.
You could screw up and make the app unrunnable.
Modifying executables is something that viruses and trojans tend to do.
It is likely that a user's virus scanner will notice, stop it, and brand you as an evildoer.
I know a little bit about evil :)

In the case of Windows PE files, you can write data at the end of the file. You need to know the EXE's original size before writing your own data, so that from the second write onwards you know at which position in the exe file to start writing.
Also, you can't modify the file while it's running. Your main program needs to extract and run a temporary exe somewhere, so that when the main program finishes, the temp exe writes the configuration into the main exe file.
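Here's a hedged sketch of that idea, with one twist: instead of remembering the original EXE size, each save appends the payload plus a fixed-size trailer (length, then magic), so the reader can always locate the most recent record by seeking from the end. The magic value and sanity limit are illustrative, and note that appending data will typically break any code signature on the binary:

#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

static const char kMagic[8] = {'S','E','T','T','v','0','0','1'};

bool appendSettings(const std::string& exePath, const std::string& payload) {
    std::ofstream out(exePath, std::ios::binary | std::ios::app);
    if (!out) return false;             // on Windows this fails while the exe runs
    std::uint64_t len = payload.size();
    out.write(payload.data(), (std::streamsize)len);
    out.write((const char*)&len, sizeof len);   // trailer part 1: payload length
    out.write(kMagic, sizeof kMagic);           // trailer part 2: magic, last 8 bytes
    return (bool)out;
}

bool readSettings(const std::string& exePath, std::string& payload) {
    std::ifstream in(exePath, std::ios::binary);
    if (!in) return false;
    char magic[sizeof kMagic];
    std::uint64_t len = 0;
    in.seekg(-(std::streamoff)(sizeof magic + sizeof len), std::ios::end);
    in.read((char*)&len, sizeof len);
    in.read(magic, sizeof magic);
    if (!in || std::memcmp(magic, kMagic, sizeof magic) != 0)
        return false;                   // no settings record appended yet
    if (len == 0 || len > 1024 * 1024) return false;   // illustrative sanity limit
    in.seekg(-(std::streamoff)(sizeof magic + sizeof len + len), std::ios::end);
    payload.resize((std::size_t)len);
    in.read(&payload[0], (std::streamsize)len);
    return (bool)in;
}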

Yes, it's possible. You probably shouldn't do it.
Mac OS X does have the concept of "bundles", where an executable and its resources are combined into one "package" (a directory ending in .app), but I'm not sure it's typical for applications to modify their own bundles, and most other operating systems don't work that way either, as far as I know. It's more of a facility for storing images, audio, and so forth along with the code than for storing configuration data that will be modified when the program runs.

Modifying the executable file while it's running is a pain. The task is further complicated by any optimizations your compiler may apply, since they change the structure of the program and might not leave you an "empty space" in which to write.

Difficult. Difficult. Difficult.
But in order to do this you basically have to read the file into a buffer, or into another file; you can use fstream directly, but make sure you use the ios::binary flag. Appending the data itself is a horribly simple matter. The problem lies in a program appending to itself.
Here's what I'd do:
First, write a program that packs programs into other programs - you probably possess the knowledge already. Once you have that, have it pack itself into another program, and be sure you've arranged for outside messaging or argument passing. Then your main program can simply unpack that helper and pass it a link to a (temporary) file that you would like appended to yourself. Kill your current program, let the slave append the data, and have it call your program again.
Blam: appended executable.

Related

Is there a way to find the exact location (address) of a file on disk?

I'm developing software in C++ for Windows/Linux.
I want to create a file (txt, json, license, you name it) at runtime and save it somewhere. Is it possible in C++ to get the exact position of that file on disk, so that if I restart the app and read from that address, I'll be able to access its data?
The purpose would be that if someone copied the software to another OS, or created an image of the OS, and tried to run it, the address would no longer be valid and the check would fail. This is an attempt to add another layer (on top of license management) of protection against software copying.
An image of a disk copies it byte by byte, meaning that all addresses (locations) on disk stay exactly the same. So your copy protection won't actually work - you can still easily clone the disk while preserving your special copy protection file. Additionally, a file may not even have a defined location on disk: It may be split up into many fragments and scattered across the entire disk. You can't find an address that doesn't exist. The filesystem may also just move the file around whenever it feels like it so the address doesn't stay the same even on the same system. (This is what defragmentation does. Windows, for example, moves files around like that on a fixed schedule to make the filesystem faster.)
TL;DR this is not going to work.

How safe are memory-mapped files for reading input files?

Mapping an input file into memory and then directly parsing data from the mapped memory pages can be a convenient and efficient way to read data from files.
However, this practice also seems fundamentally unsafe unless you can ensure that no other process writes to a mapped file, because even the data in private read-only mappings may change if the underlying file is written to by another process. (POSIX e.g. doesn't specify "whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping".)
If you wanted to make your code safe in the presence of external changes to the mapped file, you'd have to access the mapped memory only through volatile pointers and then be extremely careful about how you read and validate the input, which seems impractical for many use cases.
Is this analysis correct? The documentation for memory mapping APIs generally mentions this issue only in passing, if at all, so I wonder whether I'm missing something.
It is not really a problem.
Yes, another process may modify the file while you have it mapped, and yes, it is possible that you will see the modifications. It is even likely, since almost all operating systems have unified virtual memory systems: unless unbuffered writes are requested, there is no way to write without going through the buffer cache, and no way for that to happen without every process holding a mapping seeing the change.
That isn't even a bad thing. Actually, it would be more disturbing if you couldn't see the changes. Since the file effectively becomes part of your address space when you map it, it makes perfect sense that you see changes to the file.
If you use conventional I/O (such as read), someone can still modify the file while you are reading it. Worded differently, copying file content to a memory buffer is not always safe in presence of modifications. It is "safe" insofar as read will not crash, but it does not guarantee that your data is consistent.
Unless you use readv, you have no guarantees about atomicity whatsoever (and even with readv you have no guarantee that what you have in memory is consistent with what is on disk or that it doesn't change between two calls to readv). Someone might modify the file between two read operations, or even while you are in the middle of it.
This isn't just something that is formally unguaranteed but "probably still works". On the contrary: under Linux, for example, writes are demonstrably not atomic - not even by accident.
The good news:
Usually, processes don't just open an arbitrary random file and start writing to it. When such a thing happens, it is usually either a well-known file that belongs to the process (e.g. log file), or a file that you explicitly told the process to write to (e.g. saving in a text editor), or the process creates a new file (e.g. compiler creating an object file), or the process merely appends to an existing file (e.g. db journals, and of course, log files). Or, a process might atomically replace a file with another one (or unlink it).
In every case, the whole scary problem boils down to "no issue" because either you are well aware of what will happen (so it's your responsibility), or it works seamlessly without interfering.
If you really don't like the possibility that another process could write to your file while you have it mapped, you can simply omit FILE_SHARE_WRITE under Windows when you create the file handle. POSIX makes it somewhat more complicated, since you need to fcntl the descriptor for a mandatory lock, which isn't necessarily supported or 100% reliable on every system (for example, under Linux).
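A sketch of the Windows side of that ('input.dat' is a placeholder): with the share mode set to 0, no other process can open the file at all while the handle is held, so the mapped pages cannot change underneath you:

#include <windows.h>

int main() {
    // dwShareMode = 0: no FILE_SHARE_READ/WRITE/DELETE, so other processes
    // cannot even open the file while we hold this handle.
    HANDLE file = CreateFileW(L"input.dat", GENERIC_READ, 0, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!mapping) { CloseHandle(file); return 1; }

    const char* data = (const char*)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    if (data) {
        // ... parse data; it cannot change while 'file' is held open ...
        UnmapViewOfFile(data);
    }
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}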
In theory, you're probably in real trouble if someone does modify the file while you're reading it. In practice, you're reading characters and nothing else: no pointers, nor anything else that could get you into trouble. Formally, I think it's still undefined behavior, but it's one I don't think you have to worry about. Unless the modifications are very minor, you'll get a lot of compiler errors, but that's about the end of it.
The one case that might cause problems is if the file is shortened. I'm not sure what happens then, when you're reading beyond the end.
And finally: the system isn't arbitrarily going to open and modify the file. It's a source file; it will be some idiot programmer who does it, and he deserves what he gets. In no case will your undefined behavior corrupt the system or other people's files.
Note too that most editors work on a private copy; when they write it back, they do so by renaming the original and creating a new file. Under Unix, once you've opened the file to mmap it, all that counts is the inode number, and when the editor renames or deletes the file, you still keep your copy; the modified file gets a new inode. The only thing you have to worry about is someone opening the file for update and then modifying it in place. Not many programs do this on text files, except for appending additional data to the end.
So while formally there's some risk, I don't think you have to worry about it. (If you're really paranoid, you could turn off write authorisation while you have it mmaped. And if there's really an enemy agent out to get you, he can turn it right back on.)

Dynamic binary from file

This is a little bit of a weird problem.
Say I have C++ code running on a specific platform. The only purpose of this code is to run files containing binary code NATIVE to that platform.
Now, my question is: HOW would I get the data from these files (it could even be in chunks, say 1024 bits per cycle) into the memory of the machine running my code, in such a way that the data ends up somewhere executable?
In other words, can I get the data to somewhere I can point the instruction pointer at?
If yes, how?
I don't mind if I have to use assembler for this - just so long as it works.
EDIT: Sorry, I wasn't specific enough.
Basically, the file data I want to load is not in a format like .exe, Mach-O, or ELF. It is just raw binary data (with some custom headers, which my code removes). This binary data is machine code, specific to and suited for the processor of the current machine.
Technically this means I want to bypass the normal OS loaders and load the binary directly. I could interpret it, but that would be around 3x slower at best.
I would not mind at all if I had to run the data in a child process - but again, I can't use the normal process launchers, because the data is not in any format that the OS could run automatically (again: .exe, Mach-O, ELF).
You should just read the file into memory, mark that memory as executable (this is platform-specific; for example, VirtualProtect() on Windows), and call it as a function: ((void(*)())ptr)();
Of course, the code in the file should be position-independent.
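A sketch of that sequence on Windows ('payload.bin' and the lack of an entry-point offset are simplifications; on POSIX the equivalent would be mmap followed by mprotect with PROT_EXEC). Allocating as read-write first and flipping to execute-read afterwards avoids ever having pages that are writable and executable at the same time:

#include <windows.h>
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    std::ifstream in("payload.bin", std::ios::binary);
    std::vector<char> code((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    if (code.empty()) return 1;

    // Allocate writable memory and copy the raw machine code in...
    void* mem = VirtualAlloc(NULL, code.size(),
                             MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!mem) return 1;
    std::memcpy(mem, &code[0], code.size());

    // ...then flip the protection to executable and jump to it.
    DWORD oldProtect = 0;
    if (!VirtualProtect(mem, code.size(), PAGE_EXECUTE_READ, &oldProtect))
        return 1;
    FlushInstructionCache(GetCurrentProcess(), mem, code.size());

    ((void(*)())mem)();          // transfer control; the payload must return

    VirtualFree(mem, 0, MEM_RELEASE);
    return 0;
}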
How generic does your solution need to be?
The OS loader usually knows how to handle specific file formats and how to load them into memory. You can usually rely on the OS loader and call the entry point, but for that you need to know what the entry point is, and for that you need to understand the file format anyway. I can't think of a portable solution for that.
If you explain more about what you want to do, maybe it will be possible to suggest a solution.

Using dlopen, how can I cope with changes to the library file I have loaded?

I have a program written in C++ which uses dlopen to load a dynamic library (Linux, i386, .so). When the library file is subsequently modified, my program tends to crash. This is understandable, since presumably the file is simply mapped into memory.
My question is: other than simply creating a copy of the file myself and dlopening that, is there a way for me to load a shared object that is safe against subsequent modifications, or any way to recover from modifications to a shared object that I have loaded?
Clarification: The question is not "how can I install a new library without crashing the program", it is "if someone who I don't control is copying libraries around, is it possible for me to defend against that?"
If you rm the library prior to installing the new one, I think your system will keep the inode allocated and the file open, and your program will keep running. (And when your program finally exits, the mostly-hidden-but-still-there file resources are released.)
Update: Ok, post-clarification. The dynamic linker actually completely "solves" this problem by passing the MAP_COPY flag, if available, to mmap(2). However, MAP_COPY does not exist on Linux and is not a planned future feature. Second best is MAP_DENYWRITE, which I believe the loader does use, and which is in the Linux API, and which Linux used to do. It errors-out writes while a region is mapped. It should still allow an rm and replace. The problem here is that anyone with read-access to a file can map it and block writes, which opens a local DoS hole. (Consider /etc/utmp. There is a proposal to use the execute permission bit to fix this.)
You aren't going to like this, but there is a trivial kernel patch that will restore MAP_DENYWRITE functionality. Linux still has the feature, it just clears the bit in the case of mmap(2). You have to patch it in code that is duplicated per-architecture, for ia32 I believe the file is arch/x86/ia32/sys_ia32.c.
asmlinkage long sys32_mmap2(unsigned long addr, unsigned long len,
                            unsigned long prot, unsigned long flags,
                            unsigned long fd, unsigned long pgoff)
{
        struct mm_struct *mm = current->mm;
        unsigned long error;
        struct file *file = NULL;

        flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE); /* fix this line to not clear MAP_DENYWRITE */
This should be OK as long as you don't have any malicious local users with credentials. It's not a remote DoS, just a local one.
If you install a new version of the library, the correct procedure is to create a new file in the same directory and then rename it over the old one. The old file will remain as long as it's open, and will continue to be used.
Package managers like RPM do this automatically - so you can update shared libraries and executables while they're running - but the old versions keep running.
When you need to pick up the new version, restart the process or reload the library. Restarting the process sounds better; your program can exec itself. Even init can do this.
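A sketch of that install step (paths are illustrative). rename(2) is atomic on POSIX, so any process opening the library sees either the complete old file or the complete new one, never a half-written one:

#include <cstdio>
#include <cstdlib>

int main() {
    // 1. Write the new version to a temporary name in the SAME directory
    //    (rename() is only atomic within one filesystem).
    //    ... write /usr/local/lib/libfoo.so.tmp ...

    // 2. Atomically replace. Processes that already dlopen()ed the old file
    //    keep using the old inode; new opens see the new file.
    if (std::rename("/usr/local/lib/libfoo.so.tmp",
                    "/usr/local/lib/libfoo.so") != 0) {
        std::perror("rename");
        return EXIT_FAILURE;
    }
    return 0;
}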
It is not possible to defend against someone overwriting your library if they have file write permission.
Because dlopen memory maps the library file, all changes to the file are visible in every process that has it open.
The dlopen function uses memory mapping because it is the most memory efficient way to use shared libraries. A private copy would waste memory.
As others have said, the proper way to replace a shared library on Unix is to use unlink or rename, not to overwrite the library with a new copy. The install command will do this properly.
This is an intriguing question. I hate finding holes like this in Linux, and love finding ways to fix them.
My suggestion is inspired by @Paul Tomblin's answer to this question about temporary files on Linux. Some of the other answers here have suggested the existence of this mechanism, but have not described a method of exploiting it from the client application as you requested.
I have not tested this, so I have no idea how well it will work. Also, there may be minor security concerns associated with a race condition related to the brief period of time between when the temporary file is created and when it is unlinked. Also, you already noted the possibility of creating a copy of the library, which is what I am proposing. My twist on this is that your temporary copy exists as an entry in the file system for only an instant, regardless of how long you actually hold the library open.
When you want to load a library follow these steps:
copy the file to a temporary location, probably starting with mkstemp()
load the temporary copy of the library using dlopen()
unlink() the temporary file
proceed as normal; the file's resources will be automatically released when you dlclose() it
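A sketch of those four steps (POSIX; error handling trimmed, the temp path template is illustrative, and you'd link with -ldl):

#include <dlfcn.h>
#include <unistd.h>
#include <cstdlib>
#include <fstream>

void* open_private_copy(const char* libPath) {
    char tmpl[] = "/tmp/libcopyXXXXXX";
    int fd = mkstemp(tmpl);                 // step 1: unique temporary file
    if (fd < 0) return NULL;

    {   // copy the library byte-for-byte into the temp file
        std::ifstream src(libPath, std::ios::binary);
        std::ofstream dst(tmpl, std::ios::binary);
        dst << src.rdbuf();
    }

    void* handle = dlopen(tmpl, RTLD_NOW);  // step 2: load the private copy
    unlink(tmpl);                           // step 3: drop the directory entry;
    close(fd);                              //         the mapping keeps the inode alive
    return handle;                          // step 4: dlclose() releases everything
}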
It would be nice if there were a really easy way to achieve the "copy the file" step without requiring you to actually copy the file. Hard-linking comes to mind, but I don't think that it would work for these purposes. It would be ideal if Linux had a copy-on-write mechanism which was as easy to use as link(), but I am not aware of such a facility.
Edit: @Zan Lynx's answer points out that creating custom copies of dynamic libraries can be wasteful if they are replicated into multiple processes. So my suggestion probably only makes sense if it is applied judiciously - to only those libraries that are at risk of being stomped on (presumably a small subset of all libraries, one that does not include files in /lib or /usr/lib).
If you can figure out where your library is mapped into memory, then you might be able to mprotect it writeable and do a trivial write to each page (e.g. read and write back the first byte of each page). That should get you a private copy of every page.
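A sketch of that trick, assuming you've already found the region's page-aligned base address and length (e.g. via dl_iterate_phdr() or /proc/self/maps - that part is omitted). The volatile pointer keeps the compiler from optimising away the "write the same byte back" step:

#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// 'base' must be the page-aligned start of the mapped region.
bool privatize_region(unsigned char* base, std::size_t len) {
    if (mprotect(base, len, PROT_READ | PROT_WRITE) != 0)
        return false;               // likely if the file was mapped read-only
    long page = sysconf(_SC_PAGESIZE);
    for (std::size_t off = 0; off < len; off += (std::size_t)page) {
        volatile unsigned char* p = base + off;
        *p = *p;                    // dirty the page: forces a private copy
    }
    return mprotect(base, len, PROT_READ) == 0;   // restore (add PROT_EXEC for code)
}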
If mprotect doesn't work (it may not; the original file was probably opened read-only), you can copy the region out to another location, remap the region (using mmap) as a private, writeable region, and then copy the contents back.
I wish the OS had a "transform this read-only region to a copy-on-write region". I don't think something like that exists, though.
In any of these scenarios, there is still a window of vulnerability - someone can modify the library while dlopen is calling initializers, or before your remap call happens. You're not really safe unless you can fix the dynamic linker as @DigitalRoss describes.
Who is editing your libraries out from under you, anyway? Find that person and hit him over the head with a frying pan.

How to guarantee files that are decrypted during run time are cleaned up?

Using C or C++, on Windows and Linux: after I decrypt a file to disk, how can I guarantee it is deleted if the application crashes or the system powers off and can't clean it up properly?
Unfortunately, there's no 100% foolproof way to ensure that the file will be deleted in the case of a full system crash. Think about what happens if the user just pulls the plug while the file is on disk. No amount of exception handling will protect you from that (the worst) case.
The best thing you can do is not write the decrypted file to disk in the first place. If the file exists in both its encrypted and decrypted forms, that's a point of weakness in your security.
The next best thing you can do is use Brian's suggestion of structured exception handling to make sure the temporary file gets cleaned up. This won't protect you from all possibilities, but it will go a long way.
Finally, I suggest that you check for temporary decrypted files on start-up of your application. This will allow you to clean up after your application in case of a complete system crash. It's not ideal to have those files around for any amount of time, but at least this will let you get rid of them as quickly as possible.
Don't write the decrypted file to disk at all.
If the system is powered off, the file is still on disk, and the disk - and therefore the file - can be accessed.
The exception would be the use of an encrypted file system, but that is outside your program's control.
I don't know if this works on Windows, but on Linux, assuming that you only need one process to access the decrypted file, you can open the file and then call unlink() to delete it. The file will continue to exist as long as the process keeps it open, but when it is closed, or the process dies, the file will no longer be accessible.
Of course, the contents of the file are still on the disk, so you really need more than just deletion - you need to zero out the contents. Is there any reason the decrypted file needs to be on disk (size?)? Better would be to keep the decrypted version in memory, preferably marked as unswappable, so it never hits the disk.
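A sketch of that open-then-unlink pattern (POSIX):

#include <fcntl.h>
#include <unistd.h>

int make_anonymous_file(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) return -1;
    unlink(path);        // remove the directory entry immediately
    // ... write decrypted data to fd, read it back with pread()/lseek() ...
    return fd;           // the file's contents disappear when fd is closed
}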
Try to avoid it completely:
If the file is sensitive, the best bet is to not have it written to disk in a decrypted format in the first place.
Protecting against crashes: Structured exception handling:
However, you could add structured exception handling to catch any crashes.
__try and __except
What if they pull the plug?:
There is a way to protect against this...
If you are on Windows, you can use MoveFileEx with the option MOVEFILE_DELAY_UNTIL_REBOOT and a destination of NULL to delete the file on the next startup. This protects against an accidental computer shutdown leaving an undeleted file behind. You can also ensure that you hold an exclusively opened handle to the file (open it with CreateFile and pass 0 for the sharing mode, i.e. none of the FILE_SHARE_* flags). That way no one else will be able to read from it.
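A sketch combining both ideas (the path is a placeholder; note that MOVEFILE_DELAY_UNTIL_REBOOT is documented to require administrator rights, since it queues the operation in a registry key processed at boot):

#include <windows.h>

HANDLE open_self_destructing(const wchar_t* path) {
    // Belt: the OS deletes the file when the last handle to it closes,
    // which also covers a crash of this process.
    HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                           0 /* no sharing */, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                           NULL);
    // Braces: if the machine dies first, remove the leftover at next boot.
    MoveFileExW(path, NULL, MOVEFILE_DELAY_UNTIL_REBOOT);
    return h;
}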
Other ways to avoid the problem:
All of these are not excuses for having a decrypted file on disk, but:
You could also consider giving the file a path longer than MAX_PATH, via the \\?\ path syntax. This helps ensure that the file is not browsable by Windows Explorer.
You should set the file to have the temporary attribute
You should set the file to have the hidden attribute
In C (and so, I assume, in C++ too), as long as your program doesn't crash, you can register an atexit() handler to do the cleanup. Just avoid using _exit() or _Exit(), since those bypass the atexit() handlers.
As others have pointed out, though, it is better to avoid writing the decrypted data to disk at all. And simply using unlink() (or equivalent) is not sufficient; you need to overwrite the original data with other data first. And journalled file systems make even that very difficult.
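A sketch of the atexit() approach ('decrypted.tmp' is a placeholder; remove() only drops the name, so overwrite the contents first if they matter):

#include <cstdio>
#include <cstdlib>

static const char* g_tempPath = "decrypted.tmp";

static void cleanup() {
    std::remove(g_tempPath);    // best effort; see the caveats above
}

int main() {
    std::atexit(cleanup);       // runs on return from main() and exit(),
                                // but not on _Exit(), abort(), or a crash
    // ... use the temporary file ...
    return 0;
}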
A process cannot protect or watch itself. Your only possibility is to start a second process as a kind of watchdog, which regularly checks the health of the decrypting process. If that process crashes, the watchdog will notice and delete the file itself.
You can do that using heartbeats (regular polling of the other process to see whether it's still alive), or using keep-alive messages sent from the other process itself, whose absence triggers a timeout once it has crashed.
You could use sockets to make the connection between the watchdog and your app, for example.
It's also becoming clear that you need some locking mechanism to prevent swapping to the pagefile / swap partition. On POSIX systems, this can be done with the m(un)lock* family of functions.
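A sketch of the mlock() part (POSIX). Note that a plain memset wipe can in principle be optimised away; a real implementation should use an explicit_bzero-style function where available:

#include <sys/mman.h>
#include <cstddef>
#include <cstring>

void use_plaintext_buffer(unsigned char* buf, std::size_t len) {
    if (mlock(buf, len) != 0)
        return;                     // may fail without privileges / rlimits
    // ... decrypt into buf and work with it ...
    std::memset(buf, 0, len);       // wipe before releasing (see caveat above)
    munlock(buf, len);
}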
There's a problem with deleting the file: it's not really gone.
When you delete a file off your hard drive (not counting the recycle bin), the file isn't really gone - just the pointer to the file is removed.
Ever see those spy movies where they overwrite the hard drive 6, 8, or 24 times, and that's how they know it's clean? Well, they do that for a reason.
I'd make every effort not to store the file's decrypted data. Or, if you must, make it small amounts of data - even disjointed data.
If you must, then try/catch should protect you a bit. Nothing can protect from a power outage, though.
Best of luck.
Check out tmpfile().
It is part of standard C (declared in <stdio.h>), not just BSD UNIX.
It creates a temporary file and (on POSIX systems) unlinks it immediately, so that it is deleted on close or at normal program termination.
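A sketch of its use:

#include <cstdio>

int main() {
    std::FILE* f = std::tmpfile();         // deleted automatically on fclose()
    if (!f) return 1;                      // or at normal program termination
    std::fputs("decrypted data...", f);    // placeholder content
    std::rewind(f);
    // ... read the data back and use it ...
    std::fclose(f);
    return 0;
}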
Writing to the file system (even temporarily) is insecure.
Do that only if you really have to.
Optionally you could create an in-memory file system.
Never used one myself so no recommendations but a quick google found a few.
In C++ you should use an RAII tactic:
#include <string>

class Clean_Up_File {
    std::string filename_;
public:
    explicit Clean_Up_File(const std::string& filename)
        : filename_(filename) { /* ... open/create the file ... */ }
    ~Clean_Up_File() { /* ... delete the file ... */ }
};

int main()
{
    Clean_Up_File file_will_be_deleted_on_program_exit("my_file.txt");
}
RAII helps automate a lot of cleanup. You simply create an object on the stack, and have that object do clean up at the end of its lifetime (in the destructor which will be called when the object falls out of scope). ScopeGuard even makes it a little easier.
But, as others have mentioned, this only works in "normal" circumstances. If the user unplugs the computer you can't guarantee that the file will be deleted. And it may be possible to undelete the file (even on UNIX it's possible to "grep the harddrive").
Additionally, as pointed out in the comments, there are some cases where objects don't fall out of scope (for instance, the std::exit(int) function exits the program without leaving the current scope), so RAII doesn't work in those cases. Personally, I never call std::exit(int), and instead I either throw exceptions (which will unwind the stack and call destructors; which I consider an "abnormal exit") or return an error code from main() (which will call destructors and which I also consider an "abnormal exit"). IIRC, sending a SIGKILL also does not call destructors, and SIGKILL can't be caught, so there you're also out of luck.
This is a tricky topic. Generally, you don't want to write decrypted files to disk if you can avoid it. But keeping them in memory doesn't always guarantee that they won't be written to disk as part of a pagefile or otherwise.
I read articles about this a long time ago, and I remember there being some difference between Windows and Linux - on one of them you could guarantee that a memory page wouldn't be written to disk, and on the other you couldn't - but I don't remember clearly.
If you want to do your due diligence, you can look that topic up and read about it. It all depends on your threat model and what you're willing to protect against. After all, you can use compressed air to chill RAM and pull the encryption key out of that (which was actually on the new Christian Slater spy show, My Own Worst Enemy - which I thought was the best use of cutting edge, accurate, computer security techniques in media yet)
On Linux/Unix, use unlink as soon as you have created the file. The file will be removed as soon as your program closes the file descriptor or exits.
Better yet, the file will be removed even if the whole system crashes - because it is essentially removed as soon as you unlink it.
The data will not be physically erased from the disk, of course, so it may still be recoverable.
Remember that the computer could be powered down at any time. Then, somebody you don't like could boot up with a Linux live CD, and examine your disk in any level of detail desired without changing a thing. No system that writes plaintext to the disk can be secure against such attacks, and they aren't hard to do.
You could set up a function that will overwrite the file with ones and zeros repeatedly, preferably injecting some randomness, and set it up to run at end of program, or at exit. This will work, provided there are no hardware or software glitches, power failures, or other interruptions, and provided the file system writes to only the sectors it claims to be using (journalling file systems, for example, may leave parts of the file elsewhere).
Therefore, if you want security, you need to make sure no plaintext is written out, and that also means it cannot be written to swap space or the equivalent. Find out how to mark memory as unswappable on all platforms you're writing for. Make sure decryption keys and the like are treated the same way as plaintext: never written to the disk under any circumstances, and kept in unswappable memory.
Then, your system should be secure against attacks short of hostiles breaking in, interrupting you, and freezing your RAM chips before powering down, so they don't lose their contents before being transferred for examination. Or authorities demanding your key, legally (check your local laws here) or illegally.
Moral of the story: real security is hard.
The method I am going to implement is to stream the decryption, so that the only data in memory at any given point is the part decrypted during the read, as the data is being used. This makes some things tricky, since a lot of traditional file tricks are no longer available, but because the implementation is stream-based I will still be able to seek to different points of the file; the seek is translated into the crypt stream, which decrypts the relevant section.
Basically, it will encrypt blocks of the file at a time, so if I seek to a certain point it decrypts that block for reading. When I read past a block, it decrypts the next block and releases the previous one (within the crypt stream).
This implementation does not require me to decrypt to a file or fully into memory, and it is compatible with other stream consumers and providers (fstream).
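A minimal sketch of how this could look: a custom streambuf that decrypts one block at a time on demand. The XOR "cipher" below stands in for a real block cipher, and seeking support (seekoff/seekpos) is omitted to keep the sketch short:

#include <fstream>
#include <istream>
#include <streambuf>
#include <vector>

class DecryptingBuf : public std::streambuf {
public:
    DecryptingBuf(const char* path, unsigned char key)
        : src_(path, std::ios::binary), key_(key), block_(4096) {}

protected:
    // Refill the get area with the next decrypted block on demand.
    int_type underflow() {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        src_.read(reinterpret_cast<char*>(&block_[0]), block_.size());
        std::streamsize n = src_.gcount();
        if (n <= 0)
            return traits_type::eof();
        for (std::streamsize i = 0; i < n; ++i)
            block_[i] ^= key_;                 // "decrypt" the block in place
        char* p = reinterpret_cast<char*>(&block_[0]);
        setg(p, p, p + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::ifstream src_;
    unsigned char key_;
    std::vector<unsigned char> block_;
};

// Usage: wrap it in an istream and read as usual.
//   DecryptingBuf buf("secret.bin", 0x5A);
//   std::istream in(&buf);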
This is my 'plan'. I have not done this type of work with fstream before and I will likely be posting a question as soon as I am ready to work on this.
Thanks for all the other answers- it was very informative.