Unpacking an executable from within a library in C/C++


I am developing a library that uses one or more helper executables in the course of doing business. My current implementation requires that the user have the helper executable installed on the system in a known location. For the library to function properly, the helper app must be in the correct location and be the correct version.
I would like to remove the requirement that the system be configured in this manner.
Is there a way to bundle the helper executable in the library such that it could be unpacked at runtime, installed in a temporary directory, and used for the duration of one run? At the end of the run the temporary executable could be removed.
I have considered automatically generating a file containing an unsigned char array that holds the bytes of the executable. This would be done at compile time as part of the build process. At runtime this array would be written to a file, thus recreating the executable.
Would it be possible to do this without writing the executable to disk (perhaps using some sort of RAM disk)? I could envision certain virus scanners and other security software objecting to such an operation. Are there other concerns I should worry about?
The library is being developed in C/C++ for cross-platform use on Windows and Linux.

"A clever person solves a problem. A wise person avoids it." - Albert Einstein
In the spirit of this quote I recommend that you simply bundle this executable along with the end-application.
Just my 2 cents.

You can use xxd to convert a binary file to a C header file.
$ echo -en "\001\002\005" > x.binary
$ xxd -i x.binary
unsigned char x_binary[] = {
0x01, 0x02, 0x05
};
unsigned int x_binary_len = 3;
xxd is pretty standard on *nix systems; on Windows it is available through Cygwin or MinGW, and the standard Vim installer includes it as well. This is an extremely cross-platform way to include binary data in compiled code.
Another approach is to use objcopy to add the data to an executable (for example as an extra section). IIRC you can obtain objcopy for Windows and use it on PE files too.
One approach I like a little better than that is to just append raw data straight onto the end of your executable file. In the executable, you seek to the end of the file, and read in a number, indicating the size of the attached binary data. Then you seek backwards that many bytes, and fread that data and copy it out to the filesystem, where you could treat it as an executable file. This is incidentally the way that many, if not all, self-extracting executables are created.
If you append the binary data, it works with both Windows PE files and *nix ELF files: the loader for neither format reads past the end of the image described by the file's headers, so the extra bytes are simply ignored.
Of course, if you need to append multiple files, you can either append a tar/zip file to your exe, or you'll need a slightly more advanced data structure to read what's been appended.
You'll also probably want to compress your executables with UPX before you append them.
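A minimal sketch of the reading side for a single appended payload. It assumes the trailing "number" is the payload size stored as an 8-byte native-endian std::uint64_t (the answer does not fix a format), and read_appended_payload is a hypothetical name. It seeks to the end, reads the size, seeks back over the payload, and reads it into memory:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Reads data appended to the end of the file at `path`, assuming the layout
//   [original file bytes][payload][payload size as native-endian std::uint64_t]
// Returns std::nullopt if the file cannot be read or the trailer is implausible.
std::optional<std::vector<unsigned char>> read_appended_payload(const std::string& path) {
    std::FILE* fp = std::fopen(path.c_str(), "rb");
    if (!fp) return std::nullopt;

    std::optional<std::vector<unsigned char>> result;
    std::uint64_t size = 0;
    const long trailer = static_cast<long>(sizeof size);

    // Find the file length, then read the size stored in the last 8 bytes.
    if (std::fseek(fp, 0, SEEK_END) == 0) {
        const long file_len = std::ftell(fp);
        if (file_len >= trailer &&
            std::fseek(fp, -trailer, SEEK_END) == 0 &&
            std::fread(&size, sizeof size, 1, fp) == 1 &&
            size <= static_cast<std::uint64_t>(file_len - trailer)) {
            // Seek backwards over the trailer and the payload, then read it.
            const long start = file_len - trailer - static_cast<long>(size);
            std::vector<unsigned char> data(static_cast<std::size_t>(size));
            if (std::fseek(fp, start, SEEK_SET) == 0 &&
                (data.empty() || std::fread(data.data(), 1, data.size(), fp) == data.size())) {
                result = std::move(data);
            }
        }
    }
    std::fclose(fp);
    return result;
}
```

In a real library you would pass the path of the file the data was appended to (for example the running executable) and then write the returned bytes out to a temporary file.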
You might also be interested in the LZO library, which is reportedly one of the fastest-decompressing compression libraries. They have a MiniLZO library that you can use for a very lightweight decompressor. However, the LZO libraries are GPL licensed, so that might mean you can't include it in your source code unless your code is GPLed as well. On the other hand, there are commercial licenses available.

A slightly different approach from using an unsigned char array is to embed the entire executable binary as a resource of the DLL. At runtime, you can save the binary data as a local temp file and execute the app. I'm not sure if there is a way to execute an executable in memory, though.

"For the library to function properly the helper app must be in the correct location"
On Windows, would that be the Program Files directory or System32 directory?
This might be a problem. When an application is installed, particularly in a corporate environment, it usually happens in a context with administrative rights. On Vista and later with UAC enabled (the default), this is necessary to write to certain directories. And most Unix flavours have had sensible restrictions like that for as long as anyone can remember.
So if you try to do it at the time the host application calls into your library, that might not be in a context with sufficient rights to install the files, and so your library would put constraints on the host application.
(Another thing that will be ruled out is Registry changes, or config file updates on the various Unices, if the host application doesn't have the ability to elevate the process to an administrative level.)
Having said all that, you say you're considering unpacking the helpers into a temporary directory, so maybe this is all moot.

Qt has an excellent method of achieving this: QResource
"The Qt resource system is a platform-independent mechanism for storing binary files in the application's executable."
You don't say if you are currently using Qt, but you do say "C++ for cross platform use on Windows and Linux", so even if you aren't using it, you may want to consider starting.

There is a way in Windows to run an executable from within memory without writing it to disk. The problem is that due to modern security systems (DEP) this probably won't work on all systems and almost any anti-malware scanner will detect it and warn the user.
My advice is to simply package the executable into your distribution, it's certainly the most reliable way to achieve this.

Well, my first thought would be: what does this helper executable do that couldn't be done within your library's code itself, perhaps using a secondary thread if necessary. This might be something to consider.
But as for the actual question... If your "library" is actually bundled up as a DLL (or even an exe), then at least Windows has relatively simple support for embedding files within your library.
The resource mechanism that allows things like version information and icons to be embedded within executables can also allow arbitrary chunks of data. Since I don't know what development environment you're using, I can't say exactly how to do this. But roughly speaking, you'd need to create a custom resource with a type of "FILE" or something sensible like that and point it at the exe you want to embed.
Then, when you want to extract it, you would write something like
#include <windows.h>
// hModule: this DLL's own module handle (e.g. saved in DllMain); NULL would search the host EXE instead
HRSRC hResource = FindResourceA(hModule, MAKEINTRESOURCEA(IDR_MY_EMBEDDED_FILE), "FILE");
HGLOBAL hResourceData = LoadResource(hModule, hResource);
LPVOID pData = LockResource(hResourceData);
DWORD dwSize = SizeofResource(hModule, hResource);
HANDLE hFile = CreateFileA("DestinationPath\\Helper.exe", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwBytesWritten = 0;
WriteFile(hFile, pData, dwSize, &dwBytesWritten, NULL);
CloseHandle(hFile);
(filling in your own desired path, filename, and any appropriate error checking of course)
After that, the helper exe exists as a normal exe file and so you can execute it however you normally would.
For removing the file after use, you should investigate the flags for CreateFile, particularly FILE_FLAG_DELETE_ON_CLOSE. You might also look at calling MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT flag and NULL as the new file name, which deletes the file at the next reboot (this requires administrative rights). And of course, you could always delete it in your own code if you can tell when the executable has ended.
I don't know enough about Linux executables, so I don't know if a similar feature is available there.
If Linux doesn't provide any convenient mechanism and/or if this idea doesn't suit your needs in Windows, then I suppose your idea of generating an unsigned char array from the contents of the helper exe would be the next best way to embed the exe in your library.
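On Linux (and other POSIX systems), the unsigned char array idea can be sketched as follows: write the array to a file created by mkstemp in the temporary directory, mark it executable, run it, and unlink it afterwards. extract_helper and run_and_remove are hypothetical names; the sketch assumes the temporary directory is not mounted noexec, and it is not a hardened implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <vector>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

// POSIX-only sketch: writes an embedded helper (e.g. an array generated by
// `xxd -i`) to a new temporary file and marks it executable.
// Returns the file's path, or "" on failure.
std::string extract_helper(const unsigned char* data, std::size_t len) {
    std::string tmpl = (std::filesystem::temp_directory_path() / "helperXXXXXX").string();
    std::vector<char> name(tmpl.begin(), tmpl.end());
    name.push_back('\0');

    int fd = mkstemp(name.data());              // unique name, created with mode 0600
    if (fd < 0) return "";

    bool ok = true;
    for (std::size_t off = 0; ok && off < len;) {
        ssize_t n = write(fd, data + off, len - off);
        if (n <= 0) ok = false;
        else off += static_cast<std::size_t>(n);
    }
    // Owner-only rwx. Close before anyone execs it, or exec fails with ETXTBSY.
    if (fchmod(fd, S_IRWXU) != 0) ok = false;
    if (close(fd) != 0) ok = false;
    if (!ok) {
        unlink(name.data());
        return "";
    }
    return std::string(name.data());
}

// Runs the helper with no arguments, waits for it, then deletes the file.
// Returns the helper's exit status (127 if execv failed), or -1 if fork/wait
// failed or the helper did not exit normally.
int run_and_remove(const std::string& path) {
    pid_t pid = fork();
    if (pid == 0) {
        char* const argv[] = {const_cast<char*>(path.c_str()), nullptr};
        execv(path.c_str(), argv);
        _exit(127);                             // only reached if execv failed
    }
    int status = 0;
    const bool waited = pid > 0 && waitpid(pid, &status, 0) == pid;
    unlink(path.c_str());                       // remove the temporary copy
    return (waited && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Because mkstemp picks an unpredictable name and creates the file with owner-only permissions, it is harder for another user to swap the helper out between extraction and execution than it would be with a fixed file name.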

Related

How to cross-platformly build binary resources into program?

We are developing C++ applications for Windows, Mac and Linux. The applications have many binary resources, and for some reason we need to pack them into the executable binary instead of leaving them in directories (or an Apple app bundle).
Currently, we use a script to convert these resources into C++ array constants, then compile and link them. However, this approach has several deficiencies:
You have to compile the resource source code, which takes time and is essentially unnecessary.
The resource source files are parsed by the IDE. As they are large, code analysis is greatly slowed down.
MSVC has a limit on source code size, so large resources (several MB) must be split into many parts and then concatenated at run time.
After some study, I found some solutions:
In Windows, I can use .rc files and related WinAPI.
In Linux, I can directly convert an arbitrary binary file into an object file via objcopy.
However, there are still some questions remaining:
Using the WinAPI to fetch resources requires several function calls to access one resource. Is there a simpler way on Windows?
How to do it in Mac?

A quite common trick, most notably used for self-extracting archives or scripting language to executable compilers, is to append the resources at the end of the executable file.
Windows:
copy /b app.exe+all-resources app-with-resources.exe
Linux:
cp executable executable-with-resources
cat all-resources >>executable-with-resources
Then you can read your own executable using fopen(argv[0]), for example (though argv[0] does not always hold a usable path).
In order to jump to the correct position, i.e. the beginning of the resources, one possible solution is to append the size of the executable without resources as the last word of the file.
FILE* fp = fopen(argv[0], "rb");
if (fp == NULL) { /* handle the error */ }
int beginResourcesOffset;
fseek(fp, -(long)sizeof(int), SEEK_END);
fread(&beginResourcesOffset, sizeof(int), 1, fp);
fseek(fp, beginResourcesOffset, SEEK_SET);
Be careful with this solution, though: anti-virus software on Windows sometimes doesn't like it. There are probably better solutions.
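The copy /b or cat step plus the trailing offset can also be done by a small packing tool. This is a sketch, assuming the offset is written as an 8-byte native-endian std::uint64_t rather than an int (whatever type you pick, the reader must use the same one); pack_resources and append_all are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Appends everything from `src` to `dst`. Streaming an empty rdbuf() would set
// failbit on `dst`, so empty inputs are skipped explicitly.
static void append_all(std::ofstream& dst, std::ifstream& src) {
    if (src.peek() != std::ifstream::traits_type::eof()) dst << src.rdbuf();
}

// Produces  [executable][resources][offset of resources as native-endian std::uint64_t]
bool pack_resources(const std::string& exe, const std::string& resources,
                    const std::string& out) {
    std::ifstream in_exe(exe, std::ios::binary);
    std::ifstream in_res(resources, std::ios::binary);
    std::ofstream dst(out, std::ios::binary | std::ios::trunc);
    if (!in_exe || !in_res || !dst) return false;

    append_all(dst, in_exe);                      // original executable, unchanged
    const std::uint64_t offset = static_cast<std::uint64_t>(dst.tellp());
    append_all(dst, in_res);                      // resources begin at `offset`
    dst.write(reinterpret_cast<const char*>(&offset), sizeof offset);
    dst.close();                                  // report flush errors too
    return !dst.fail();
}
```

On Linux the output file is created without execute permission, so you would still chmod +x the result.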

When writing a portable c/c++ program, what is the best way to consume external files?

I'm pretty new to the c/c++ scene, I've been spoon fed on virtual machines for too long.
I'm modifying an existing C++ tool that we use across the company. The tool is being used on all the major operating systems (Windows, Mac, Ubuntu, Solaris, etc). I'm attempting to bridge the tool with another tool written in Java. Basically I just need to call java -jar from the C++ tool.
The problem is, how do I know where the jar is located on the user's computer? The c++ executables are currently checked into Perforce, and users sync and then call the exe, presumably leaving the exe in place (although they could copy it somewhere else). My current solution checks in the jar file beside the exe.
I've looked at multiple ways to determine the location of the exe from C++, but none of them seem to be portable. On Windows there is GetModuleFileName, and on Unix-like systems with /proc you can look at the process's entry (e.g. /proc/self/exe on Linux) to figure out the location of the process. And on most systems you can look at argv[0] to figure out where the exe is. But none of these techniques is 100% guaranteed, due to users invoking the exe via $PATH, symlinks, etc.
So, any guidance on the right way to do this that will always work? I guess I have no problem ifdef'ing multiple solutions, but it seems like there should be a more elegant way to do this.

I don't believe there is a portable way of doing this. The C++ standard itself does not define anything about the execution environment. The best you get is the std::system call, and that can fail for things like Unicode characters in path names.
The issue here is that C and C++ are both used on systems where there's no such thing as an operating system. No such thing as $PATH. Therefore, it would be nonsensical for the standards committee to require a conforming implementation provide such features.
I would just write one implementation for POSIX, one for Mac (if it differs significantly from the POSIX one... never used it so I'm not sure), and one for Windows (Select which one at compilation time with the preprocessor). It's maybe 3 function calls for each one; not a lot of code, and you'll be sure you're following the conventions of your target platform.
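A sketch of that per-platform approach, selected with the preprocessor: GetModuleFileNameA on Windows, _NSGetExecutablePath on macOS, and readlink on /proc/self/exe on Linux (other Unix systems would need their own branch). executable_path is a hypothetical name and error handling is minimal:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__APPLE__)
#include <cstdint>
#include <mach-o/dyld.h>
#else
#include <unistd.h>
#endif

// Returns the path of the running executable, or "" if it cannot be determined.
std::string executable_path() {
#if defined(_WIN32)
    std::vector<char> buf(MAX_PATH);
    for (;;) {
        DWORD n = GetModuleFileNameA(nullptr, buf.data(), static_cast<DWORD>(buf.size()));
        if (n == 0) return "";
        if (n < buf.size()) return std::string(buf.data(), n);
        buf.resize(buf.size() * 2);              // truncated: retry with a bigger buffer
    }
#elif defined(__APPLE__)
    std::uint32_t size = 1024;
    std::vector<char> buf(size);
    if (_NSGetExecutablePath(buf.data(), &size) != 0) {
        buf.resize(size);                        // too small: size now holds the required length
        if (_NSGetExecutablePath(buf.data(), &size) != 0) return "";
    }
    return std::string(buf.data());
#else
    // Linux: /proc/self/exe is a symlink to the running binary.
    std::vector<char> buf(256);
    for (;;) {
        ssize_t n = readlink("/proc/self/exe", buf.data(), buf.size());
        if (n < 0) return "";
        if (static_cast<std::size_t>(n) < buf.size()) return std::string(buf.data(), n);
        buf.resize(buf.size() * 2);              // result may have been truncated
    }
#endif
}
```

The macOS result may contain symlinks or relative components; realpath can normalize it if that matters.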

I'd like to point you to a few URLs which might help you find where the current executable was located. It does not appear as if there is one method for all (aside from the ARGV[0] + path search method which as you note is spoofable, but…are you really in a threat environment such that this is likely to happen?).
How to get the application executable name in Windows (C++/CLI)?
https://superuser.com/questions/49104/handy-tool-to-find-executable-program-location
Finding current executable's path without /proc/self/exe
How do I find the location of the executable in C?

There are several solutions, none of them perfect. Under Windows, as you have said, you can use GetModuleFileName, but that's not available under Unix. You can try to simulate how the shell works, using argv[0] and getenv("PATH"), but that's not easy, and it's not 100% reliable either. (Under Unix, and I think under Windows as well, the spawning application can hoodwink you and put any sort of junk in argv[0].) The usual solution under Unix is to require an environment variable, e.g. MYAPPLICATION_HOME, which should contain the root directory where your application is installed; the application won't start without it. Or you can ask the user to specify the root path with a command line option.
In practice, I usually use all three: the command line option has precedence and is very useful when testing; the environment variable works well in the Unix world, since it's what people are used to; and if neither is present, I try to work out the location from where I was started, using system-dependent code: GetModuleFileName under Windows, and getenv("PATH") and all the rest under Unix. (The Unix solution isn't that hard if you already have code for breaking a string into fields and are using boost::filesystem.)
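A Unix-only sketch of that precedence order: a command line option (the name --home= is a hypothetical choice), then the MYAPPLICATION_HOME environment variable, then the directory of argv[0] resolved against $PATH roughly the way a shell would. The function names are hypothetical, and as noted above the last step can be fooled by whatever the parent process put in argv[0]:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Resolves argv[0] roughly as the shell found it: names containing '/' are
// used as-is; bare names are searched for in the directories of $PATH.
std::string resolve_argv0(const std::string& argv0) {
    if (argv0.find('/') != std::string::npos) return argv0;
    const char* path = std::getenv("PATH");
    if (!path) return "";
    const std::string dirs = path;
    std::size_t start = 0;
    while (start <= dirs.size()) {
        std::size_t end = dirs.find(':', start);
        if (end == std::string::npos) end = dirs.size();
        std::string dir = dirs.substr(start, end - start);
        if (dir.empty()) dir = ".";                  // empty entry means current directory
        const std::string candidate = dir + "/" + argv0;
        struct stat st;
        if (stat(candidate.c_str(), &st) == 0 && S_ISREG(st.st_mode) &&
            access(candidate.c_str(), X_OK) == 0)
            return candidate;
        start = end + 1;
    }
    return "";
}

// Option first, then environment variable, then the executable's directory.
std::string application_home(int argc, char** argv) {
    const std::string opt = "--home=";
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg.compare(0, opt.size(), opt) == 0) return arg.substr(opt.size());
    }
    if (const char* env = std::getenv("MYAPPLICATION_HOME")) return env;
    if (argc < 1) return "";
    const std::string exe = resolve_argv0(argv[0]);
    const std::size_t slash = exe.rfind('/');
    if (slash == std::string::npos) return "";
    return slash == 0 ? "/" : exe.substr(0, slash);
}
```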

A good solution would be to write a custom function that is guaranteed to work on every platform you use. Preferably it should check at runtime whether each method worked, and fall back to ifdefs only where some detection method is not available on all platforms. But it might not be easy to detect whether your code executed correctly, for example whether argv[0] gave you the correct path...

Embedding compressed files into a c++ program

I want to create a cross-platform installer in C++. It can use any compression type, e.g. zip or gzip, embedded inside the program itself like an average installer. I don't want to make many changes for the different platforms, Linux and Windows. How do I embed files in a C++ program and extract them, cross-platform?

C++ is a poor choice for a cross-platform installer, because there's no such thing as cross-platform machine code.
C++ code can be extremely portable, but it needs to be compiled for each platform, and then you get a distinct output executable for each platform.
If you want to build installers for many platforms from a single source file, you can use C++. But if you want to build ONE installer that works on many platforms, you'll need to use an interpreted or JIT-compiled language with runtime support available on all your targets. Of those, the only one likely to already be installed on a majority of computers of each platform is Java.
Ok, assuming that you're building many single-platform installers from machine code, this is what is needed:
You need to get the compressed code into the program. You want to do this in a way that doesn't affect the load time badly, nor cause compilation to take a few months. So using an initialized global array is a bad idea.
One way is to link your data in as an additional section. There are tools to help with that, e.g. a Binary to COFF converter; I've seen an ELF version as well. But this might still cause the runtime library to try to load the entire file into memory before execution begins.
Another way is to use platform-specific resource APIs. This is efficient, but platform specific.
The most straightforward solution is to simply append the compressed archive to your executable, then append eight more bytes with the file offset where the compressed archive begins. Unpacking is then as simple as opening the executable in read-only mode, fseek(-8, SEEK_END), reading the correct offset, then seeking to the beginning of the compressed data and passing that stream to your decompressor.
Of course, now I find a website listing pretty much the same methods.
And here's a program which implements the last option, with additional ability to store multiple files. I wouldn't recommend doing that, let the compression library take care of storing the file metadata.

The only way I know of to portably embed data (strings or raw, binary
data) in a C++ program is to convert it into a data table, then compile
that. For raw data, this would look something like:
unsigned char data[] =
{
// raw data here.
};
It should be fairly trivial to write a small program which reads your
binary data, and writes it out as a C++ table, like the above. Compile
it, and link it into your program, and there you are.
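A sketch of the converting function, producing output similar to xxd -i. to_cpp_table is a hypothetical name; the generated arrays are declared extern const so they keep external linkage and can be referenced from other translation units:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Turns raw bytes into C++ source defining `name` and `name_len`.
std::string to_cpp_table(const std::vector<unsigned char>& bytes, const std::string& name) {
    std::string out = "#include <cstddef>\n\n";
    out += "extern const unsigned char " + name + "[] = {";
    char hex[8];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i % 12 == 0) out += "\n   ";          // 12 bytes per line
        std::snprintf(hex, sizeof hex, " 0x%02x,", bytes[i]);
        out += hex;
    }
    if (bytes.empty()) out += " 0";               // a zero-length array is not valid C++
    out += "\n};\n";
    out += "extern const std::size_t " + name + "_len = " + std::to_string(bytes.size()) + ";\n";
    return out;
}
```

A small driver program would read the input file into a std::vector<unsigned char>, call this, and write the string to a .cpp file as a build step.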

Use zlib.
Have your packing program generate a table of the (compressed) executables in the program, i.e.
unsigned char x86_windows_version[] = { 0xff,...,0xff};
unsigned char arm_linux_version[] = { 0xff,...,0xff};
unsigned char* binary_files[MAX_BINARIES] = {x86_windows_version,arm_linux_version};
Somewhere in your executable:
uLongf destLen = uncompressed_size; /* hypothetical: size recorded by the packer; dest is a buffer of that size */
uncompress(dest, &destLen, x86_windows_version, sizeof x86_windows_version);
And that's about it. Look at the zlib docs for the parameters of uncompress() (or the lower-level inflate()) and compress()/deflate().
It's a pattern used a lot on embedded platforms (that are not Linux), mostly for string tables and other image binaries. It should work for your needs.

Distributing DLLs Inside an EXE (C++)

How can I include my program's dependency DLLs inside the EXE file (so I only have to distribute that one file)? I am using C++, so I can't use ILMerge like I usually do for C#, but is there an easier way to automatically do this in Visual Studio?
I know this is possible (that's why installers work); I just need some help being pointed to the best way to do this.
Thank you for your time.

There are many problems with this approach. For one example, see this post from REAL Software. Their “REALbasic” product used to do this and had problems including:
When writing the DLLs out at run-time, it would trigger anti-virus warnings.
Problems with machines where the user doesn’t have write permissions or is low on disk space.
Their attempt to fix the problem caused more problems, including crashes. Eventually they relented and now distribute DLLs side-by-side with apps.
If you really need a single-EXE deployment, and can't use an installer for some reason, the reliable way is to statically link all dependencies. This assumes that you have the correct .lib files, i.e. static libraries and not just import libraries that link to the DLL.

There exist two options, both of which are far from ideal:
write a temporary file somewhere
load the DLL to memory "by hand", i.e. create a memory block, put DLL image to memory, then process relocations and external references.
The downside of the first approach is described above by Nate. Second approach is possible, but is complicated (requires deep knowledge of certain low-level things) and doesn't allow the DLL code to access DLL resources (this is obvious - there's no image of the DLL so the OS doesn't know where to take resources).
One more option usable in some scenarios: create a virtual disk whose contents are stored in your EXE file resources, and load the DLL from there. This is possible using our SolFS product (OS edition), but creation of the virtual disk itself requires use of kernel-mode drivers which must be written to disk before use.

Most installers use a zip file (or something similar) to hold whatever files are needed. When you run the installer, it decompresses the data and puts the individual files where needed (and typically adds registry entries, registers any COM controls it installed, etc.)

How to create a virtual file?

I'd like to simulate a file without writing it on disk. I have a file at the end of my executable and I would like to give its path to a dll. Of course since it doesn't have a real path, I have to fake it.
I first tried using named pipes under Windows to do it. That would allow for a path like \\.\pipe\mymemoryfile, but I can't make it work, and I'm not sure the DLL would support a path like this.
Second, I found CreateFileMapping and GetMappedFileName. Can they be used to simulate a file in a fragment of another one? I'm not sure this is what this API does.
What I'm trying to do seems similar to BoxedApp. Any ideas about how they do it? I suppose it's something like API interception (like Detours), but that would be a lot of work. Is there another way to do it?
Why? I'm interested in this specific solution because I'd like to hide the data and have the benefit of distributing only one file, but also for the geeky reason of making it work that way ;)
I agree that copying data to a temporary file would work and be a much easier solution.

Use BoxedApp and do not worry.

You can store the data in an NTFS stream. That way you can get a real path pointing to your data that you can give to your dll in the form of
x:\myfile.exe:mystreamname
This works precisely like a normal file; however, it only works if the file system used is NTFS. This is standard under Windows nowadays, but is of course not an option if you want to support older systems or would like to be able to run this from a USB stick or similar. Note that any streams present in a file will be lost if the file is sent as an email attachment or simply copied from an NTFS partition to a FAT32 partition.
I'd say that the most compatible way would be to write your data to an actual file, but you can of course do it one way on NTFS systems and another on FAT systems. I do recommend against it because of the added complexity. The appropriate way would be to distribute your files separately, of course, but since you've indicated that you don't want this, you should in that case write the data to a temporary file and give the DLL the path to that file. Make sure you write the temporary file to the user's temp directory (you can find the path using GetTempPath in C/C++).
Your other option would be to write a filesystem filter driver, but that is a road that I strongly advise against. That sort of defeats the purpose of using a single file as well...
Also, in case you want only a single file for distribution, how about using a zip file or an installer?

Pipes are for communication between processes running concurrently. They don't store data for later access, and they don't have the same semantics as files (you can't seek or rewind a pipe, for instance).
If you're after file-like behaviour, your best bet will always be to use a file. Under Windows, you can pass FILE_ATTRIBUTE_TEMPORARY to CreateFile as a hint to the system to avoid flushing data to disk if there's sufficient memory.
If you're worried about the performance hit of writing to disk, the above should be sufficient to avoid the performance impact in most cases. (If the system is low enough on memory to force the file data out to disk, it's probably also swapping heavily anyway -- you've already got a performance problem.)
If you're trying to avoid writing to disk for some other reason, can you explain why? In general, it's quite hard to stop data from ever hitting the disk -- the user can always hibernate the machine, for instance.

Since you don't have control over the DLL you have to assume that the DLL expects an actual file. It probably at some point makes that assumption which is why named pipes are failing on you.
The simplest solution is to create a temporary file in the temp directory, write the data from your EXE to the temp file and then delete the temporary file.
Is there a reason you are embedding this "pseudo-file" at the end of your EXE instead of just distributing it with your application? You are obviously already distributing this third-party DLL with your application, so one more file doesn't seem like it is going to hurt you.
Another question: will this data be changing? That is, are you expecting to write data back to this "pseudo-file" in your EXE? I don't think that will work well. Standard users may not have write access to the EXE, and that would probably drive anti-virus software nuts.
And no, CreateFileMapping and GetMappedFileName definitely won't work, since they don't give you a file name that can be passed to CreateFile. If you could somehow get this DLL to accept a HANDLE, then that would work.
And I wouldn't even bother with API interception. Just hand the DLL a path to an actual file.

Reading your question made me think: if you can pretend an area of memory is a file and have kind of "virtual path" to it, then this would allow loading a DLL directly from memory which is what LoadLibrary forbids by design by asking for a path name. And this is why people write their own PE loader when they want to achieve that.
I would say you can't achieve what you want with file mapping: the purpose of file mapping is to treat a portion of a file as if it was physical memory, and you're wanting the reciprocal.
Using Detours implies that you would have to replicate everything the intercepted DLL function does except obtaining data from a real file; hence it's not generic. Or, even more intricate, suppose the DLL uses fopen; then you provide your own fopen that detects a special pattern in the path, and you mimic the C runtime internals... Hmm, is it really worth all the pain? :D

Please explain why you can't extract the data from your EXE and write it to a temporary file. Many applications do this -- it's the classic solution to this problem.
If you really must provide a "virtual file", the cleanest solution is probably a filesystem filter driver. "clean" doesn't mean "good" -- a filter is a fully documented and supported solution, so it's cleaner than API hooking, injection, etc. However, filesystem filters are not easy.
OSR Online is the best place to find Windows filesystem information. The NTFSD mailing list is where filesystem developers hang out.

How about using some sort of RAM disk and writing the file to it? I have tried some RAM disks myself, though I never found a good one; tell me if you are successful.

Well, if you need to have the virtual file allocated in your exe, you will need to create a vector, stream or char array big enough to hold all of the virtual data you want to write.
That is the only solution I can think of without doing any I/O to disk (even if you don't write to a file).
If you need to keep a file like path syntax, just write a class that mimics that behaviour and instead of writing to a file write to your memory buffer. It's as simple as it gets. Remember KISS.
Cheers
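The in-memory approach suggested above can be sketched as a small class, assuming the data only needs to be read by code you control: a hypothetical MemoryFiles class maps path-like strings to buffers and hands out std::istringstream objects. It does not create anything the OS or a third-party DLL can open by path:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Minimal in-memory stand-in for files: "paths" map to byte buffers.
class MemoryFiles {
public:
    // Creates or replaces the virtual file at `path`.
    void write(const std::string& path, std::string contents) {
        files_[path] = std::move(contents);
    }
    // Opens a virtual file for reading, or returns std::nullopt if absent.
    std::optional<std::istringstream> open(const std::string& path) const {
        auto it = files_.find(path);
        if (it == files_.end()) return std::nullopt;
        return std::istringstream(it->second);
    }
private:
    std::map<std::string, std::string> files_;
};
```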

Open the file called "NUL:" for writing. It's writable, but the data are silently discarded. Kinda like /dev/null of *nix fame.
You cannot memory-map it though. Memory-mapping implies read/write access, and NUL is write-only.

I'm guessing that this DLL can't take a stream? It's almost too simple to ask, BUT if it can, you could just use that.

Have you tried using the \\?\ prefix when using named pipes? Many APIs support using \\?\ to pass the remainder of the path directly through without any parsing or modification.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85,lightweight).aspx

Why not just add it as a resource - http://msdn.microsoft.com/en-us/library/7k989cfy(VS.80).aspx - the same way you would add an icon.
