Writing hex data to an executable not working? C++

I have been trying to figure out how installers work and how they bundle everything into one executable file and how to create my own. I have tried using a hex editor called HxD which allows you to export the current hex-dump of a file into a .c source file with an array containing the hex dump that looks like the below.
Excited, I tried to write the file using some simple C++ code:
ofstream newbin("test.exe", ios::binary);
newbin << hex << rawData;
newbin.close();
... and then tried to run it.
After some further research it turns out that my little program is only writing the MZ. header that PE files use on Windows and leaving out the rest of the code. The executable that is created has a hex dump of 4D 5A 90, or MZ. in ASCII. Is this a fault in my coding? Why won't it write the hex data? Would I need to use some lower-level writing tool or assembly? If so, are there any C/C++ libraries that allow me to write at such a level? Thanks!

rawData is a char*, so the streaming operator treats it as a C string and stops at the first 0x00 byte it encounters. The hex manipulator does not change this; it only affects how integers are formatted.
For binary writing, you are best off using the
ostream& write(const char* s, streamsize n);
method, leading to
newbin.write(rawData, 65536);
assuming 65536 is the actual used size of the buffer.
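As a minimal sketch of the write() approach, the helper below writes a byte array to a file, embedded zero bytes included. The name write_binary and its signature are made up for illustration; only std::ofstream::write is the actual library call.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes exactly `size` bytes from `data` to `path`.
// Unlike operator<<, write() does not stop at the first 0x00 byte.
bool write_binary(const std::string& path, const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;  // could not open the file for writing
    out.write(reinterpret_cast<const char*>(data), static_cast<std::streamsize>(size));
    return out.good();
}
```

With an array exported by a hex editor you would pass sizeof(rawData) as the size rather than a hard-coded number, provided rawData is a real array and not a pointer.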
Hope this helps :)

A better approach to storing binary data is to use resources. Menus, icons, and bitmaps are stored in resources.
You can create a custom resource and use the FindResource, LoadResource, and LockResource functions to get a pointer to it in memory.
Then you can do whatever you want with the data, and of course write it to a file.
Installers usually use something like this rather than embedding lots of binary data in the source code. This approach has other advantages:
You don't have to re-convert your data into source code and then recompile the whole application when the data changes. Only the resources have to be recompiled and re-linked.
The resources are not loaded into memory until you call the functions above, that is, until you need them. Thus the application loads faster. (Resource data is actually mapped into the address space straight from the file.)
With your current approach, all the data is loaded into memory, so your application requires more memory.
Additionally, you would be better off using specialized tools for creating installers.

Related

Is it possible to modify an executable file on runtime?

Is it possible to modify an executable file at runtime (I'm asking about Windows XP/Vista/7/Server)? I've just evaluated the SmartUtils Portable Storage application. It can create so-called "managed executable storage files" that modify themselves at runtime... Such a storage file is like a standard self-extracting archive (the data is appended to an executable module), but the main difference is that you are able to view and modify its content without the main program. How is it possible? I need similar functionality in my project (C++): I want to be able to create an executable that can modify the data attached to it.
If all you're really asking is how SmartUtils Portable Storage does its magic, then I would suggest that it is a self-executing zip archive. The EXE of the archive (just as WinZip or 7-Zip create) auto-extracts and executes your application exe from a temp folder, and gives you an API that boils down to ways to extract, manipulate, and then modify that original self-executing archive.
So Windows is never trying to modify a running .exe. Rather, your .exe (temp file extracted & run) is what is executing (and the libraries bound to it), which manipulates the source .exe (really a self-executing archive - possibly .zip).
The next time the user "runs" the modified "exe", again your .exe is extracted & run, and it can again manipulate the self-extracting .exe.
I hope that makes sense to you.
And this is just a best guess!
Yes - a common technique is to append data files at the end of an executable.
A typical scheme is to write a 0x00000000 integer to the end of the executable and then append each file followed by its size in bytes.
Then, when the executable needs to read the data, it checks the last 4 bytes of its own file, uses that as the file length and copies that number of bytes from its own file. It then checks the preceding 4 bytes as another length and copies that as a file, and so on until it gets a length of 0000. If you also need to encode the file names, that adds a little complexity but it's basically the same idea.
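To make the layout concrete, here is a rough sketch of that scheme working on in-memory byte buffers. The names (Bytes, pack, unpack) and the little-endian 32-bit length format are assumptions chosen for illustration. A real tool would read and write the EXE file itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Appends a 32-bit little-endian integer (byte order is an assumption here).
void put_u32(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}

// Reads the 32-bit little-endian integer that ends just before index `end`.
std::uint32_t get_u32_before(const Bytes& in, std::size_t end) {
    assert(end >= 4 && end <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[end - 4 + i]) << (8 * i);
    return v;
}

// Layout: [exe image][0x00000000][file1][len1][file2][len2]...
Bytes pack(const Bytes& exe, const std::vector<Bytes>& files) {
    Bytes out = exe;
    put_u32(out, 0);  // terminator that stops the backwards walk
    for (const Bytes& f : files) {
        out.insert(out.end(), f.begin(), f.end());
        put_u32(out, static_cast<std::uint32_t>(f.size()));
    }
    return out;
}

// Walks backwards from the end; payloads come out last-appended first.
std::vector<Bytes> unpack(const Bytes& image) {
    std::vector<Bytes> files;
    std::size_t end = image.size();
    while (end >= 4) {
        const std::uint32_t len = get_u32_before(image, end);
        end -= 4;
        if (len == 0 || len > end) break;  // terminator reached, or corrupt length
        const std::size_t begin = end - len;
        files.emplace_back(image.begin() + static_cast<std::ptrdiff_t>(begin),
                           image.begin() + static_cast<std::ptrdiff_t>(end));
        end = begin;
    }
    return files;
}
```

One consequence of using a zero length as the terminator is that an empty file cannot be stored as-is, since its length record would look like the end marker.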
You can append a TOC pointer to an EXE (and probably a magic ID cookie) so you can verify that it is a TOC pointer, and then use that to back up to the start of each appended record.
As long as you don't mess up the file's header & main contents, it should still be loadable by the OS.
However, you sacrifice any signing your EXE had - and you probably have various permissions issues to contend with...
I have written tools for my development environment that open a Windows EXE, extract the resources in it, modify various ones, and repackage the whole thing. We use this to mark a beta as release (so it modifies the version records).
You can do anything you want to an EXE file if you know the structure of it and rebuild it correctly.
Since this is tagged as Windows, you might also consider "Alternate Data Streams". That allows you to treat a single file almost as a directory. You can add a stream called Program.EXE:ExtraData to your program and write to that with the normal file functions.
Then again, your executable most likely will be in Program Files\, which isn't writeable for normal (non-elevated) users.

Versioning executable and modifying it in runtime

What I'm trying to do is to sign my compiled executable's first 32 bytes with a version signature, say "1.2.0" and I need to modify this signature in runtime, keeping in mind that:
this will be done by the executable itself
the executable resides on the client side, meaning no recompilation is possible
using an external file to track the version instead of encoding it in the binary itself is also not an option
the solution has to be platform-independent; I'm aware that Windows/VC allows you to version an executable using a .rc resource, but I'm unaware of an equivalent for Mac (maybe Info.plist?) and Linux
The solution in my head was to write the version signature in the first or last 32 bytes of the binary (which I didn't figure out how to do yet) and then I'll modify those bytes when I need to. Sadly it's not that simple as I'm trying to modify the same binary that I'm executing.
If you know of how I can do this, or of a cleaner/mainstream solution for this problem, I'd be very grateful. FWIW, the application is a patcher/launcher for a game; I chose to encode the version in the patcher itself instead of the game executable as I'd like it to be self-contained and target-independent.
Update: from your helpful answers and comments, I see that messing with the header/footer of the binary is not the way to go. But regarding the write permission for the running users, the game has to be patched one way or another and the game files need to be modified, there's no way to circumvent that: to update the game, you'll need admin privileges.
I would opt for using an external file to hold the signature, and modify that with every update, but I can't see how I can guard against the user spoofing with that file: if they mess up the version numbers, how can I detect which version I'm running?
Update2: Thanks for all your answers and comments, in truth there are 2 ways to do this: either use an external resource to track the version or embed it in the main application's binary itself. I could choose only 1 answer on SO so I did the one I'm going with, although it's not the only one. :-)
Modern Windows versions will not allow you to update an installed program file unless you're running with administrator privileges. I believe all versions of Windows block modifications to a running file altogether; this is why you're forced to reboot after an update. I think you're asking for the impossible.
This is going to be a bit of a challenge, for a number of reasons. First, writing to the first N bytes of the binary is likely to step on the binary file's header information, which is used by the program loader to determine where the code & data segments, etc. are located within the file. This will be different on different platforms (see the ELF format and executable format comparison)--there are a lot of different binary format standards.
Assuming you can overcome that one, you're likely to run afoul of security/antivirus systems if you start modifying a program's code at runtime. I don't believe most current operating systems will allow you to overwrite a currently-running executable. At the very least, they might allow you to do so with elevated permissions--not likely to be present while gaming.
If your application is meant to patch a game, why not embed the version in there while you're at it? You can use a string like @Juliano shows and modify that from the patcher while the game is not running - which should be the case if you're currently patching anyways. :P
Edit: If you're working with Visual Studio, it's really easy to embed such a string in the executable with a #pragma comment, according to this MSDN page:
#pragma comment(user, "Version: 1.4.1")
Since the second argument is a simple string literal, it can be concatenated, and I'd have the version in a simple #define:
// somewhere
#define MY_EXE_VERSION "1.4.1"
// somewhere else
#pragma comment(user, "Version: " MY_EXE_VERSION)
I'll give just some ideas on how to do this.
I think it's not possible to change some arbitrary bytes in the executable without side effects. To overcome this, I would create some string in your source code, like:
const char *Version = "Version: AA.BB.CC";
I don't know if this is a rule, but you can look for this string in your binary (open it in a text editor and you will see it). So, you search for these bytes in the binary file and overwrite them with your version number. Their position will probably vary each time you compile the application, so this is possible only if that location is not a problem for you.
Because the file is being used (it's running), you have to launch an external program that would do this. After modifying the file, this external program could relaunch the original application.
The version will be stored in your binary code in some part. Is that useful? How will you retrieve the version number?
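A rough sketch of the search-and-overwrite idea, meant to be run from a separate helper program while the target is not running. The function name patch_version, the marker text and the fixed field width are all assumptions for illustration. It also assumes the compiler kept the placeholder string in the binary exactly once and unmodified.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: overwrites the fixed-width field that follows `marker`
// in the file at `path`. The file size never changes; shorter versions are
// padded with NUL bytes so the embedded C string stays terminated.
bool patch_version(const std::string& path, const std::string& marker,
                   std::size_t field_width, const std::string& version) {
    if (version.size() > field_width) return false;  // would overrun the placeholder

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string image((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    in.close();

    const std::size_t pos = image.find(marker);
    if (pos == std::string::npos) return false;
    const std::size_t field_pos = pos + marker.size();
    if (field_pos + field_width > image.size()) return false;

    std::string field = version;
    field.resize(field_width, '\0');
    image.replace(field_pos, field_width, field);
    assert(image.compare(field_pos, field_width, field) == 0);

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(image.data(), static_cast<std::streamsize>(image.size()));
    return out.good();
}
```

For the placeholder above, a call would look like patch_version("game.exe", "Version: ", 8, "1.2.0"), where the file name is again just an example. Reading the version back works the same way: find the marker and read the field that follows it.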

how to create self-extracting_archive ( programmatically )

So, how to do it?
How do I pack files into a self-extracting archive? What is the algorithm?
You can create self-extracting archives for Windows with 7-Zip; if you want to create them programmatically, you can use the SDK.
If you're more interested in ways to implement this yourself: you could have a statically linked application which has the compressed data linked into the executable (as a resource, for instance - for smaller archives a plain static const char data[] array might be sufficient). At runtime, you feed the data to a decompression library which then actually extracts files.
To keep the overhead of the executable small, I'd try to use system APIs (e.g. plain Windows controls on Windows) as much as possible so that you don't have to link in a toolkit. Also, for the decompression, I would use bzip2 since it provides a good compromise between compression size and decompression speed. You might also want to look at minilzo since it has a smaller code footprint than bzip2 (so the executable file is smaller) and a much higher decompression speed - it doesn't compress as well though.
A self-extracting archive is just an extractor program, but instead of taking its data from an archive file it takes it from constants defined in the program itself. That is really something very simple at the conceptual/algorithmic level.
If you don't care about size you can have something as simple as the following (example in Python to keep it simple; an actual unarchiver will probably be a compiled program from C or C++ source):
hello_prog = 'print("Hello, World")\n'
with open("./hello.py", "w") as f:
    f.write(hello_prog)
When you run it, it creates a file hello.py that is itself a runnable Python script.
But when actually creating an auto-extracting archive, you usually want the internal data to be compressed to make the whole archive as small as possible. You also want to keep the extractor program as small as possible and as independent as possible of what is already available on the target system... and that's where the problems really begin.

Design: Large archive file editor, file mapping

I'm writing an editor for large archive files (see below) of 4GB+, in native&managed C++.
For accessing the files, I'm using file mapping (see below) like any sane person. This is absolutely great for reading data, but a problem arises in actually editing the archive.
File mapping does not allow resizing a file while it's being accessed, so I don't know how I should proceed when the user wants to insert new data in the file (which would exceed the file's original size, when it was mapped.)
Should I remap the whole thing every time? That's bound to be slow. However, I'd want to keep the editor real-time with exclusive file access, since that simplifies the programming a lot, and won't let the file get screwed by other applications while being modified. I wouldn't want to spend an eternity working on the editor; It's just a simple dev-tool for the actual project I'm working on.
So I'd like to hear how you've handled similar cases, and what other archiving software and especially other games do to solve this?
To clarify:
This is not a text file; I'm writing a specific binary archive file format. By which I mean a big file that contains many others, in directories. Custom archive files are very common in game usage for a number of reasons. With my format, I'm aiming for a similar (but somewhat simpler) structure to Valve Software's GCF format - I would have used the GCF format as it is, but unfortunately no editor exists for the format, although there are many great implementations for reading them, like HLLib.
Accessing the file must be fast, as it is intended for storing game resources. So it's not a database. Database files would be contained inside it, along with GFX, SFX etc. files.
"File mapping" as discussed here is a specific technique on the Windows platform, which allows direct access to a large file through creating "views" to parts of it, see here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx - This technique allows minimal latency and memory usage and is a no-brainer for accessing any large files.
So this does not mean reading the whole 4GB file into memory, it's exactly the contrary.
What do you mean by 'editor software'? If this is a text file, have you tried existing production-quality editors, before writing your own? If it's a file storing binary data, have you considered using an RDBMS and manipulating its contents using SQL statements?
If you absolutely have to write this from scratch, I'm not sure that mmapping is the way to go. Mmapping a huge file will put a lot of pressure on your machine's VM system, and unless there are many editing operations all over the file its efficiency may lag behind a simple read/write scheme. Worse, as you say, you have problems when you want to extend the file.
Instead, maintain buffer windows onto the file's data, which the user can modify. When the user decides to save the file, sequentially traverse the file and the edited buffers to create the new file image. If you have disk space it's easier to write a new file (especially if a buffer's size has changed); otherwise you need to be clever about how you read ahead existing data before you overwrite it with the new contents.
Alternatively, you can keep a journal of editing operations. When the user decides to save the file, perform a topological sort on the journal and play it on the existing file to create the new one.
For exclusive file access use the file locking of your operating system or implement application-level locking (if only your editor will touch these files). Depending on mmap for exclusive access constrains your implementation choices.
Mapping the file is great for actually accessing the data, but I think you need another abstraction that represents the structure of the file. There are various ways of doing this, but consider representing the file as a sequence of 'extents'.
To start with the file is a single extent that is equivalent to the whole mapping. If the user then starts to edit the file, you would split the single extent into two at the edit point, and insert a new extent that contains the data the user has inserted. Modifications and deletes would also modify your view of the file by creating or modifying these extents.
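Here is a small sketch of the extent idea, handling inserts only. The names (Extent, ExtentFile) are hypothetical, and a std::string stands in for the mapped view of the original file so that the example stays portable. In the real editor the original bytes would be read through the file mapping instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Extent {
    bool from_original;  // true: bytes come from the original (mapped) file
    std::size_t offset;  // offset within that source
    std::size_t length;  // number of bytes covered
};

class ExtentFile {
public:
    explicit ExtentFile(const std::string& original) : original_(original) {
        if (!original_.empty()) extents_.push_back(Extent{true, 0, original_.size()});
    }

    // Inserts `data` at logical position `pos`; positions past the end append.
    void insert(std::size_t pos, const std::string& data) {
        if (data.empty()) return;
        const Extent fresh{false, added_.size(), data.size()};
        added_ += data;  // inserted bytes live in a separate append-only buffer
        std::size_t start = 0;
        for (std::size_t i = 0; i < extents_.size(); ++i) {
            const Extent e = extents_[i];
            if (pos < start + e.length) {
                const std::ptrdiff_t idx = static_cast<std::ptrdiff_t>(i);
                const std::size_t head = pos - start;
                if (head == 0) {
                    extents_.insert(extents_.begin() + idx, fresh);
                } else {
                    // Split the extent at the edit point: head | fresh | tail.
                    extents_[i].length = head;
                    const Extent tail{e.from_original, e.offset + head, e.length - head};
                    extents_.insert(extents_.begin() + idx + 1, tail);
                    extents_.insert(extents_.begin() + idx + 1, fresh);
                }
                return;
            }
            start += e.length;
        }
        extents_.push_back(fresh);
    }

    // Produces the edited contents, e.g. when saving to a new file.
    std::string flatten() const {
        std::string out;
        for (const Extent& e : extents_) {
            const std::string& src = e.from_original ? original_ : added_;
            assert(e.offset + e.length <= src.size());
            out.append(src, e.offset, e.length);
        }
        return out;
    }

    std::size_t extent_count() const { return extents_.size(); }

private:
    std::string original_;         // stand-in for the read-only mapped view
    std::string added_;            // all inserted data, in insertion order
    std::vector<Extent> extents_;  // logical file = concatenation of extents
};
```

The point of the design is that an insert never moves existing data. The original mapping stays untouched until save time, when the extents are written out in order.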
Maybe you could examine the source code for one of the open source editors -- there are lots to choose from, but finding one that is simple enough would be the challenge.
What I do is to close view handle(s) and FileMapping handle, set the file size then reopen mapping / view handles.
// Open memory mapped file
HANDLE FileHandle = ::CreateFileW(file_name, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
size_t Size = ::GetFileSize(FileHandle, 0);
HANDLE MappingHandle = ::CreateFileMapping(FileHandle, NULL, PAGE_READWRITE, 0, Size, NULL);
void* ViewHandle = ::MapViewOfFile(MappingHandle, FILE_MAP_ALL_ACCESS, 0, 0, Size);
...
// increase size of file
UnmapViewOfFile(ViewHandle);
CloseHandle(MappingHandle);
Size += 1024;
LARGE_INTEGER offset;
offset.QuadPart = Size;
LARGE_INTEGER newpos;
SetFilePointerEx(FileHandle, offset, &newpos, FILE_BEGIN);
SetEndOfFile(FileHandle);
MappingHandle = ::CreateFileMapping(FileHandle, NULL, PAGE_READWRITE, 0, Size, NULL);
ViewHandle = ::MapViewOfFile(MappingHandle, FILE_MAP_ALL_ACCESS, 0, 0, Size);
The above code has no error checking and does not handle 64bit sizes, but that's not hard to fix.
There's no easy answer for this problem -- I've looked for one for a long time, in vain. You'll have to modify the file's size, then re-map it.
Mapping has a basic issue with files on remote systems.
In the good old DOS days, there was a fine editor called Norton Editor (ne.com; this is the filename, not a web site). It could load a file of any size (we are talking about 640 KB of RAM and 20 GB hard disks, if any). It loaded only part of the file at a time, cleverly managing file-long searches with on-demand loading.
IMHO, such an approach should be used.
If properly hidden under a file read/write layer, it can be surprisingly transparent.
I'd build the large file from pieces at build-time. You have your editor deal with normal, flat files, in the usual file system (with subdirectories, etc., as appropriate). You then have a compile step that gathers all of these pieces together into your archive file format.

Combining two executables

I have a command line executable that alters some bits in a file that I want to use from my program.
Is it possible to create my own executable that uses this tool and distribute only one executable?
[edit] Clarification:
The command line tool takes an offset and some bits and changes the bits at this offset in a given file. I want to create a patcher for an application that changes specific bits to specific values. I could write something like a batch file to do it, but I want to create an executable that does it, i.e. embed the tool into a wrapper program that calls it with specific values.
I can code the wrapper in (Windows) C/C++ or asm, but no .NET please.
It would be easier to roll your own implementation of this program than to write the wrapper; it sounds like it is trivial -- just open the file, seek to the right location, write your bits, close the file, you're done.
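For what it's worth, a hand-rolled version could look roughly like the sketch below. The function name patch_byte and the mask/value parameters are assumptions about what the tool does, since its exact interface isn't given in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: at byte `offset` in the file, replaces the bits selected
// by `mask` with the corresponding bits of `value`; other bits are untouched.
bool patch_byte(const std::string& path, std::streamoff offset,
                std::uint8_t mask, std::uint8_t value) {
    assert(offset >= 0);
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;

    char byte = 0;
    f.seekg(offset, std::ios::beg);
    if (!f.get(byte)) return false;  // offset is past the end of the file

    const std::uint8_t old_bits = static_cast<std::uint8_t>(byte);
    const std::uint8_t patched =
        static_cast<std::uint8_t>((old_bits & ~mask) | (value & mask));

    f.seekp(offset, std::ios::beg);  // a seek is required between read and write
    f.put(static_cast<char>(patched));
    f.flush();
    return f.good();
}
```

A patcher would then just be a list of (offset, mask, value) triples applied in a loop.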
The easiest way is to embed this exe into your own and write it to disk to run it.
You can add the executable as a binary stream resource in your executable and when you need it you can extract it in a temporary folder and create new process with the temporary file.
The exact code you need to do this depends on whether you are writing .Net or C++ code.
Short answer: No.
Less short answer: Not unless it's an installer or a self-extracting archive executable.
Longer, speculative answer: If the file system supports alternate data streams, you could possibly add a stream containing the utility to your program, then your program could access its own alternate data stream, extracting the utility when you need it. Ahaha.
You could append the one executable onto the end of the other and write some code to unpack it to a temporary folder.
I've done a similar thing before but with a configuration file and some bitmaps appended to an EXE in Windows. The way I did it was to firstly append my stuff onto the end of the EXE and then write a little struct after that which contains the file offset of the data which in your case would be the offset of the 2nd exe.
When running your app, seek to the end of the file minus the size of the struct, extract the file offset and copy the 2nd exe to a temporary folder, then launch it.
OK, here are a few more details, as requested. This is some pseudo-code to create the combined EXE. This is a little utility you run after compiling your main EXE:
Open destination file
Open main exe as a binary file
Copy main exe to destination file
offset = size of main exe
Open 2nd exe as a binary file
Copy 2nd exe to the destination file
Write the offset to the destination file
Now for the extraction procedure. This goes in your main EXE:
Find the location of our own EXE file (GetModuleFileName() under Windows)
Open the file in binary mode
Seek to the end minus sizeof(offset) (typically 4 bytes)
Read the offset value
Seek to the offset position
Open a temporary file in binary mode
Read bytes from the main EXE, stopping before the trailing offset value, and write them to the temporary file
Launch the temporary file
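The two procedures above could be sketched in portable C++ roughly as follows. The function names and the 4-byte little-endian footer are assumptions for illustration. Finding your own EXE path (GetModuleFileName() under Windows) and launching the extracted file are left out because they are platform-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Build step: dest = [main exe][2nd exe][4-byte offset of the 2nd exe].
bool combine(const std::string& main_exe, const std::string& second_exe,
             const std::string& dest) {
    std::ifstream a(main_exe, std::ios::binary);
    std::ifstream b(second_exe, std::ios::binary);
    std::ofstream out(dest, std::ios::binary | std::ios::trunc);
    if (!a || !b || !out) return false;

    out << a.rdbuf();  // copy the main exe
    const std::uint32_t offset = static_cast<std::uint32_t>(out.tellp());
    out << b.rdbuf();  // copy the 2nd exe right after it

    unsigned char footer[4];
    for (int i = 0; i < 4; ++i) footer[i] = static_cast<unsigned char>(offset >> (8 * i));
    out.write(reinterpret_cast<const char*>(footer), 4);
    return out.good();
}

// Run-time step: read the footer, then copy [offset, footer) to `dest`.
bool extract_second(const std::string& combined, const std::string& dest) {
    std::ifstream in(combined, std::ios::binary);
    if (!in) return false;

    in.seekg(-4, std::ios::end);
    const std::streamoff footer_pos = in.tellg();
    unsigned char footer[4];
    if (footer_pos < 0 || !in.read(reinterpret_cast<char*>(footer), 4)) return false;

    std::uint32_t offset = 0;
    for (int i = 0; i < 4; ++i) offset |= static_cast<std::uint32_t>(footer[i]) << (8 * i);
    if (static_cast<std::streamoff>(offset) > footer_pos) return false;  // corrupt footer

    std::string payload(static_cast<std::size_t>(footer_pos - offset), '\0');
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    in.read(&payload[0], static_cast<std::streamsize>(payload.size()));
    if (!in) return false;
    assert(static_cast<std::streamoff>(payload.size()) == footer_pos - offset);

    std::ofstream out(dest, std::ios::binary | std::ios::trunc);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return out.good();
}
```

A 4-byte offset limits the main EXE to under 4 GB, which is normally not a concern. Writing the footer byte by byte keeps the format independent of the machine that built the combined file.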
I think the easiest way to do this for your purposes is probably to use a self extracting executable package. For example, use a tool like Paquet Builder which will package the exe (and any other files you want) and can be configured to call the exe or a batch file or whatever else you want when the user unpacks the self-extracting executable.
If the exe was built to be relocatable (essentially linker flag /fixed:no), you can actually do a LoadLibrary on it, get the base address, set up a call chain and call (jump) into it. It would not be worth the effort, and very few exes are built this way, so you would have to have the code to rebuild it, at which point you wouldn't be in this exercise.
So... No.
I'm more intrigued by the developer who doesn't mind writing in C/C++/asm, but 'not .net' - but is apparently stymied by fopen/fseek/fwrite - since that's about all the program you describe sounds like it's doing.
I think this is also possible by using AutoIt's FileInstall function. For this you'll have to set up AutoIt, create a script that uses the FileInstall function to include the two exes, and then use, for instance, the RunWait function to execute them. Compile to an exe and you should be done.