Correct way to extract array data from binary? - c++

There is a classic way to embed resource files as a C language array into a binary file, so that we can store some external resource files such as .jpeg or .txt files into a binary.
For example, in the header file we can define an array:
const unsigned char xd_data[] = {
77,90,144,0,3,0,0,0,4,0,0,0,255,255,0,0,184,0,0,0,0,0,0,0,64,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,240,0,0,
0,14,31,186,14,0,180,9,205,33,184,1,76,205,33,84,104,105,115,32,112,114,
111,103,114,97,109,32,99,97,110,110,111,116,32,98,101,32,114,117,110,
32,105,110,32,68,79,83,32,109,111,100,101,46,13,13,10,36,0,0,0,0,0,0,
0,66,163,223,218,6,194,177,137,6,194,177,137,6,194,177,137,105,221,187,
137,13,194,177,137,133,222,191,137,3,194,177,137,105,221,181,137,4,194,
177,137,136,202,238,137,4,194,177,137,6,194,176,137,73,194,177,137,133,
202,236,137,13,194,177,137,48,228,187,137,11,194,177,137,193,196,183,
137,7,194,177,137,82,105,99,104,6,194,177,137,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,80,69,0,0,76,1,4,0,65,162,32,86,0,0,0,0,0,0,0,
0,224,0,47,1,11,1,6,0,0,100,0,0,0,74,0,0,0,0,0,0,228,113,0,0,0,16,0,0,
0,128,0,0,0,0,64,0,0,16,0,0,0,2,0,0,4,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,
224,0,0,0,4,0,0,0,0,0,0,2,0,0,0,0,0,16,0,0,16,0,0,0,0,16,0,0,16,0,0,0,
0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,124,140,0,0,140,0,0,0,0,208,0,0,0,16,0
};
which contains the contents of the resource file and will be compiled into the final binary.
There are lots of tools and tutorials on the web about this old trick, such as: http://www.rowleydownload.co.uk/arm/documentation/index.htm?http://www.rowleydownload.co.uk/arm/documentation/embed.htm, https://www.fourmilab.ch/xd/ and http://gareus.org/wiki/embedding_resources_in_executables#c_include_method.
However, most of these pages only discuss how to embed the data into the binary file using a C-style array.
My question is, what is the correct way to find the start address of the resource files in the compiled binary in order to extract them? I.e., how can I find the start address of xd_data in the compiled binary?

If you mean finding the byte address in the file where a data block starts just like objdump does but programmatically, then you can use the Binary File Descriptor library (BFD), see here and here.

If you stored data, for example an image, and you want to load it (for printing or whatever you want), and you have a library function that loads it from memory, for example void loadResImage(void * mem);, just call loadResImage(xd_data). If you only have a function that loads it from a file, save it to a temporary file first, e.g.:
/* needs <fcntl.h> and <unistd.h> */
int fd = open("tmpfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
int ret = write(fd, xd_data, sizeof(xd_data));
close(fd);
loadImageFile("tmpfile");
But if you want to access the data from outside the program itself (from a hex editor or another program, for example), you have to add a starting mark and optionally an ending mark or the size of the data, e.g.:
const unsigned char xd_data[]={
...
'M','A','G','I','C'};
In the example above the end of the data is marked, so you just search for the mark to find it. In the same way, play around and find a suitable way to store the size of the data. But beware of compiler optimizations.
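The marker idea can be sketched as a small external tool routine. This is only a minimal sketch, assuming the array was compiled with both a starting and an ending mark and that the binary has already been read into memory; the function name extract_between and the mark strings are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the bytes found between the first `start_mark` and the next
// `end_mark` in `image`, or an empty vector if either mark is missing.
// The marks are assumed to be plain ASCII (hypothetical example values).
std::vector<unsigned char> extract_between(const std::vector<unsigned char>& image,
                                           const std::string& start_mark,
                                           const std::string& end_mark) {
    std::vector<unsigned char> out;
    if (start_mark.empty() || end_mark.empty()) return out;

    // Locate the starting mark anywhere in the binary image.
    auto first = std::search(image.begin(), image.end(),
                             start_mark.begin(), start_mark.end());
    if (first == image.end()) return out;
    first += static_cast<std::ptrdiff_t>(start_mark.size());

    // The payload runs up to the ending mark.
    auto last = std::search(first, image.end(), end_mark.begin(), end_mark.end());
    if (last == image.end()) return out;

    out.assign(first, last);
    return out;
}
```

The marks should be long and unusual enough that they cannot occur by accident elsewhere in the binary or inside the payload itself.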

Related

Is it possible to store binary files inside an exe

Is it possible to do this: (for educational purpose).
Suppose I have an image file "image.jpg".
I want to create a program that creates this image when it is executed. That means the data of the image is stored in the exe. Is this possible to do?
Something like this: link the image file from resource.rc, then tell the compiler to get the data and store it (something like unsigned char data_buffer[]="binary data of the image"), so that when the program is executed I can write this data to a file.
(I'm using C++ with mingw compiler)
Any help is highly appreciated.
There are several options:
1) Add it as a byte array in a source file. It is trivial to write an auxiliary program that reads the bytes from the files and writes the C source. E.g.:
data_jpg.c:
unsigned char data_jpg[] = {1,2,3... };
data_jpg.h:
extern unsigned char data_jpg[];
const size_t data_jpg_size = 1000;
2) Add it as a binary resource to the executable. You said "exe", so you are likely on Windows. Windows EXE files can contain binary resources, which can be located using the resource API. See the FindResource, LoadResource and LockResource functions.
resource.rc
ID_DATA_JPG FILE "data.jpg"
3) Convert the binary file directly into an OBJ file and link it into the executable. In the good old days of Turbo C there used to be a BINOBJ tool for that. GNU tools can do it too, AFAIK, but for MS tools I really cannot tell.
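For option 1, the auxiliary program can be very small. The following is a minimal sketch of its core, assuming the file has already been read into a byte vector; the function name to_c_array and the generated `_size` variable are hypothetical choices, not part of any existing tool.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Renders `bytes` as C source that defines an array called `name`
// and a companion variable `name`_size holding the element count.
std::string to_c_array(const std::vector<unsigned char>& bytes, const std::string& name) {
    assert(!bytes.empty());  // an empty initializer would not give a usable array
    std::ostringstream out;
    out << "unsigned char " << name << "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out << ",";
        out << static_cast<unsigned>(bytes[i]);  // print as a number, not a character
    }
    out << "};\n";
    out << "unsigned long " << name << "_size = " << bytes.size() << ";\n";
    return out.str();
}
```

A real tool would read image.jpg with a binary std::ifstream, pass the bytes to this function and write the result to data_jpg.c as a build step.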
With a PE file, you can append data (including binary data) to the tail of the file as your resource. You just need to remember the original size of the PE file. I'm not sure whether you also need to update the PE checksum. Using the VC++ compiler to embed resources would be much easier.
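Reading such appended data back could look roughly like this. It is a sketch only, assuming the program somehow knows the path of its own executable and the size the file had before the data was appended; both are passed in as parameters here, and the function name read_appended_data is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads everything stored after the first `original_size` bytes of `path`.
// Returns an empty vector if the file cannot be opened or has no tail.
std::vector<char> read_appended_data(const std::string& path, std::streamoff original_size) {
    assert(original_size >= 0);
    std::vector<char> tail;

    // Open at the end so tellg() reports the total file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return tail;
    const std::streamoff total = in.tellg();
    if (total <= original_size) return tail;

    // Jump past the original image and read the rest.
    tail.resize(static_cast<std::size_t>(total - original_size));
    in.seekg(original_size, std::ios::beg);
    in.read(tail.data(), static_cast<std::streamsize>(tail.size()));
    if (!in) tail.clear();
    return tail;
}
```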

Accessing text data files from a static library function

How do I enable a static library to pull in data available in ASCII data files?
I am trying to add a model to a simulation as a library which contains functions that read data from data files. I am able to compile and run the functions from a main program outside the actual full simulation, but once I put the functions as a library on the host for the simulation the data no longer gets read.
As the path to the data is changing depending on the user, I cannot provide an absolute data path to the ascii data files. Is there a way to use objcopy to make the data files into object code in the library or how can I best access the data from my static library?
There are several solutions to open a file that has an unknown location at compile time. Prompt the user for the name of the file, including directory. Use an environment variable to designate the directory containing the file ... Fortran 2003 has an intrinsic to obtain the value of an environment variable. Obtain the information from a command line argument ... again Fortran 2003 has an intrinsic for this purpose. With all of these, construct the filename as a string variable and provide that variable to the FILE keyword of the OPEN statement.
I don't know why you included the Fortran tag, but in Fortran you:
tell the code to open a file you want using a character string
to read from it
and to close it
There is no difference between a main program or a library.
If you have a function like, say:
void read_data_from_files() { ... }
You'll need to change it in the DLL to be more like:
DataObject read_data_from_file(const char* file_path) { ... }
And then call it appropriately.
You'll need to design DataObject.
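As a rough illustration of that interface, here is a minimal sketch. The layout of DataObject is purely an assumption (a flat list of numbers read from a whitespace-separated ASCII file); the real structure depends on what the model needs.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical result type: the numbers found in the file plus a success flag.
struct DataObject {
    bool ok = false;             // true if the file could be opened
    std::vector<double> values;  // numbers read from the file, in order
};

// Reads whitespace-separated numbers from the file at `file_path`.
// The caller decides where the file lives, so the library needs no fixed path.
DataObject read_data_from_file(const char* file_path) {
    assert(file_path != nullptr);
    DataObject data;
    std::ifstream in(file_path);
    if (!in) return data;  // ok stays false
    data.ok = true;
    double v;
    while (in >> v) data.values.push_back(v);
    return data;
}
```

The host simulation can then build the path however it likes (user prompt, environment variable, command line argument) and hand it to the library.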

Encoding/patching variable in other .exe

I'm out of ideas how to do this :
You have one file, let's call it test.exe,
it has const int value = 5; in it, and all it does is cout << value;
I want to create other executable which patches the test.exe so it now outputs 10 instead of 5. I want this to be done before runtime.
I've tried turning off the ASLR, getting the address of that variable and then patching in, but addresses in disk and in memory differ AFAIK.
Sorry, this remark assumes you are working on a Windows System. If not, I'm sure that with other executable image formats you can follow similar method.
Assuming you are trying to ask how you alter data within a target and not how to, in this particular example, change the screens output...
Have you considered looking at the executable image's PE header? You can translate the address of a particular piece of data once loaded into memory to its offset in the PE file by taking a look at the IMAGE_SECTION_HEADER structures inside the PE header of the image in question.
First, calculate the offset of your data within its section. This is the RVA of the data (its address in memory minus the image base) minus the VirtualAddress of the section it is located in.
Second, index through the IMAGE_SECTION_HEADER structures inside the executable's PE header by reading the header from the file into a buffer. Once you've loaded this header into a memory buffer, you can process it using pointers. Like so,
IMAGE_NT_HEADERS* pImageHeader = reinterpret_cast<IMAGE_NT_HEADERS*>(&peHeaderBuffer[0]);
After finding the IMAGE_SECTION_HEADER that contains your data, you can read the PointerToRawData member of the structure, which gives you the offset from the start of the PE file at which this section begins. If you add the offset of your data within the section, you get the offset from the start of the file at which your data is located.
Obviously, my response doesn't explain how to index through the section headers, as this is a fairly tedious task that would take a while to explain. I would suggest you take a look at an executable's PE header from within a simple debugger, like OllyDbg, and reference MSDN's documentation on the PE header, which can be found here:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680336%28v=VS.85%29.aspx
If all you want to do is reverse this information out of a target, it is very easy to do using OllyDbg. Just skim down the PE Header view until you see the section that corresponds to your data, and OllyDbg will list the PointerToRawData member there, which you can add to the offset of your data within that section.
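The address translation itself can be sketched without the Windows headers. The Section struct below is a simplified stand-in that keeps only three of the IMAGE_SECTION_HEADER fields, and the function name rva_to_file_offset is hypothetical; filling the section list from a real file is left out.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for the IMAGE_SECTION_HEADER fields used here.
struct Section {
    std::uint32_t VirtualAddress;    // RVA at which the section starts in memory
    std::uint32_t PointerToRawData;  // offset of the section's data in the file
    std::uint32_t SizeOfRawData;     // number of bytes the section occupies in the file
};

// Translates an RVA into an offset from the start of the file.
// Returns -1 if no section's on-disk data covers the RVA.
std::int64_t rva_to_file_offset(const std::vector<Section>& sections, std::uint32_t rva) {
    for (const Section& s : sections) {
        if (rva >= s.VirtualAddress && rva - s.VirtualAddress < s.SizeOfRawData) {
            // File offset = start of section in file + offset within section.
            return static_cast<std::int64_t>(s.PointerToRawData) + (rva - s.VirtualAddress);
        }
    }
    return -1;
}
```

Data that exists only in memory, such as zero-initialized variables beyond a section's raw size, has no file offset, which is why the function can fail.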
Find it by signature: get 8-16 bytes around your value 5 and then search for them in .exe binary.
Also note that const int values are usually inlined into the generated machine code, so if you have two or more statements referencing the value you have to patch all of them.
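A signature-based patcher along those lines might look like the sketch below. It assumes the whole .exe has been read into a byte vector and that you have already worked out a signature and a same-length replacement; the function name patch_all is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Overwrites every occurrence of `signature` in `image` with `replacement`.
// Both must have the same length so that nothing in the file shifts.
// Returns the number of occurrences that were patched.
std::size_t patch_all(std::vector<unsigned char>& image,
                      const std::vector<unsigned char>& signature,
                      const std::vector<unsigned char>& replacement) {
    if (signature.empty() || signature.size() != replacement.size()) return 0;
    std::size_t count = 0;
    auto it = image.begin();
    while ((it = std::search(it, image.end(), signature.begin(), signature.end()))
           != image.end()) {
        std::copy(replacement.begin(), replacement.end(), it);
        it += static_cast<std::ptrdiff_t>(signature.size());  // continue after the patch
        ++count;
    }
    return count;
}
```

The signature should include enough surrounding bytes to be unique to the places you want to change; otherwise unrelated occurrences of the same bytes get patched as well.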

embedding a text file in an exe which can be accessed using fopen

I would like to embed a text file with some data into my program.
let's call it "data.txt".
This text file is usually loaded with a function which requires the text file's file name as input and is eventually opened using a fopen() call, something along the lines of
FILE* name = fopen("data.txt", "r");
I can't really change this function and I would like the routine to open this same file every time it runs. I've seen people ask about embedding the file as a header but it seems that I wouldn't be able to call fopen() on a file that I embed into the header.
So my question is: is there a way to embed a text file as a callable file/variable to fopen()?
I am using VS2008.
Yes and No. The easiest way is to transform the content of the text file into an initialized array.
char data_txt[] = {
'd','a','t','a',' ','g','o','e','s',' ','h','e','r','e', //....
};
This transformation is easily done with a small perl script or even a small C program. You then compile and link the resulting module into your program.
An old trick to make this easier to manage with a Makefile is to make the script transform its data into the body of the initializer and write it to a file without the surrounding variable declaration or even the curly braces. If data.txt is transformed to data.inc, then it is used like so:
char data_txt[] = {
#include "data.inc"
};
Update
On many platforms, it is possible to append arbitrary data to the executable file itself. The trick then is to find it at run time. On platforms where this is possible, there will be file header information for the executable that indicates the length of the executable image. That can be used to compute an offset to use with fseek() after you have opened the executable file for reading. That is harder to do in a portable way, since it may not even be possible to learn the actual file name of your executable image at run time in a portable way. (Hint, argv[0] is not required to point to the actual program.)
If you cannot avoid the call to fopen(), then you can still use this trick to keep a copy of the content of data.txt, and put it back in a file at run time. You could even be clever and only write the file if it is missing....
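A minimal sketch of that "write it back only if it is missing" idea follows. The function name ensure_file is hypothetical, and the embedded content is passed in as a pointer and size rather than tied to a particular array name.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes `size` bytes from `data` to `path` unless a readable file already
// exists there. Returns true if the file is present when the function returns.
bool ensure_file(const std::string& path, const char* data, std::size_t size) {
    // If the file can be opened for reading, leave it untouched.
    if (std::ifstream(path, std::ios::binary).good()) return true;

    // Otherwise recreate it from the embedded copy.
    std::ofstream out(path, std::ios::binary);
    out.write(data, static_cast<std::streamsize>(size));
    out.flush();
    return out.good();
}
```

Calling something like ensure_file("data.txt", data_txt, sizeof(data_txt)) before the unmodifiable routine runs lets its fopen() call succeed without any change to that routine.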
If you can drop the call to fopen() but still need a FILE * pointing at the data, then this is likely possible if you are willing to play fast and loose with your C runtime library's implementation of stdio. In the GNU version of libc, functions like sprintf() and sscanf() are actually implemented by creating a "real enough" FILE * that can be passed to a common implementation (vfprintf() and vfscanf(), IIRC). That faked FILE is marked as buffered, and points its buffer to the user's buffer. Some magic is used to make sure the rest of stdio doesn't do anything stupid.
For any kind of file, based on RBerteig's answer, you could do something as simple as this with Python:
This program generates a text.txt.c file that can be compiled and linked with your code, to embed any text or binary file directly into your exe and read it directly from a variable:
import struct  # Needed to convert raw bytes to integers
f = open("text.txt", "rb")  # Open the file in binary read mode
s = "unsigned char text_txt_data[] = {"
b = f.read(1)  # Read one byte from the stream
db = struct.unpack("B", b)[0]  # Transform it to an unsigned byte (0-255)
h = hex(db)  # Generate hexadecimal string
s = s + h  # Add it to the final code
b = f.read(1)  # Read one byte from the stream
while b:  # read(1) returns an empty value at end of file
    s = s + ","  # Add a comma to separate the array elements
    db = struct.unpack("B", b)[0]  # Transform it to an unsigned byte (0-255)
    h = hex(db)  # Generate hexadecimal string
    s = s + h  # Add it to the final code
    b = f.read(1)  # Read one byte from the stream
s = s + "};"  # Close the brackets
f.close()  # Close the file
# Write the resulting code to a file that can be compiled
fw = open("text.txt.c", "w")
fw.write(s)
fw.close()
Will generate something like
unsigned char text_txt_data[] = {0x52,0x61,0x6e,0x64,0x6f,0x6d,0x20,0x6e,0x75...
You can later use your data in another C file through the variable, with code like this:
extern unsigned char text_txt_data[];
Right now I can think of two ways of converting it to readable text: using memory streams or converting it to a C string.
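Both ideas can be sketched in a few lines of C++. This assumes the size of the array is available to the caller, which the extern declaration above does not provide on its own, so a separate size variable would have to be generated as well. The function names as_string and as_lines are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Copies the embedded bytes into a std::string. The explicit size is needed
// because the generated array is not NUL-terminated.
std::string as_string(const unsigned char* data, std::size_t size) {
    return std::string(reinterpret_cast<const char*>(data), size);
}

// Reads the embedded text line by line through an in-memory stream.
std::vector<std::string> as_lines(const unsigned char* data, std::size_t size) {
    std::istringstream in(as_string(data, size));
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return lines;
}
```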

Embedding a file into a program

I want to embed a file in a program. It will be used for default configuration files if none are provided. I have realized I could just use default values, but I want to extract the file and place it on the disk so it can be modified.
By embedding do you mean distributing your program without the file?
Then you can convert it to configuration initialization code in your build toolchain. Add a makefile step (or whatever tool you're using) - a script that converts this .cfg file into some C++ code file that initializes a configuration data structure. That way you can just modify the .cfg file, rebuild the project, and have the new values reflected inside.
By the way, on Windows, you may have luck embedding your data in a resource file.
One common thing you can do is to represent the file data as an array of static bytes:
// In a header file:
extern const char file_data[];
extern const size_t file_data_size;
// In a source file:
const char file_data[] = {0x41, 0x42, ... }; // etc.
const size_t file_data_size = sizeof(file_data);
Then the file data will just be a global array of bytes compiled into your executable that you can reference anywhere. You'll have to either rewrite your file processing code to be able to handle a raw byte array, or use something like fmemopen(3) to open a pseudo-file handle from the data and pass that on to your file handling code.
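The fmemopen route might look like the following sketch. fmemopen is a POSIX function, so this assumes a POSIX system such as Linux; the helper name first_line and the fixed 256-byte line buffer are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Reads the first line of an in-memory buffer through a FILE* obtained
// from fmemopen, the same way file-based code would read a real file.
std::string first_line(const char* data, std::size_t size) {
    // fmemopen takes a non-const pointer; with mode "r" the buffer is only read.
    FILE* fp = fmemopen(const_cast<char*>(data), size, "r");
    if (fp == nullptr) return std::string();

    char buf[256];  // lines longer than this are truncated in this sketch
    std::string line;
    if (std::fgets(buf, sizeof(buf), fp) != nullptr) line = buf;
    std::fclose(fp);
    return line;
}
```

Existing code that takes a FILE * can be handed the pointer returned by fmemopen(file_data, file_data_size, "r") in the same way, without being rewritten to work on raw bytes.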
Of course, to get the data into this form, you'll need to use some sort of preprocessing step to convert the file into a byte array that the compiler can accept. A makefile would be good for this.
Embedded data are often called "resources". C++ provides no native support for them, but they can be managed in almost all executable file formats. Try searching for resource managers for C++.
If it's any Unix look into mapping the file into process memory with mmap(2). Windows has something similar but I never played with it.