How to remove some data from a PE (exe) file in C/C++

In the first exe I have defined an array of char with some special bytes as a label. From another exe I map the first one into memory, find the label and put new data into it, but this data can be shorter than the defined array, so I want to cut the array down to the needed size. How can I do that?

There is no clean and simple way to cut pieces out of a PE file.
The obvious solution is to additionally define a length field in the original (in your terms, first) exe and mark it with another label. The additional work for the second exe is then to write the actual data length into that field.
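A hedged sketch of what the first exe's side of that could look like (the marker bytes, buffer sizes and names below are made up for illustration); the second exe would scan the mapped image for both markers and patch in the payload plus its real length:

// Labelled payload area: 16 marker bytes followed by the reserved buffer.
#include <cstdint>
#include <cstdio>
#include <cstring>

static unsigned char g_payload[16 + 1024] = {
    'M','Y','L','A','B','E','L','_','P','A','Y','L','O','A','D','1'
};

// Labelled length field: 8 marker bytes followed by a 4-byte length,
// which the second exe overwrites with the actual payload size.
static unsigned char g_payload_len[8 + sizeof(uint32_t)] = {
    'M','Y','L','E','N','_','0','1'
};

int main()
{
    uint32_t len = 0;
    std::memcpy(&len, g_payload_len + 8, sizeof len);  // actual length written by the patcher
    const unsigned char* data = g_payload + 16;        // payload bytes written by the patcher
    std::printf("payload is %u bytes long\n", len);
    (void)data;
    return 0;
}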
EDIT: If cutting is your primary goal, you must also keep in mind that:
The checksum of the PE will change. The location of the checksum field in the PE header is not hard to find, though (a sketch of recomputing it follows this list).
The PE file is aligned, and all sections are aligned. The alignment values can be found in the header too.
If you cut out a whole section, it has far-reaching consequences. Take a look at the PE file header structure.
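If you do end up patching the file on disk, the checksum can be recomputed with the ImageHlp API rather than by hand. A minimal sketch, assuming you link against Imagehlp.lib (the helper name and file handling are illustrative only):

// Recompute and patch the PE checksum of a file that was edited in place.
#include <windows.h>
#include <imagehlp.h>

bool FixPeChecksum(const wchar_t* path)
{
    HANDLE file = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    DWORD size = GetFileSize(file, nullptr);
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
    void* base = mapping ? MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0) : nullptr;

    bool ok = false;
    if (base) {
        DWORD headerSum = 0, newSum = 0;
        PIMAGE_NT_HEADERS nt = CheckSumMappedFile(base, size, &headerSum, &newSum);
        if (nt) {
            // CheckSum sits at the same offset in PE32 and PE32+ optional headers.
            nt->OptionalHeader.CheckSum = newSum;
            FlushViewOfFile(base, 0);
            ok = true;
        }
        UnmapViewOfFile(base);
    }
    if (mapping) CloseHandle(mapping);
    CloseHandle(file);
    return ok;
}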
Reference:
http://msdn.microsoft.com/en-us/library/ms809762.aspx

Related

One large file, or several small files?

I'm writing 3D model data out to a file, which includes a lot of different types of information (meshes, textures, animation, etc.) and would be about 50 to 100 MB in size.
I want to put all this in a single file, but I'm afraid it will cost me if I need to read only a small portion of that file to get what I want.
Should I be using multiple smaller files for this, or is a single very large file okay? I don't know how the filesystem handles jumping around within giant files, so for all I know seeking through a large file may be either costly or no problem at all.
Also, is there anything special I must do if using a single large file?
There is no issue with accessing data in the middle of a file - the operating system won't need to read the entire file; it can skip to any point easily. Where the complexity comes in is that you'll need to provide an index that can be read to identify where the various pieces of data are.
For example, if you want to read a particular animation, you'll need a way to tell your program where this data is in the file. One way would be to store an index structure at the beginning of the file, which your program would read to find out where all of the pieces of data are. It could then look up the animation in this index, discover that it's at position 24680 and is 2048 bytes long, and it could then seek to this position to read the data.
You might want to look up the fseek call if you're not familiar with seeking within a file: http://www.cplusplus.com/reference/cstdio/fseek/
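As a rough sketch of that idea (the IndexEntry layout and field names are invented for illustration), once you have an entry from the index you can seek straight to the asset:

#include <cstdio>
#include <cstdint>
#include <vector>

struct IndexEntry {
    char     name[32];     // e.g. "walk_cycle"
    uint64_t offset;       // absolute position of the asset's bytes in the file
    uint64_t size;         // how many bytes to read
};

std::vector<char> read_asset(std::FILE* f, const IndexEntry& e)
{
    std::vector<char> data(e.size);
    // Jump straight to the asset; the OS does not touch the bytes in between.
    // (For files over 2 GB you would switch to _fseeki64/fseeko; at 50-100 MB this is fine.)
    std::fseek(f, static_cast<long>(e.offset), SEEK_SET);
    std::fread(data.data(), 1, data.size(), f);
    return data;
}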

How to check the corruption of a file?

I'm developing a BlackBerry 10 mobile application using the Momentics IDE (native SDK).
In my code, I want to add a function that checks whether a file is corrupted or not.
How should I proceed?
Two methods I can think of:
1) If you're writing out the file, ensure you write a specific set of bytes at the end. When reading the file back in, seek to the end and check whether those bytes are present. If not, the file didn't finish writing and can be considered corrupt. An alternative is to write the data's size at the beginning of the file and, when reading it back in, check that the rest of the file is exactly that many bytes (a sketch of this variant follows the list).
2) If you're checking a file that doesn't change, store a hash of the contents of the file and at run-time, generate the hash and compare it to the one you've stored. If they differ, the file has been modified and you can consider it corrupt.
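A minimal sketch of that size-prefix variant of option 1 (the file name and header layout are made up): the writer stores the payload size in the first 8 bytes, and the reader checks that the rest of the file matches it exactly.

#include <cstdio>
#include <cstdint>

bool file_looks_complete(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    uint64_t expected = 0;                                   // payload size written at save time
    if (std::fread(&expected, sizeof expected, 1, f) != 1) { std::fclose(f); return false; }

    std::fseek(f, 0, SEEK_END);
    long end = std::ftell(f);
    std::fclose(f);

    // The payload follows the 8-byte header, so the total size must match exactly.
    return end >= 0 && static_cast<uint64_t>(end) == expected + sizeof expected;
}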
What do you mean by corrupted?
If you want to check whether the file is what you expect, calculate a hash of the file with SHA-256 or whatever hashing algorithm you like and store the hash value. In your application, simply calculate the hash of the file again and compare it to the stored value; if they're the same, there is most probably no corruption.
You might want to have a look here.
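As a hedged sketch of the hashing approach, assuming OpenSSL (libcrypto) is available to link against on your target:

// Compute the SHA-256 of a file as a lowercase hex string; returns "" on failure.
#include <openssl/evp.h>
#include <cstdio>
#include <string>

std::string sha256_of_file(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return "";

    EVP_MD_CTX* ctx = EVP_MD_CTX_create();        // alias of EVP_MD_CTX_new() on OpenSSL 1.1+
    EVP_DigestInit_ex(ctx, EVP_sha256(), nullptr);

    unsigned char buf[8192];
    size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);
    std::fclose(f);

    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int dlen = 0;
    EVP_DigestFinal_ex(ctx, digest, &dlen);
    EVP_MD_CTX_destroy(ctx);

    std::string hex;
    char byte[3];
    for (unsigned int i = 0; i < dlen; ++i) {
        std::snprintf(byte, sizeof byte, "%02x", digest[i]);
        hex += byte;
    }
    return hex;
}

You would compute this once for the known-good file, store the resulting hex string, and compare against a freshly computed value at run time.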

Regarding memory mapped files and usage in large file text editor

I am currently working on a text editor that ideally should be able to handle very large files (theoretically 16 EB). I am planning to use memory-mapped files for the file management part. I read some interesting examples in the book Windows via C/C++. My questions are:
Is it essential that the file offsets from which I map are on 64 KB (or whatever the allocation granularity is) boundaries?
My second question is: if the answer to the first is yes, would it be viable to map two 64 KB views in order to keep a continuous flow of text when I need the contents of the file from either side of a 64 KB boundary? For example,
say the user scrolls to a point in the file around offset 64K - 1, and that point lies in the middle of my text editor's screen, so I need to display data ranging from, say, (64K - x) to (64K + x). I could then make two mappings, 0 to 64K and 64K to 128K (I could make a smaller second mapping, but then I would have to resize it later in any case, so I might as well map the full 64K).
I wasn't quite sure how to frame the questions, so if you don't understand what I meant, I'll keep updating the questions according to the responses I get.
According to the documentation for MapViewOfFile, dwFileOffsetLow is:
A low-order DWORD of the file offset where the view is to begin. The combination of the high and low offsets must specify an offset within the file mapping. They must also match the memory allocation granularity of the system. That is, the offset must be a multiple of the allocation granularity. To obtain the memory allocation granularity of the system, use the GetSystemInfo function, which fills in the members of a SYSTEM_INFO structure.
So the answer to your first question is yes.
The answer to your second question also is yes. You can create multiple views of the same file.
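A minimal sketch of mapping a view that covers an arbitrary byte position by rounding the view's start down to the allocation granularity (the file name, offset and sizes are illustrative; error handling is omitted):

#include <windows.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const unsigned long long gran = si.dwAllocationGranularity;   // typically 64 KB

    HANDLE file = CreateFileW(L"huge.txt", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);

    unsigned long long wanted = 3ull * 1024 * 1024 * 1024 + 123;  // byte the editor needs
    unsigned long long base   = wanted - (wanted % gran);         // granularity-aligned view start
    SIZE_T viewSize = 2 * 64 * 1024;       // two 64 KB "pages"; must not extend past end of file

    const char* view = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ,
                      static_cast<DWORD>(base >> 32),
                      static_cast<DWORD>(base & 0xFFFFFFFF),
                      viewSize));
    const char* wantedByte = view + (wanted - base);              // position inside the view
    (void)wantedByte;

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}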
The article Managing Memory Mapped Files may be of some use to you.
By the way, if you get your text editor to the point where it can be tested, I'd be quite interested in seeing it. I have long despaired at finding an editor or text file viewer that gracefully handles very large files. See Large Text File Viewers and Designing a better text file viewer for some thoughts.

Keep .exe timestamp from changing

Does anybody know of a way to prevent the timestamp of an executable from changing? I'm trying to generate a consistent hash code for the .exe, but I think the timestamp may be preventing that from happening. Each time I recompile the code (VS C++), FastSum generates a different checksum.
Thanks!
The PE file format (as in your EXE) has a timestamp field. Check out "Table 2. IMAGE_FILE_HEADER Fields" at this link: http://msdn.microsoft.com/en-us/library/ms809762.aspx
It seems like if you really wanted to, you could edit TimeDateStamp in a hex editor, or write a small program to do it for you. It's a 4-byte field that sits 8 bytes after the start of the "PE\0\0" signature (i.e. at offset 4 within IMAGE_FILE_HEADER).
I'm not sure what the consequences are of changing this. My guess is it may make you unable to find symbols when you debug the program. Maybe instead of changing this field in your binary you should hash regions outside the PE header. (The link I provide may help you determine where that would make sense.)
Depending on what you have to checksum, you can strip off either just the COFF header (where the timestamp resides) or the Optional Header as well. In the latter case, you keep only the section table and section data (the binary content of the executable). If your source code does not change and the compile and link flags do not change, the section data should remain the same. If you want to include version numbers or the size of code in the checksum, you must include the Optional Header.
To find the start of the Optional Header, follow this procedure:
Read the 4-byte offset of the PE signature from 0x3c.
Go to that offset and skip the 4-byte "PE\0\0" signature.
Skip another 20 bytes (the COFF file header). This is the start of the Optional Header.
You should see 0x10b here if it is a 32-bit exe file or 0x20b if it is 64-bit.
To find the start of the section table, follow this procedure:
Read the 4-byte offset of the PE signature from 0x3c.
Go to that offset and skip the 4-byte "PE\0\0" signature.
Skip 16 bytes.
Read the 2-byte Optional Header size (SizeOfOptionalHeader) here.
Go to the Optional Header.
Skip Optional Header size bytes. This is the start of the section table.
You should see a section name here (like ".text", ".data", etc.).
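Putting both procedures together, a minimal sketch (the file name is illustrative; error handling omitted):

#include <cstdio>
#include <cstdint>

int main()
{
    std::FILE* f = std::fopen("target.exe", "rb");
    if (!f) return 1;

    uint32_t e_lfanew = 0;
    std::fseek(f, 0x3c, SEEK_SET);                 // offset of the PE signature
    std::fread(&e_lfanew, sizeof e_lfanew, 1, f);

    uint16_t magic = 0;
    std::fseek(f, e_lfanew + 4 + 20, SEEK_SET);    // skip "PE\0\0" + 20-byte COFF header
    std::fread(&magic, sizeof magic, 1, f);        // 0x10b = PE32, 0x20b = PE32+

    uint16_t optHeaderSize = 0;
    std::fseek(f, e_lfanew + 4 + 16, SEEK_SET);    // SizeOfOptionalHeader field
    std::fread(&optHeaderSize, sizeof optHeaderSize, 1, f);

    long sectionTable = e_lfanew + 4 + 20 + optHeaderSize;
    char name[9] = {};
    std::fseek(f, sectionTable, SEEK_SET);
    std::fread(name, 1, 8, f);                     // first section name, e.g. ".text"

    std::printf("magic=0x%x, first section=%s\n", magic, name);
    std::fclose(f);
    return 0;
}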
For complete specification of PE & COFF format, download this: Microsoft PE and COFF Specification.
Which timestamp? Last accessed? You can't prevent that from changing if you are accessing the file - however, you could take note of it and then change it back.
For a hash - what do you mean? A method of ensuring that the .exe hasn't changed? I'd use a CRC.
File timestamps are something controlled and maintained by the OS - they're not internal to the file (including executables) itself.

Writing to the middle of the file (without overwriting data)

In Windows, is it possible through an API to write to the middle of a file without overwriting any data and without having to rewrite everything after that point?
If it's possible, then I believe it will obviously fragment the file; how many times can I do it before it becomes a serious problem?
If it's not possible, what approach/workaround is usually taken? Rewriting everything after the insertion point becomes prohibitive really quickly with big (i.e., gigabytes) files.
Note: I can't avoid having to write to the middle. Think of the application as a text editor for huge files, where the user types stuff and then saves. I also can't split the file into several smaller ones.
I'm unaware of any way to do this if the interim result you need is a flat file that can be used by applications other than the editor. If you want a flat file to be produced, you will have to update it from the change point to the end of the file, since it's really just a sequential file.
But the italics are there for good reason. If you can control the file format, you have some options. Some versions of MS Word had a quick-save feature where they didn't rewrite the entire document, rather they appended a delta record to the end of the file. Then, when re-reading the file, it applied all the deltas in order so that what you ended up with was the right file. This obviously won't work if the saved file has to be usable immediately to another application that doesn't understand the file format.
What I'm proposing there is to not store the file as text. Use an intermediate form that you can efficiently edit and save, then have a step which converts that to a usable text file infrequently (e.g., on editor exit). That way, the user can save as much as they want but the time-expensive operation won't have as much of an impact.
Beyond that, there are some other possibilities.
Memory-mapping (rather than loading) the file may provide efficiencies which would speed things up. You'd probably still have to rewrite to the end of the file, but it would be happening at a lower level in the OS.
If the primary reason you want a fast save is to let the user keep working (rather than making the file available to another application), you could farm the save operation out to a separate thread and return control to the user immediately. You would then need synchronisation between the two threads to prevent the user from modifying data that has yet to be saved to disk.
The realistic answer is no. Your only real choices are to rewrite from the point of the modification, or build a more complex format that uses something like an index to tell how to arrange records into their intended order.
From a purely theoretical viewpoint, you could sort of do it under just the right circumstances. Using FAT (for example, but most other file systems have at least some degree of similarity) you could go in and directly manipulate the FAT. The FAT is basically a linked list of clusters that make up a file. You could modify that linked list to add a new cluster in the middle of a file, and then write your new data to that cluster you added.
Please note that I said purely theoretical. Doing this kind of manipulation on a completely unprotected system like MS-DOS would have been difficult but bordering on reasonable. With most newer systems, doing the modification at all would generally be pretty difficult, and most modern file systems are also (considerably) more complex than FAT, which adds further difficulty to the implementation. In theory it's still possible, but in practice it's now thoroughly insane to even contemplate, where it was once almost reasonable.
I'm not sure about the format of your file, but you could make it 'record'-based:
Write your data in chunks and give each chunk an id. The id could be the chunk's offset in the file.
At the start of the file, keep a header with a list of ids so that you can read the records in order.
At the end of the list of ids you can point to another location in the file (an id/offset) that stores a further list of ids.
Something similar to a filesystem.
To add new data, you append it at the end and update the index (add its id to the list).
You still have to figure out how to handle deleting and updating records.
If records are all the same size, then to delete one you can just mark it as empty and reuse the slot next time, with the appropriate updates to the index table.
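A rough sketch of such a record-plus-index layout (the struct names and fields are invented; only the append path is shown):

#include <cstdio>
#include <cstdint>
#include <string>

struct IndexEntry {            // one per record, stored in the index area
    uint64_t offset;           // where the record's bytes start in the file
    uint32_t length;           // how many bytes the record occupies
    uint32_t flags;            // e.g. 0 = live, 1 = deleted (slot reusable)
};

// Append a record to the data area and return its index entry;
// the caller adds the entry to the in-memory index and rewrites the index area.
IndexEntry append_record(std::FILE* f, const std::string& data)
{
    std::fseek(f, 0, SEEK_END);                    // records always go at the end
    IndexEntry e{};
    e.offset = static_cast<uint64_t>(std::ftell(f));
    e.length = static_cast<uint32_t>(data.size());
    e.flags  = 0;
    std::fwrite(data.data(), 1, data.size(), f);
    return e;
}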
Probably the most efficient way to do this (if you really want to do it) is to call ReadFileScatter() to read the chunks before and after the insertion point, insert the new data in the middle of the FILE_SEGMENT_ELEMENT[3] list, and call WriteFileGather(). Yes, this involves moving bytes on disk. But you leave the hard parts to the OS.
If you're using .NET 4 and have an editor-like application, try a memory-mapped file - it might just be the ticket. Something like this (I didn't type it into VS, so I'm not sure I got the syntax right):
using System.IO;
using System.IO.MemoryMappedFiles;
struct MyRecord { public int Id; public double Value; }   // any blittable value type works here

// Create (or overwrite) a 1 GB mapping backed by C:\bigfile.dat.
MemoryMappedFile bigFile = MemoryMappedFile.CreateFromFile(
    @"C:\bigfile.dat",
    FileMode.Create,
    "BigFileMemMapped",
    1024L * 1024 * 1024,
    MemoryMappedFileAccess.ReadWrite);
MemoryMappedViewAccessor view = bigFile.CreateViewAccessor();
long offset = 1000000000;                 // deep inside the file; nothing after it is rewritten
MyRecord record = new MyRecord { Id = 1, Value = 3.14 };
view.Write(offset, ref record);
I noted both paxdiablo's answer on dealing with other applications, and Matteo Italia's comment on Installable File Systems. That made me realize there's another non-trivial solution.
Using reparse points, you can create a "virtual" file from a base file plus deltas. Any application unaware of this method will see a contiguous range of bytes, as the deltas are applied on the fly by a file system filter. For small deltas (totalling less than 16 KB), the delta information can be stored in the reparse point itself; larger deltas can be placed in an alternate data stream. Non-trivial, of course.
I know that this question is marked "Windows", but I'll still add my $0.05 and say that on Linux it is possible to both insert and remove a lump of data to/from the middle of a file without either leaving a hole or copying the second half forward/backward:
fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len)
fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len)
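For example, to insert a block of new data in the middle of a file (a hedged sketch: the file name and sizes are made up, both offset and len must be multiples of the filesystem block size or fallocate() fails with EINVAL, and the filesystem must support these flags, e.g. ext4 or XFS):

#include <fcntl.h>
#include <unistd.h>
#include <linux/falloc.h>   // FALLOC_FL_INSERT_RANGE / FALLOC_FL_COLLAPSE_RANGE
#include <cstdio>
#include <vector>

int main()
{
    const off_t offset = 4096;                 // block-aligned insertion point
    const off_t len    = 4096;                 // block-aligned amount to insert
    int fd = open("big.dat", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    // Shift everything from `offset` onward to the right by `len` bytes.
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) < 0) {
        perror("fallocate");                   // e.g. EOPNOTSUPP on an unsupported filesystem
        return 1;
    }

    // Fill the newly created gap with the data to insert.
    std::vector<char> buf(len, 'X');
    pwrite(fd, buf.data(), buf.size(), offset);

    close(fd);
    return 0;
}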
Again, I know that this probably won't help the OP, but I personally landed here searching for a Linux-specific answer. (There is no "Windows" word in the question, so the web search engine saw no problem with sending me here.)