Change a file's content to match a specific CRC32

I need help, or a direction, on how to change a file's content (while still including certain required data) so that its CRC32 comes out to a specific value. There are some restrictions: I need to include some lines of code, plus the PNG signature at the beginning of the file. Yes, it's for LFI.
It's not for malicious hacking purposes.

My spoof code takes a set of bit locations in a message, the current CRC, and the desired CRC, and it will tell you which of those locations to invert to get the desired CRC. It simply solves a set of linear equations to do this.
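To illustrate the idea (this is not the spoof tool itself, just a minimal sketch of the same linearity trick), here is a Python function that appends four bytes to a message so that its CRC32 becomes any value you choose. It relies on a well-known property of the CRC32 table (polynomial 0xEDB88320): the top byte of each table entry is unique, so the per-byte update can be run backwards. Since the question allows appending data after the required PNG signature and code, a 4-byte suffix is enough:

```python
import struct
import zlib

POLY = 0xEDB88320
TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ (POLY if c & 1 else 0)
    TABLE.append(c)
# The top byte of each table entry is unique, which lets us run the CRC backwards.
REV = {t >> 24: i for i, t in enumerate(TABLE)}
assert len(REV) == 256

def forge_suffix(data: bytes, target: int) -> bytes:
    """Return 4 bytes s such that zlib.crc32(data + s) == target."""
    reg = zlib.crc32(data) ^ 0xFFFFFFFF        # undo zlib's final XOR
    # Walk backwards from the target register, recovering the four table
    # indices that take register 0 to the target register.
    t = target ^ 0xFFFFFFFF
    idx = [0, 0, 0, 0]
    for i in (3, 2, 1, 0):
        idx[i] = REV[t >> 24]
        t = ((t ^ TABLE[idx[i]]) << 8) & 0xFFFFFFFF
    # Replay forward from register 0 to recover the 4 "message" bytes p.
    r = 0
    p = 0
    for i in range(4):
        b = idx[i] ^ (r & 0xFF)
        p |= b << (8 * i)
        r = (r >> 8) ^ TABLE[idx[i]]
    # XOR with the current register (processing the register's own bytes
    # zeroes it out, by linearity) and emit little-endian.
    return struct.pack('<I', reg ^ p)
```

Usage: `forge_suffix(open('payload.png','rb').read(), 0xDEADBEEF)` gives the four bytes to append. If you must hit the target without appending anything, you need the full linear-equation solver described above instead.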

Related

Reduce a Caffe network model

I'd like to use Caffe to extract image features. However, it takes too long to process an image, so I'm looking for ways to optimize for speed.
One thing I noticed is that the network definition I'm using has four extra layers on top of the one from which I'm reading a result (and there are no feedback signals, so they should be safe to delete).
I tried to delete them from the definition file but it had no effect at all. I guess I might need to remove the corresponding part of the file that contains pre-trained weights, too. That is, however, a binary file (a protobuffer) so editing it is not that easy.
Do you think that removing the four layers might have a profound effect on the net's performance?
If so then how do I get familiar with the file contents so that I could edit it and how do I know which parts to remove?
First, I don't think removing the binary weights will have any effect.
Second, you can do it easily using the Python interface: see this tutorial.
Last but not least, have you tried running caffe time to measure the performance of your net? This may help you identify the bottlenecks in your computation.
PS,
You might find this thread relevant as well.
A caffemodel stores data as key-value pairs. Caffe only copies weights for those layers in train.prototxt whose names exactly match layers in the caffemodel; extra blobs are simply ignored, so I don't think removing the binary weights is necessary. If you want to change the network structure, just modify train.prototxt and deploy.prototxt.
If you insist on removing weights from the binary file, follow this caffe example.
And to make sure you delete the right part, this visualizing tool should help.
I would retrain on a smaller input size, change strides, etc. However, if you want to reduce file size, I'd suggest quantizing the weights (https://github.com/yuanyuanli85/CaffeModelCompression) and then using something like LZMA compression (xz on Unix). We do this so we can deploy to mobile devices; 8-bit weights compress nicely.
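To make the quantize-then-compress point concrete, here is a small self-contained sketch in plain Python (no Caffe; the random values are stand-ins for real weights) of 8-bit linear quantization followed by LZMA:

```python
import lzma
import random
import struct

def quantize8(weights):
    """Linear 8-bit quantization: w is approximated by lo + q * scale."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0
    q = bytes(round((w - lo) / scale) for w in weights)
    return q, lo, scale

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10000)]   # stand-in for real weights
raw = struct.pack(f'{len(weights)}f', *weights)            # 4 bytes per weight
q, lo, scale = quantize8(weights)
packed = lzma.compress(q, preset=9)                        # xz-style compression
print(f'{len(raw)} float bytes -> {len(packed)} compressed 8-bit bytes')
```

Dequantize with `lo + q[i] * scale`; the per-weight error is at most about half a quantization step, which is usually tolerable for inference.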

Associate text from source code line to line - too fragile

I need to associate textual data with the lines in a source code file. Something like "these lines are to create a Myclass object" -> lines from 20 to 32.
The problem is that this kind of line tracking is highly fragile: it only takes someone adding a newline to break the correspondence between the associated text and the lines.
I need an idea to make this link a bit stronger (not too much, but at least able to survive a few line shifts); suggestions are greatly welcome.
An easy solution would be to hash the lines (MD5 is easy and widely available) and store each hash along with the data.
You can then check the hash against the possibly modified file. If it matches, great; otherwise begin checking the previous/next lines for a match.
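A minimal sketch of that idea in Python (all names are illustrative): hash each anchored line, and when the file shifts, search outward from the old line number for the matching hash:

```python
import hashlib

def line_hash(line: str) -> str:
    return hashlib.md5(line.encode('utf-8')).hexdigest()

def relocate(anchor_hash: str, old_lineno: int, lines: list, window: int = 5):
    """Find the line matching anchor_hash, searching outward from old_lineno."""
    for delta in range(window + 1):
        for cand in (old_lineno - delta, old_lineno + delta):
            if 0 <= cand < len(lines) and line_hash(lines[cand]) == anchor_hash:
                return cand
    return None

old = ["import os", "", "x = MyClass()", "x.run()"]
anchor = line_hash(old[2])               # annotation anchored at line 2
new = ["#!/usr/bin/env python"] + old    # someone inserts a line at the top
print(relocate(anchor, 2, new))          # the anchor is found one line down, at 3
```

A caveat: identical lines (blank lines, lone braces) will collide; hashing a small window of surrounding lines makes the anchor more distinctive.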
One approach might be to enlist the help of a source control system. For example, using Git, you could associate textual data with a specific version of the source code. If the source code is changed, you can use a "diff" algorithm to discover which line(s) have been added or removed. Using that delta information, you can then update your annotation lines (for example, adding a line at the top of the file would cause your 20-32 annotation to move to 21-33).
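That delta-based remapping can be sketched with the standard library's difflib as a stand-in for a real git diff (names are illustrative):

```python
import difflib

def remap_range(old_lines, new_lines, start, end):
    """Move an annotation covering old_lines[start:end] (end exclusive) to its
    new position. Returns None if the annotated region itself was edited."""
    sm = difflib.SequenceMatcher(None, old_lines, new_lines)
    for a, b, size in sm.get_matching_blocks():
        if a <= start and end <= a + size:
            shift = b - a
            return start + shift, end + shift
    return None

old = [f"line {i}" for i in range(40)]
new = ["// new header"] + old            # one line added at the top of the file
print(remap_range(old, new, 20, 32))     # the 20-32 annotation becomes (21, 33)
```

Returning None when the annotated block itself changed is a deliberate choice: at that point the annotation may be stale and probably needs human review rather than a silent shift.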
Are you trying to implement some form of automatic documentation system? If so, then basing this around line numbering is indeed fragile. I would suggest using some sort of markup to associate the text with semantic blocks of code that are robust when moved or altered. Perhaps something along the lines of doxygen might be what you are looking for.

How to calculate the MasterChecksum and DataForkChecksum on a dmg file

I'm trying to change some information in a DMG file.
It works, but when I try to open the DMG I get the error message:
checksum invalid
So I read the header of my DMG file and extracted all the information I need.
I have a DataForkChecksum and a MasterChecksum, but I don't know how to calculate them.
Does anyone know how to do this?
The master checksum is a checksum of checksums. It is the CRC-32 (polynomial 0xEDB88320) computed over the concatenated binary values of all the checksums of the BLKX blocks present in the DMG. It's quite painful to compute.
libdmg contains GPL code that does this. In particular have a look at the calculateMasterChecksum() function in http://shanemcc.co.uk/libdmg/libdmg-hfsplus/dmg/dmglib.c
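The CRC step itself is small once you have the BLKX checksum values; the hard part is parsing them out of the DMG's structures. A Python sketch (the big-endian packing is my assumption here; check dmglib's calculateMasterChecksum() for the authoritative layout):

```python
import struct
import zlib

def master_checksum(blkx_checksums):
    """CRC-32 (polynomial 0xEDB88320, zlib's default) over the concatenated
    binary checksum values of all BLKX blocks. Byte order is assumed."""
    blob = b''.join(struct.pack('>I', c) for c in blkx_checksums)
    return zlib.crc32(blob) & 0xFFFFFFFF
```

After patching the DMG you would recompute each BLKX checksum first, then this master checksum, and write both back into the header.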
If you don't know a lot about checksums, I recommend taking a quick look at the Wikipedia entry. Essentially they are used to check the integrity of a file, to make sure it has not been changed or interfered with in any way. I believe this is especially important in the open-source community, as code uploaded to sharing websites could have been tampered with by someone other than its original provider.
(Also take a quick look at MD5 on Wikipedia, one read of this and you will quickly appreciate the difficulty of the problem and proposed solutions. )
By including a checksum the provider is not guaranteeing the quality of the code (they may well do that separately), but they are giving you the ability to ensure that what you are downloading is exactly what they provided. A change to a single byte will change the checksum.
In your case by modifying the DMG you are changing the Checksum. Without knowing the specifics it’s hard to advise you how to get around it. If your setup is communicating with the original DMG provider in some way to compare the checksums then it will be very difficult to fix. You also have no way of knowing what their checksum is.
If it is comparing it with a locally stored file then you have a chance. The simplest way will be to get one of the free tools for creating Checksums and replace them both.
However, all this brings up a question: why are you modifying an externally provided DMG? If you want your computer to perform additional actions when you click on it, I believe there are much simpler ways.

Keep .exe timestamp from changing

Does anybody know of a way to prevent the timestamp of an executable from changing? I'm trying to generate a consistent hash for the .exe, but I think the timestamp may be preventing that: each time I recompile the code (VS C++), FastSum generates a different checksum.
Thanks!
The PE file format (as in your EXE) has a timestamp field. Check out "Table 2. IMAGE_FILE_HEADER Fields" at this link: http://msdn.microsoft.com/en-us/library/ms809762.aspx
It seems like if you really wanted to, you could edit TimeDateStamp in a hex editor, or write a small program to do it for you. The field is 4 bytes long and sits 8 bytes past the "PE\0\0" signature (i.e., at e_lfanew + 8).
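A sketch of that small program in Python, operating on an in-memory copy of the file (exercised below on a tiny synthetic header rather than a real exe):

```python
import struct

def zero_timestamp(pe: bytearray) -> None:
    """Overwrite the COFF TimeDateStamp field in place with zero
    (any fixed value works for reproducible hashing)."""
    e_lfanew, = struct.unpack_from('<I', pe, 0x3C)   # offset of the "PE\0\0" signature
    assert pe[e_lfanew:e_lfanew + 4] == b'PE\x00\x00'
    # TimeDateStamp sits 8 bytes past the signature:
    # 4 (signature) + 2 (Machine) + 2 (NumberOfSections).
    struct.pack_into('<I', pe, e_lfanew + 8, 0)

# Tiny synthetic header just to exercise the function:
fake = bytearray(0x100)
struct.pack_into('<I', fake, 0x3C, 0x80)             # e_lfanew = 0x80
fake[0x80:0x84] = b'PE\x00\x00'
struct.pack_into('<I', fake, 0x88, 0x5F000000)       # pretend build timestamp
zero_timestamp(fake)
print(fake[0x88:0x8C])                               # b'\x00\x00\x00\x00'
```

Note the caveat from the answer above: debuggers use this field to match the binary to its symbols, so zeroing it on a build you intend to debug may cause trouble.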
I'm not sure what the consequences are of changing this. My guess is it may make you unable to find symbols when you debug the program. Maybe instead of changing this field in your binary you should hash regions outside the PE header. (The link I provide may help you determine where that would make sense.)
Depending on what you have to checksum, you can either strip off the COFF header (where the timestamp resides) or the Optional Header. In the latter case, you save only the section table and section data (the binary content of the executable). If you make sure your source code is not changed and the compile and link flags are not changed, the section data should remain the same. If you want to include version numbers or size of code in the checksum, you must include the Optional Header.
To find the start of the Optional Header, follow this procedure:
Read the 4-byte signature offset (e_lfanew) from file offset 0x3c.
Go to that offset and skip the 4-byte "PE\0\0" signature.
Skip the 20-byte COFF header. This is the start of the Optional Header.
You should see 0x10b here for a 32-bit exe file or 0x20b for 64-bit.
To find the start of the section table, follow this procedure:
Read the 4-byte signature offset from 0x3c.
Go to that offset and skip the 4-byte signature.
Skip 16 bytes into the COFF header.
Read the 2-byte Optional Header size (SizeOfOptionalHeader) here.
Go to the Optional Header.
Skip Optional Header size bytes. This is the start of the section table.
You should see a section name here (like ".text", ".data", etc.).
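The two procedures above, as a Python sketch (exercised on a tiny synthetic header rather than a real exe):

```python
import struct

def pe_layout(pe: bytes):
    """Return (optional_header_offset, magic, section_table_offset)."""
    e_lfanew, = struct.unpack_from('<I', pe, 0x3C)
    coff = e_lfanew + 4                                  # skip the "PE\0\0" signature
    opt_hdr = coff + 20                                  # COFF header is 20 bytes
    opt_size, = struct.unpack_from('<H', pe, coff + 16)  # SizeOfOptionalHeader
    magic, = struct.unpack_from('<H', pe, opt_hdr)       # 0x10B = PE32, 0x20B = PE32+
    return opt_hdr, magic, opt_hdr + opt_size

# Minimal synthetic header to exercise the walk:
fake = bytearray(0x200)
struct.pack_into('<I', fake, 0x3C, 0x80)                 # e_lfanew
fake[0x80:0x84] = b'PE\x00\x00'
struct.pack_into('<H', fake, 0x80 + 4 + 16, 0xE0)        # optional header size (PE32)
struct.pack_into('<H', fake, 0x80 + 4 + 20, 0x10B)       # PE32 magic
print(pe_layout(fake))      # (152, 267, 376), i.e. (0x98, 0x10B, 0x178)
```

From the returned section-table offset you can hash everything from there to the end of the file, which skips the timestamp entirely.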
For complete specification of PE & COFF format, download this: Microsoft PE and COFF Specification.
Which timestamp? Last accessed? You can't prevent that changing if you are accessing the file; however, you could take note of it and then change it back.
For a hash, what do you mean? A method of ensuring that the .exe hasn't changed? I'd use a CRC.
Filesystem timestamps (created/modified/accessed) are controlled and maintained by the OS and aren't stored inside the file itself; the build timestamp in the PE header, discussed above, is a separate field embedded in the executable.

Writing to the middle of the file (without overwriting data)

In windows is it possible through an API to write to the middle of a file without overwriting any data and without having to rewrite everything after that?
If it's possible then I believe it will obviously fragment the file; how many times can I do it before it becomes a serious problem?
If it's not possible what approach/workaround is usually taken? Re-writing everything after the insertion point becomes prohibitive really quickly with big (ie, gigabytes) files.
Note: I can't avoid having to write to the middle. Think of the application as a text editor for huge files where the user types stuff and then saves. I also can't split the files in several smaller ones.
I'm unaware of any way to do this if the interim result you need is a flat file that can be used by other applications other than the editor. If you want a flat file to be produced, you will have to update it from the change point to the end of file, since it's really just a sequential file.
But the italics are there for good reason. If you can control the file format, you have some options. Some versions of MS Word had a quick-save feature where they didn't rewrite the entire document, rather they appended a delta record to the end of the file. Then, when re-reading the file, it applied all the deltas in order so that what you ended up with was the right file. This obviously won't work if the saved file has to be usable immediately to another application that doesn't understand the file format.
What I'm proposing there is to not store the file as text. Use an intermediate form that you can efficiently edit and save, then have a step which converts that to a usable text file infrequently (e.g., on editor exit). That way, the user can save as much as they want but the time-expensive operation won't have as much of an impact.
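A toy version of that quick-save idea (illustrative only): record each insertion as a (position, text) delta appended to the save file, and replay the journal when loading:

```python
def apply_deltas(base: str, deltas):
    """Replay insert-deltas recorded at save time, oldest first.
    Each delta is (position, inserted_text)."""
    for pos, text in deltas:
        base = base[:pos] + text + base[pos:]
    return base

# The editor appends deltas on save instead of rewriting the whole file:
journal = [(5, ","), (12, "!")]
print(apply_deltas("hello world", journal))   # hello, world!
```

The exporter that produces the flat text file simply runs this replay once and writes the result, so the expensive rewrite happens only on export, not on every save.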
Beyond that, there are some other possibilities.
Memory-mapping (rather than loading) the file may provide efficiencies which would speed things up. You'd probably still have to rewrite to the end of the file, but it would be happening at a lower level in the OS.
If the primary reason you want fast save is to start letting the user keep working (rather than having the file available to another application), you could farm the save operation out to a separate thread and return control to the user immediately. Then you would need synchronisation between the two threads to prevent the user modifying data yet to be saved to disk.
The realistic answer is no. Your only real choices are to rewrite from the point of the modification, or build a more complex format that uses something like an index to tell how to arrange records into their intended order.
From a purely theoretical viewpoint, you could sort of do it under just the right circumstances. Using FAT (for example, but most other file systems have at least some degree of similarity) you could go in and directly manipulate the FAT. The FAT is basically a linked list of clusters that make up a file. You could modify that linked list to add a new cluster in the middle of a file, and then write your new data to that cluster you added.
Please note that I said purely theoretical. Doing this kind of manipulation under a completely unprotected system like MS-DOS would have been difficult but bordering on reasonable. With most newer systems, doing the modification at all would generally be pretty difficult. Most modern file systems are also (considerably) more complex than FAT, which would add further difficulty to the implementation. In theory it's still possible -- but it's now thoroughly insane to even contemplate, where it was once almost reasonable.
I'm not sure about the format of your file, but you could make it 'record' based:
Write your data in chunks and give each chunk an id; the id could be the chunk's offset in the file.
At the start of the file, keep a header with a list of ids so that you can read the records in order.
At the end of the 'list of ids' you could point to another location in the file (an id/offset) that stores another list of ids, something similar to a filesystem.
To add new data, append it at the end of the file and update the index (add its id to the list).
You still have to figure out how to handle deletes and updates.
If records are all the same size, then to delete one you can just mark it empty and reuse it later, with appropriate updates to the index table.
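A toy in-memory version of that record/index scheme (on disk you would append to the file and persist the index in the header; all names here are illustrative):

```python
import struct

class RecordFile:
    """Append-only chunks plus an index kept in logical order (in-memory toy)."""
    def __init__(self):
        self.data = bytearray()   # stands in for the file body
        self.order = []           # record ids (= offsets) in logical order

    def append(self, payload: bytes) -> int:
        rid = len(self.data)      # id is the record's offset in the file
        self.data += struct.pack('<I', len(payload)) + payload
        self.order.append(rid)
        return rid

    def insert_after(self, rid: int, payload: bytes) -> int:
        new_id = self.append(payload)   # physically appended at the end...
        self.order.pop()                # ...but logically placed after rid
        self.order.insert(self.order.index(rid) + 1, new_id)
        return new_id

    def read_all(self):
        out = []
        for rid in self.order:
            n, = struct.unpack_from('<I', self.data, rid)
            out.append(bytes(self.data[rid + 4:rid + 4 + n]))
        return out

rf = RecordFile()
first = rf.append(b'one')
rf.append(b'three')
rf.insert_after(first, b'two')   # no data is moved; only the index changes
print(rf.read_all())             # [b'one', b'two', b'three']
```

The key property: a "middle insert" costs one append plus an index update, independent of file size, at the price of readers having to follow the index.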
Probably the most efficient way to do this (if you really want to do it) is to call ReadFileScatter() to read the chunks before and after the insertion point, insert the new data in the middle of the FILE_SEGMENT_ELEMENT[3] list, and call WriteFileGather(). Yes, this involves moving bytes on disk. But you leave the hard parts to the OS.
If you're on .NET 4, try a memory-mapped file; for an editor-like application it might just be the ticket. Something like this (I didn't type it into VS, so the syntax may be off):
MemoryMappedFile bigFile = MemoryMappedFile.CreateFromFile(
    new FileStream(@"C:\bigfile.dat", FileMode.OpenOrCreate),
    "BigFileMemMapped",
    2L * 1024 * 1024 * 1024,    // capacity must cover the largest offset you write
    MemoryMappedFileAccess.ReadWrite,
    HandleInheritability.None,
    leaveOpen: false);
MemoryMappedViewAccessor view = bigFile.CreateViewAccessor();
long offset = 1000000000;
view.Write(offset, ref myStruct);   // myStruct must be a struct (value type)
I noted both paxdiablo's answer on dealing with other applications, and Matteo Italia's comment on Installable File Systems. That made me realize there's another non-trivial solution.
Using reparse points, you can create a "virtual" file from a base file plus deltas. Any application unaware of this method will see a contiguous range of bytes, as the deltas are applied on the fly by a file system filter. For small deltas (total <16 KB), the delta information can be stored in the reparse point itself; larger deltas can be placed in an alternate data stream. Non-trivial, of course.
I know that this question is marked "Windows", but I'll still add my $0.05 and say that on Linux it is possible to both insert and remove a lump of data to/from the middle of a file without either leaving a hole or copying the second half forward/backward:
fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len)
fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len)
(Both require offset and len to be multiples of the filesystem block size, and are only supported on certain filesystems, e.g. ext4 and XFS.)
Again, I know that this probably won't help the OP, but I personally landed here searching for a Linux-specific answer. (There is no "Windows" in the question text, so the search engine saw no problem sending me here.)