exe checksum different after each recompile - c++

So I'm trying to figure out how to get my exe to have the same hash code/checksum when it's recompiled. I'm using FastSum to generate the checksum. Currently, no code changes are made, I'm just rebuilding the project in VS and the checksum comes out different. The code is written in c++.
I'm not familiar with using hash codes and/or checksums in this manner, but I did some research and read something about needing a consistent GUID. But I have no idea how that would tie into the checksum generation program...
Well, I'll leave it at that, thanks in advance.

Have you examined the differences between the exes? I suspect the compiler/linker is inserting the date or time into the binary, so each build differs from the last. It could also be worse: compilers/linkers sometimes build static tables in their own memory and then copy them into the binary. Say you have 9 bytes of something and, for alignment reasons, the compiler uses 12 bytes in the binary; I have seen compilers/linkers copy whatever 3 bytes happen to be in the build machine's memory into the file as padding. Ideally the tools would zero out the memory they use for this so that you get repeatable results.
Basically, do a binary diff between the files; you should then find out why they don't match.
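If you don't have a binary diff tool handy, something like the following is a minimal sketch of the idea (C++17). The names readFile, DiffRun and diffRuns are made up for illustration; it just groups differing bytes into runs so that a 4-byte timestamp shows up as one entry rather than four.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory (returns an empty vector if it cannot be opened).
std::vector<std::uint8_t> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// A contiguous run of differing bytes: [offset, offset + length).
struct DiffRun {
    std::size_t offset;
    std::size_t length;
};

// Groups differing bytes into runs. Bytes past the end of the shorter
// buffer count as differing.
std::vector<DiffRun> diffRuns(const std::vector<std::uint8_t>& a,
                              const std::vector<std::uint8_t>& b) {
    std::vector<DiffRun> runs;
    const std::size_t common = std::min(a.size(), b.size());
    const std::size_t longest = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < longest; ++i) {
        const bool differs = i >= common || a[i] != b[i];
        if (!differs) continue;
        if (!runs.empty() && runs.back().offset + runs.back().length == i) {
            ++runs.back().length;      // extend the current run
        } else {
            runs.push_back({i, 1});    // start a new run
        }
    }
    return runs;
}
```

Feed it readFile("build1.exe") and readFile("build2.exe") (hypothetical file names) and look up the reported offsets in a hex viewer. A handful of short runs near the start of the file usually points at header fields rather than code.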

From what I recall, the EXE format includes a build timestamp so a hash of the exe, including that timestamp, would change on each recompile.
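To check this on your own builds, here is a rough sketch (C++17) that pulls the timestamp out of a PE image already loaded into memory. It assumes the standard PE layout: the 4-byte offset of the "PE\0\0" signature is stored at 0x3C, and TimeDateStamp sits 8 bytes after the start of that signature. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a little-endian 32-bit value; the caller guarantees the range is valid.
std::uint32_t readLe32(const std::vector<std::uint8_t>& image, std::size_t pos) {
    assert(pos + 4 <= image.size());
    return static_cast<std::uint32_t>(image[pos]) |
           static_cast<std::uint32_t>(image[pos + 1]) << 8 |
           static_cast<std::uint32_t>(image[pos + 2]) << 16 |
           static_cast<std::uint32_t>(image[pos + 3]) << 24;
}

// Returns the COFF TimeDateStamp of a PE image, or nullopt if the buffer
// does not look like a PE file.
std::optional<std::uint32_t> peTimestamp(const std::vector<std::uint8_t>& image) {
    constexpr std::size_t kDosHeaderSize = 0x40;
    constexpr std::size_t kLfanewPos = 0x3C;   // offset of the PE signature
    constexpr std::size_t kStampOffset = 8;    // signature(4) + Machine(2) + NumberOfSections(2)
    if (image.size() < kDosHeaderSize || image[0] != 'M' || image[1] != 'Z') {
        return std::nullopt;
    }
    const std::size_t pe = readLe32(image, kLfanewPos);
    if (pe > image.size() || image.size() - pe < kStampOffset + 4) {
        return std::nullopt;
    }
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0) {
        return std::nullopt;
    }
    return readLe32(image, pe + kStampOffset);
}
```

If two builds of unchanged source return different values here, the timestamp is at least one of the reasons the hashes differ.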

Is this a managed binary? Managed binaries have a GUID section that changes from build to build and there's not much you can do to stop that.
You can get a better look at the changes in your binary by running "link /dump /all [filename]" or "link /dump /disasm [filename]". The /all option will show you all the hex values as well as their ascii equivalent, while the /disasm option will disassemble the code and show it to you in assembly, which can be easier to read but might ignore some trivial differences which might have caused the hash to change.


How to modify a function in a compiled DLL

I want to know if it is possible to "edit" the code inside an already compiled DLL.
I.e., imagine that there is a function called sum(a,b) inside Math.dll which adds the two numbers a and b.
Let's say I've lost the source code of my DLL, so the only thing I have is the binary DLL file.
Is there a way I could open that binary file, locate where my function resides and replace the sum(a,b) routine with, for example, another routine that returns the product of a and b (instead of the sum)?
In summary, is it possible to edit binary code files?
Maybe using reverse engineering tools like OllyDbg?
Yes it is definitely possible (as long as the DLL isn't cryptographically signed), but it is challenging. You can do it with a simple Hex editor, though depending on the size of the DLL you may have to update a lot of sections. Don't try to read the raw binary, but rather run it through a disassembler.
Inside the compiled binary you will see a bunch of esoteric bytes. All of the opcodes that are normally written in assembly as instructions like "call," "jmp," etc. will be translated to the machine architecture dependent byte equivalent. If you use a disassembler, the disassembler will replace these binary values with assembly instructions so that it is much easier to understand what is happening.
Inside the compiled binary you will also see a lot of references to hard coded locations. For example, instead of seeing "call add()" it will be "call 0xFFFFF." The value here is typically a reference to an instruction sitting at a particular offset in the file. Usually this is the first instruction belonging to the function being called. Other times it is stack setup/cleanup code. This varies by compiler.
As long as the instructions you replace are the exact same size as the original instructions, your offsets will still be correct and you won't need to update the rest of the file. However if you change the size of the instructions you replace, you'll need to manually update all references to locations (this is really tedious btw).
Hint: If the instructions you're adding are smaller than what you replaced, you can pad the rest with NOPs to keep the locations from getting off.
Hope that helps, and happy hacking :-)
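For what it's worth, the same-size rule and the NOP hint can be sketched like this. It is only an illustration on an in-memory byte buffer; patchInPlace is a made-up name, and 0x90 is the single-byte NOP on x86/x64.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

constexpr std::uint8_t kX86Nop = 0x90;  // single-byte NOP on x86/x64

// Overwrites originalLength bytes at offset with replacement and pads the
// remainder with NOPs, so nothing after the patch moves.
// Returns false (leaving code untouched) if the replacement does not fit.
bool patchInPlace(std::vector<std::uint8_t>& code, std::size_t offset,
                  std::size_t originalLength,
                  const std::vector<std::uint8_t>& replacement) {
    if (offset > code.size() || originalLength > code.size() - offset) {
        return false;  // patch region lies outside the buffer
    }
    if (replacement.size() > originalLength) {
        return false;  // a longer patch would shift every later offset
    }
    const auto begin = code.begin() + static_cast<std::ptrdiff_t>(offset);
    const auto end = begin + static_cast<std::ptrdiff_t>(originalLength);
    const auto padStart = std::copy(replacement.begin(), replacement.end(), begin);
    std::fill(padStart, end, kX86Nop);
    return true;
}
```

Note that the length you pass must cover whole instructions. Overwriting half of an instruction leaves the other half to be decoded as something else, which is why you want the disassembler output next to you when you pick the range.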
Detours, a library for instrumenting arbitrary Win32 functions on x86 machines. Detours intercepts Win32 functions by re-writing target function images. The Detours package also contains utilities to attach arbitrary DLLs and data segments (called payloads) to any Win32 binary.
You can, of course, hex-edit the DLL to your heart's content and do all sorts of fancy things. But the question is why go to all that trouble if your intention is to replace the function to begin with?
Create a new DLL with the new function, and change the code that calls the function in the old DLL to call the function in the new DLL.
Or did you lose the source code to the application as well? ;)
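As a rough idea of what the new DLL's source could look like: the int parameters, the name multiply and the NEWMATH_API macro are all assumptions here, since the question doesn't give the real signature of sum(a,b).

```cpp
#include <cassert>

// Export with C linkage so the name is not mangled. __declspec(dllexport)
// only applies when building with a Windows toolchain.
#if defined(_WIN32)
#  define NEWMATH_API extern "C" __declspec(dllexport)
#else
#  define NEWMATH_API extern "C"
#endif

// Replacement for the lost sum(a, b): same parameter list, new behaviour.
NEWMATH_API int multiply(int a, int b) {
    return a * b;
}
```

The caller then links against (or loads) the new DLL and calls multiply instead of sum; the old Math.dll stays untouched.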

How to measure Code Size?

When certain features or optimizations are discussed, Code Size is often mentioned.
While I certainly understand the basic concept, that is, that a collection of code, compiled to machine code will result in X bytes of machine code (plus static data) I have recently realized that I'm very unsure how to actually measure Code Size of a given binary.
So, how do you measure Code Size?
Do you just check how big the resulting binary ("executable", .exe) is?
Do you need a tool such as dumpbin.exe or some specific linker flags to get detailed results?
You can tell the linker to produce a map file. This gives about the most detailed information that's easy to get (i.e., much short of reverse engineering the code by hand).
Depending on the code, using dumpbin on an object file can produce meaningful results, but can also produce simply "anonymous object" -- especially (exclusively?) when you ask for link-time code generation.
I'd say your best bet is to disassemble the binary.
In the context of code optimizations, total code size isn't typically what is meant, but rather code size for some specific part of your program.
If you literally mean the size of the .exe in bytes, I think you're over-thinking the question. Your file explorer should show file sizes in a column on the right (if it doesn't, right-click the file and open Properties). The file you're looking for should be in the Debug output folder, with a name ending in .exe.
If it's something else, sorry.

How to figure out which methods increase the size of the 'exe'

I'm trying to write my first 'demoscene' application in MS Visual Studio Express 2010. I suddenly realized that my binary had grown from 16 KB to ~100 KB in the fully-optimized-for-size release build. My target size is 64 KB. Is there any way to "browse" the binary to figure out which methods consume a lot of space and which I should rewrite? I really want to know what my binary consists of.
From what I found on the web, VS2010 is not the best compiler for demoscene work, but I still want to understand what's happening inside my .exe file.
I think you should have MSVC generate a map file for you. This is a file that will tell you the addresses of most of the different functions in your executable. The difference between consecutive addresses should tell you how much space the function takes. To generate a map file, add the /MAP linker option. For more info, see:
http://msdn.microsoft.com/en-us/library/k7xkk3e2(v=VS.100).aspx
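The "difference between consecutive addresses" step could look roughly like this once you have pulled (name, address) pairs out of the map file. Parsing the map file itself is left out, and Symbol, SizedSymbol, estimateSizes and the sectionEnd parameter are names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Symbol {
    std::string name;
    std::uint64_t address;
};

struct SizedSymbol {
    std::string name;
    std::uint64_t size;
};

// Estimates each symbol's size as the gap to the next symbol's address.
// sectionEnd supplies the upper bound for the last symbol.
// The result is sorted with the largest symbols first.
std::vector<SizedSymbol> estimateSizes(std::vector<Symbol> symbols,
                                       std::uint64_t sectionEnd) {
    std::sort(symbols.begin(), symbols.end(),
              [](const Symbol& l, const Symbol& r) { return l.address < r.address; });
    assert(symbols.empty() || sectionEnd >= symbols.back().address);

    std::vector<SizedSymbol> sized;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        const std::uint64_t next =
            (i + 1 < symbols.size()) ? symbols[i + 1].address : sectionEnd;
        sized.push_back({symbols[i].name, next - symbols[i].address});
    }
    std::stable_sort(sized.begin(), sized.end(),
                     [](const SizedSymbol& l, const SizedSymbol& r) { return l.size > r.size; });
    return sized;
}
```

The sizes are only estimates: alignment padding between functions gets attributed to the preceding symbol, and anything the map file doesn't list gets folded into its neighbour.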
You can strip off lots of unnecessary stuff from the executable and compress it with utilities such as mew.
I've found this useful for examining executable sizes (although not for demoscene type things): http://aras-p.info/projSizer.html
I will say this: if you are using the standard library at all, then stop immediately. It is a huge code bloater. For example, each unique usage of std::sort adds around 5KB, and there are similar numbers for many of the standard containers (of course, it depends on which functions you use, but in general they add lots of code).
Also, I'm not into the demo scene, but I believe people use Crinkler to compress their executables.
Use your version control system to see what caused the increase. Going forward, I'd log the built exe size during the nightly builds. And don't forget you can optimize for minimal size with the compiler settings.

Binary Reproducibility in Visual C++

Is there a way to force the same code to produce the same binary in Visual C++? Turn off the timestamp in the PE or force the timestamp in the PE to be some fixed value, in other words?
It's not only a timestamp - there's an embedded GUID used for PDB matching - as John Robbins explains.
Even beyond that, there's just no way to force the compiler to generate consistent results, as Jim Griesmer explains -
compiler writers are far more interested in generating correctly functioning code and generating it quickly than ensuring that whatever is generated is laid out identically on your hard drive. Due to the numerous and varied methods and implementations for optimizing code, it is always possible that one build ended up with a little more time to do something extra or different than another build did. Thus, the final result could be a different set of bits for what is the same functionality.
Thus, function and section order are not guaranteed to be consistently ordered in the resulting PE. An example is at the link.
I suppose you could write a utility to open the PE, set the checksum to 0, set the timestamp to what you like, recompute the crc, then write it back out. It would be nice if there were an official way to ensure perfect binary reproducibility, though.
For more information:
http://msdn.microsoft.com/en-us/magazine/cc301805.aspx
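A minimal sketch of the utility suggested above, working on a PE image already loaded into memory. It assumes the standard layout: the offset of the "PE\0\0" signature is stored at 0x3C, TimeDateStamp is 8 bytes after the signature, and the optional header's CheckSum field is 88 bytes after it (64 bytes into the optional header, for both PE32 and PE32+). The function names are made up. It does not touch the PDB GUID or any other timestamps elsewhere in the file, and it does not recompute the checksum, so on its own it is unlikely to give you identical binaries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes a little-endian 32-bit value; the caller guarantees the range is valid.
void writeLe32(std::vector<std::uint8_t>& image, std::size_t pos, std::uint32_t value) {
    assert(pos + 4 <= image.size());
    for (std::size_t i = 0; i < 4; ++i) {
        image[pos + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Sets the COFF TimeDateStamp to fixedStamp and zeroes the optional header
// CheckSum. Returns false (leaving image untouched) if it is not a PE image.
bool normalizePeHeader(std::vector<std::uint8_t>& image, std::uint32_t fixedStamp) {
    constexpr std::size_t kLfanewPos = 0x3C;
    constexpr std::size_t kStampOffset = 8;       // signature(4) + Machine(2) + NumberOfSections(2)
    constexpr std::size_t kOptSizeOffset = 20;    // SizeOfOptionalHeader field
    constexpr std::size_t kChecksumOffset = 88;   // signature(4) + COFF header(20) + 64
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;

    const std::size_t pe = static_cast<std::size_t>(image[kLfanewPos]) |
                           static_cast<std::size_t>(image[kLfanewPos + 1]) << 8 |
                           static_cast<std::size_t>(image[kLfanewPos + 2]) << 16 |
                           static_cast<std::size_t>(image[kLfanewPos + 3]) << 24;
    if (pe > image.size() || image.size() - pe < kChecksumOffset + 4) return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0) return false;

    // The optional header must be large enough to contain CheckSum.
    const unsigned optSize = image[pe + kOptSizeOffset] |
                             (image[pe + kOptSizeOffset + 1] << 8);
    if (optSize < 68) return false;

    writeLe32(image, pe + kStampOffset, fixedStamp);
    writeLe32(image, pe + kChecksumOffset, 0);
    return true;
}
```

Run it over both builds and hash the results; whatever still differs afterwards is coming from something other than these two header fields.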

MAP file analysis - where does my code size come from?

I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where the growth comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
This is a wonderful tool for analysing, exploring and viewing compiler-generated map files. Check whether it can also explore gcc-generated map files.
amap : A tool to analyze .MAP files produced by 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox360, Wii, and PS3 compilers.
The map file should have the size of each section; you can write a quick tool to sort symbols by size. There's also a command line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
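The historical comparison could be as simple as the sketch below (C++17). It assumes you already have a symbol-name-to-size table for each build; Growth and sizeGrowth are made-up names, and how you get the sizes out of the map file is up to you.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Growth {
    std::string name;
    std::int64_t delta;  // bytes gained (positive) or lost (negative)
};

// Compares per-symbol sizes from two builds. New symbols count as growth
// from zero, removed symbols as shrinkage to zero; unchanged symbols are
// omitted. The result is sorted with the biggest growth first.
std::vector<Growth> sizeGrowth(const std::map<std::string, std::int64_t>& before,
                               const std::map<std::string, std::int64_t>& after) {
    std::vector<Growth> changes;
    for (const auto& [name, newSize] : after) {
        const auto it = before.find(name);
        const std::int64_t oldSize = (it == before.end()) ? 0 : it->second;
        if (newSize != oldSize) changes.push_back({name, newSize - oldSize});
    }
    for (const auto& [name, oldSize] : before) {
        if (after.find(name) == after.end()) changes.push_back({name, -oldSize});
    }
    std::stable_sort(changes.begin(), changes.end(),
                     [](const Growth& l, const Growth& r) { return l.delta > r.delta; });
    return changes;
}
```

Run weekly or nightly, the top few entries of this list tend to be more informative than any single map file.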
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of C++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Although heralded as a great universal library, Boost adds a lot of space to projects. BOOST_FOREACH is an example of this: it's hundreds of lines of templated code, which could simply be avoided by writing a proper loop by hand, which is generally only a few more keystrokes.
Get Visual Assist X to save typing instead of using templates. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up in the map file.
Also, if you can, move away from DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want just passing in a different command line parameter depending on what you want it to do.
DLLs are the worst culprit for wasting space and slowing down the running time of a project. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size tenfold! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.