Repeatable object code generation in C++

When I build a project using a C++ compiler, can I make sure that the produced binary is unchanged as long as there were no changes in the source code? It seems that every time I recompile my source, the binary's MD5 checksum changes. Is the time of compilation somehow affecting the binary produced? How can I make my compilation results repeatable?

One can disassemble the binary and run md5 on the output.
Example on Mac OS X:
otool -tV a.out | md5
ee2e724434a89fce96aa6b48621f7220
But this misses the global data (there may be a flag to include that too).
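For instance, otool's -d flag dumps the data section, so something along these lines (worth verifying against your otool version) would cover both:
otool -tV -d a.out | md5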
I'm answering only the problem of MD5-checking a binary; how you manage your sources and build system, which others have written about, is also worth looking at.

I suspect it'll heavily depend on your toolchain and OS. For example, if one of the executable headers contains a timestamp then you're always going to find the resulting MD5 is different.
What's the end result you're trying to achieve (i.e. why is it so important that they're identical)?

You can't do an MD5 checksum comparison for Visual Studio binaries. In a normal Release .exe from Visual Studio there are three locations that change with each recompile: two of them are timestamps, and the third is a unique GUID that Visual Studio uses to match the .exe with its helper files (such as the .pdb) to ensure they are in sync.
It might be possible to write a tool that zeroes out the three changing fields (see the sketch below), but I'm not sure how easy the file format is to parse.
Also, if you are calling any DLLs, you will, if I recall correctly, get more unique identifiers in the generated file.
The Debug version is a different story. I think there are many, many more differences.
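As a rough sketch of the "zero out the changing fields" idea: the COFF header timestamp, at least, is easy to reach, since the PE header offset is stored at 0x3C and TimeDateStamp sits 8 bytes past the "PE\0\0" signature. This neutralises only one of the changing fields; the debug-directory timestamp and GUID need more parsing. A minimal, untested sketch:

#include <cstdint>
#include <fstream>
#include <iostream>

int main(int argc, char* argv[]) {
    if (argc != 2) { std::cerr << "usage: zerots <exe>\n"; return 1; }

    std::fstream f(argv[1], std::ios::in | std::ios::out | std::ios::binary);
    if (!f) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    // DOS header: "MZ" magic, then e_lfanew (offset of the PE header) at 0x3C.
    char magic[2];
    f.read(magic, 2);
    if (magic[0] != 'M' || magic[1] != 'Z') { std::cerr << "not a PE file\n"; return 1; }

    std::uint32_t e_lfanew = 0;
    f.seekg(0x3C);
    f.read(reinterpret_cast<char*>(&e_lfanew), 4);

    // "PE\0\0" signature, then Machine (2 bytes) and NumberOfSections (2 bytes),
    // then the 4-byte TimeDateStamp at signature + 8.
    char sig[4];
    f.seekg(e_lfanew);
    f.read(sig, 4);
    if (sig[0] != 'P' || sig[1] != 'E' || sig[2] != 0 || sig[3] != 0) {
        std::cerr << "PE signature not found\n"; return 1;
    }

    // Overwrite the TimeDateStamp with zeros.
    const std::uint32_t zero = 0;
    f.seekp(e_lfanew + 8);
    f.write(reinterpret_cast<const char*>(&zero), 4);
    return 0;
}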

Use an incremental build system, such as make, to ensure you don't recompile your code if the source doesn't change.
It may or may not be possible to get your compiler to make identical binaries from the same source; it depends on the compiler, and most will embed the current time somewhere in the generated binary.
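A minimal sketch of such a makefile, assuming two hypothetical sources main.cpp and util.cpp; make compares timestamps and recompiles an object only when its source is newer:

CXX      = g++
CXXFLAGS = -O2

# Relink only when an object file changed.
app: main.o util.o
	$(CXX) $^ -o $@

# Recompile a .o only when its .cpp is newer (recipe lines are tab-indented).
%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@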

Related

Setting up files to compile on any computer in Visual Studio

Question:
Once my code is working how should I prepare my files so that a stranger on a different computer can compile it without difficulty?
Additional Details:
I am sending a code sample to a company as part of an application, so obviously an elegant solution would be better (i.e. minimise the number of files required, etc.) and no work should be necessary for the stranger at the other end.
Although I am only using one simple library, I still need to set include directories, lib files, images, DLL files, etc. so that it all compiles correctly.
If it matters, I am using Visual Studio 2015 and the simple library is SDL.
Sorry if this is a duplicate; I was sure this question would have been asked before, but if it exists I just don't know the correct terminology to find it amongst the noise.
Apologies if this is overly simplistic, but you might want to bound the scope of your project by deciding which computers you want to support, and build your code yourself on those platforms, in advance, just to be sure.
List the supported platforms in your release notes, including any platform-specific instructions or information (which VC++ versions, which C++ versions, which OS versions, which DLLs, directory structure, etc.).
You may have to stick some "#ifdef"s and such in your code, but only by building on a particular platform/configuration will you really know for sure.
You can use property sheet (.props) files in your VS solution to set the paths to includes and precompiled libs, then reference those variables in your project files.
To compile on another machine, you just need to change the values in the property sheets.
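For illustration, a hypothetical SDL.props along these lines (SDLDir and the SDL2 library names are assumptions; adjust to your setup):

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <SDLDir>C:\libs\SDL2</SDLDir>
  </PropertyGroup>
  <ItemDefinitionGroup>
    <ClCompile>
      <AdditionalIncludeDirectories>$(SDLDir)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
    </ClCompile>
    <Link>
      <AdditionalLibraryDirectories>$(SDLDir)\lib\x86;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
      <AdditionalDependencies>SDL2.lib;SDL2main.lib;%(AdditionalDependencies)</AdditionalDependencies>
    </Link>
  </ItemDefinitionGroup>
</Project>

Add the sheet to each project via the Property Manager (or an <Import> in the .vcxproj), and the recipient only has to edit the one SDLDir value.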

C Compiler automatic Version increment

I'm wondering if there's a macro or a simple way to let the compiler increment the major, minor or revision number of my code each time I compile.
By the way, I'm using the ARM compiler and uVision from Keil.
Setting the version is not a compiler topic. It should be done in connection with a source code control / version control system like CVS, SVN or Git. Your build ID should be tied to the state of your source code repository, so that you get reproducible builds from checkouts of your version control system. If your code is not yet committed, a dirty tag should be generated and compiled in, to give the user of the software a chance to see that this is not a controlled version.
Simply counting up a value can be done by a Makefile or in pre- and post-build steps, depending on the IDE used. Sorry, I have no experience with Keil...
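As a sketch of the dirty-tag idea with Git, a pre-build step could generate a header (vcs_version.h is an assumed name):

echo "#define VCS_VERSION \"$(git describe --always --dirty)\"" > vcs_version.h

The macro then reflects the exact checkout, with a -dirty suffix for uncommitted changes.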
Define a post-build event that runs a small external program. The program modifies a specific .h file, in which you define macros like VER_MAJOR, VER_MINOR and VER_BUILD. A date/time string can also be updated. I use this method and can control the version numbers as I wish.
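A minimal sketch of such a helper, assuming the generated header is named version.h and begins with the VER_BUILD line (VER_MAJOR and VER_MINOR would live in a hand-edited header):

#include <cstdio>

int main() {
    int build = 0;

    // Read the previous build number back from the generated header, if present.
    if (FILE* in = std::fopen("version.h", "r")) {
        std::fscanf(in, "#define VER_BUILD %d", &build);
        std::fclose(in);
    }

    // Rewrite the header with the incremented build number.
    FILE* out = std::fopen("version.h", "w");
    if (!out) return 1;
    std::fprintf(out, "#define VER_BUILD %d\n", build + 1);
    std::fclose(out);
    return 0;
}

Registered as the post-build (or pre-build) command, it bumps VER_BUILD on every build.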
IMO, you do not need to do this, and especially not to increase a number every time you compile your code.
Set the major and minor revision manually in a header file; you should not have to do this often.
Build number should only be related to the source control revision number (i.e. you should be able to build and rebuild any revision under source control).
Imagine you are a team of 5 developers, and everyone builds and rebuilds on their side; what is the actual build number?
Do they all update a header file?
Who is responsible for owning that header file?
Some compilers do support features like a "post-build" step, which runs a program of your choice after compiling, but that would be tricky if your program is built from multiple source files. Not all compilers do, though.
For that reason, I wouldn't actually do this sort of thing via the compiler. I'd do it in the build script (e.g. makefile) or by configuring the build settings in your IDE.
Assuming you're using make or similar, add a target, say setversion (you pick the name), that runs a program which modifies a header file specifying the components of your version number. Typing make setversion will then update your version numbers.
Optionally, that target can also - after updating the version numbers - do a make clean (i.e. delete all object files and executables) and make all (to recompile and link everything).
I also suggest avoiding changing version numbers after each recompile. Imagine you're busily testing and debugging code and go through several rebuild cycles. Do you really want the version number updated every time you recompile even one source file? It can be done that way if you choose, but it will make every rebuild take longer, and in projects with multiple source files you will need to take care to preserve the ability to do incremental builds.
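A sketch of such a target; bump_version stands in for whatever tool rewrites the header:

.PHONY: setversion
setversion:
	./bump_version version.h
	$(MAKE) clean
	$(MAKE) all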

Comparing the Checksums of Two executables built from the same exact source

I have a question regarding the verification of executable files, compiled with visual studio, using a checksum:
If I build a project from src, I end up with an executable, call it exec1.exe, that has some metadata in it.
If I later rebuild the same exact src, I get another executable, say exec2.exe, it also has its own metadata section.
If I create a checksum for each of the two files, they differ since the metadata information between the two files is different.
Does anyone know of a way to bypass the metadata when I do a checksum of the files, so that regardless of the metadata, doing a checksum of the two files will result in the same checksum value? Or how to compile the binaries, such that as long as the src is identical, I end up with the same executables?
There is no guarantee that Visual C++ will generate the same binary image when building the same source files in successive builds. A checksum is not intended to be used in this manner, and after a bit of research it seems that this is difficult to achieve. Rather, resources such as this KB article can help in comparing files.
Checksums are usually used to find errors resulting from sending/storing data, not to compare versions/builds of an executable.
If you have the .pdb file as well, you can use the DIA SDK to query all the source files that were used to build the executable. Basically, enumerate the IDiaSourceFile objects; each IDiaSourceFile has a get_checksum method. You can generate a master checksum that is a combination of the checksums of all the source files used to make the executable. If the checksum of any source file has changed, you can reasonably assume the executable has changed as well.
This is the same mechanism that Visual Studio uses to determine if a source file is in sync with the pdb so that it can be stepped into for debugging purposes.
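A rough sketch of that enumeration with the DIA SDK (dia2.h plus diaguids.lib; the pdb name here is an assumption, and CoCreateInstance requires the matching msdia DLL to be registered):

#include <dia2.h>
#include <atlbase.h>   // CComPtr
#include <cstdio>
#include <cwchar>

int main() {
    CoInitialize(nullptr);

    CComPtr<IDiaDataSource> source;
    if (FAILED(CoCreateInstance(CLSID_DiaSource, nullptr, CLSCTX_INPROC_SERVER,
                                __uuidof(IDiaDataSource), (void**)&source)))
        return 1;

    CComPtr<IDiaSession> session;
    if (FAILED(source->loadDataFromPdb(L"app.pdb")) ||
        FAILED(source->openSession(&session)))
        return 1;

    // Null compiland and name match every source file in the pdb.
    CComPtr<IDiaEnumSourceFiles> files;
    if (FAILED(session->findFile(nullptr, nullptr, nsNone, &files)))
        return 1;

    CComPtr<IDiaSourceFile> file;
    ULONG fetched = 0;
    while (SUCCEEDED(files->Next(1, &file, &fetched)) && fetched == 1) {
        BSTR name = nullptr;
        file->get_fileName(&name);

        BYTE checksum[64];
        DWORD cb = 0;
        file->get_checksum(sizeof(checksum), &cb, checksum);

        // Print "path: checksum-bytes"; feed these into one master hash
        // instead if you want a single value for the whole executable.
        wprintf(L"%s: ", name ? name : L"?");
        for (DWORD i = 0; i < cb; ++i)
            wprintf(L"%02x", checksum[i]);
        wprintf(L"\n");

        SysFreeString(name);
        file.Release();
    }

    CoUninitialize();
    return 0;
}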

When should I delete object files before compiling?

In compile scripts I generally see a call to always delete object files before compiling. Does this slow down the build process? Is it really necessary with compilers that check if the object files are out of date when deciding to recompile them?
Sometimes, if you revert a source file to a previous version, the .o file will have a newer date than its source and therefore the source won't be fed to the compiler. If you had a reason to revert the source file, you almost surely want the object rebuilt. Doing a clean build ensures you get what you think you're getting.
In some ways, deleting existing object files defeats the purpose of having separate translation units in the first place. Your standard build environment should normally rebuild only those object files that are older than the corresponding source file. (You don't need to delete; you can just overwrite.)
If you have a decent revision control system, then even checking out an older version of a modified file will give the actual file on disk a current timestamp. But indeed, if you're worried that something might be inconsistent, you can always clean up the entire build tree and start over. As a matter of normal code writing, though, it would be terribly wasteful to delete object files.
You should of course keep one set of object files for each set of build options (e.g. debug vs release). Some build environments allow you to have multiple output directories; others (like cmake) will just automatically rebuild everything if you change the global build settings. That's something to watch out for, especially if you just add some #defines to the compiler flags in the middle of a build process.
1) Yes, and 2) no; but it's not the compiler that watches out for that, it's the build system (an IDE or a well-written makefile).

What is the difference between compile code and executable code?

I always use the terms compile and build interchangeably.
What exactly do these terms stand for?
Compiling is the act of turning source code into object code.
Linking is the act of combining object code with libraries into a raw executable.
Building is the sequence composed of compiling and linking, with possibly other tasks such as installer creation.
Many compilers handle the linking step automatically after compiling source code.
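To make that concrete with g++ (a minimal example, assuming two source files):

g++ -c main.cpp            # compile: main.cpp -> main.o (object code)
g++ -c util.cpp            # compile: util.cpp -> util.o
g++ main.o util.o -o app   # link: object files + libraries -> executable

Invoking g++ main.cpp util.cpp -o app does both steps at once, which is why the distinction is easy to miss.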
From wikipedia:
In the field of computer software, the term software build refers either to the process of converting source code files into standalone software artifact(s) that can be run on a computer, or the result of doing so. One of the most important steps of a software build is the compilation process where source code files are converted into executable code.
While for simple programs the process consists of a single file being compiled, for complex software the source code may consist of many files and may be combined in different ways to produce many different versions.
A build could be seen as a script comprising many steps, the primary one being compiling the code.
Others could be:
running tests
reporting (e.g. coverage)
static analysis
pre and post-build steps
running custom tools over certain files
creating installs
labelling them and deploying/copying them to a repository
They are often used to mean the same thing. However, "build" may also mean the full process of compiling and linking a whole application (in the case of e.g. C and C++), or even more, including, among others:
packaging
automatic (unit and/or integration) testing
installer generation
installation/deployment
documentation/site generation
report generation (e.g. test results, coverage).
There are systems like Maven that generalize this with the concept of a lifecycle consisting of several stages, producing different artifacts, possibly using results and artifacts from previous stages.
From my experience I would say that "compiling" refers to the conversion of one or several human-readable source files into object code (object files in C), while "building" denotes the whole process of compiling, linking and whatever else needs to be done for an entire package or project.
Most people would probably use the terms interchangeably.
You could see one nuance: compiling is only the step where you pass source files through the compiler (gcc, javac, whatever).
Building could be understood as the more general process of checking out the source, creating a target folder for the compiled artifacts, checking dependencies, choosing what has to be compiled, running automated tests, creating a tar/zip distribution, pushing to an FTP server, etc.