How to produce deterministic binary output with g++?

I work in a very regulated environment where we need to be able to produce identical binary output given the same source code every time we build our products. We currently use an ancient version of g++ that has been patched to not write anything like a date/time in the resulting binaries that would change from build to build, but I would like to update to g++ 4.7.2. Does anyone know of a patch, or have suggestions of what I need to look for to take two identical pieces of source code and produce identical binary outputs?

The Debian Reproducible Builds project attempts to make Debian packages reproducible byte for byte, and received a Linux Foundation grant in 2016.
While this covers more than compilation, you should have a look at it.
It also pointed me to this article, which adds the following points to what @Employed said:
put the source in a fixed folder (e.g. /tmp/build) to deal with __FILE__
for __DATE__, __TIME__, __TIMESTAMP__, either:
use libfaketime: https://github.com/wolfcw/libfaketime
or override those macros with -D
-Wdate-time or -Werror=date-time: warn or fail if any of __TIME__, __DATE__ or __TIMESTAMP__ is used. The Linux kernel 4.4 uses it by default.
use the D flag with ar, or use https://github.com/nh2/ar-timestamp-wiper/tree/master to wipe stamps
-fno-guess-branch-probability: older manual versions say it is a source of non-determinism, but not anymore. Not sure if this is covered by -frandom-seed or not.
Buildroot has a BR2_REPRODUCIBLE option which may give some ideas on the package level, but it is far from complete at this point.
Related threads:
https://superuser.com/questions/639351/does-recompiling-a-program-produce-a-bit-for-bit-identical-binary
https://www.quora.com/What-can-be-the-possible-reasons-for-the-object-code-of-an-unchanged-C-file-to-change-on-recompilation
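Whichever of the measures above you apply, a practical check is to build twice and compare the two outputs byte for byte. Below is a minimal Python sketch of such a check; the function names are illustrative and not part of any tool mentioned in this thread. The offset of the first difference is often a useful hint (for example, it may point into an embedded timestamp or path).

```python
import hashlib


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a == b."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def digest(data: bytes) -> str:
    """SHA-256 of the build output, suitable for storing next to a release."""
    return hashlib.sha256(data).hexdigest()


def compare_builds(path_a: str, path_b: str):
    """Compare two build outputs; returns (identical, first_diff_offset)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return digest(a) == digest(b), first_difference(a, b)
```

Running compare_builds on the outputs of two clean builds from the same source tree should return (True, None) once the build is deterministic.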

We also depend on bit-identical rebuilds, and are using gcc-4.7.x.
Besides setting PWD=/proc/self/cwd and using -frandom-seed=<input-file-name>, there are a handful of patches, which can be found in the svn://gcc.gnu.org/svn/gcc/branches/google/gcc-4_7 branch.

Use of the __DATE__ macro makes the build non-deterministic

Related

CMake how to verify that a loop was auto-vectorized

All C++ compilers that support vectorization can emit a report (*) to verify whether a loop was vectorized, each with its own compilation flag and report format.
I can enable the corresponding flag and visually inspect the report to check if a loop that I expect to be auto-vectorized was indeed auto-vectorized.
I would like to incorporate in my CMake build a step that checks this automatically and fails the build if it didn't auto-vectorize.
How can I do this using CMake?
Is there anyone that has solved this problem already, somehow?
Thanks in advance
(*)
MSVC https://learn.microsoft.com/en-us/cpp/build/reference/qvec-report-auto-vectorizer-reporting-level?view=msvc-160
gcc https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html
clang https://llvm.org/docs/Vectorizers.html#diagnostics
"I would like to incorporate in my CMake build a step that checks this automatically and fails the build if it didn't auto-vectorize.
How can I do this in CMake?"
Break the problem down into steps:
Create an algorithm that can detect whether the program was "auto-vectorized".
You can use output that the compiler generates with special options, why not.
You could also disassemble the code, find "the loop" and check for certain instructions or syntax.
Then write a program that implements that algorithm, preferably in some portable language.
Add a custom target to cmake configuration to run the check that could look like the following:
add_executable(final_exe sources.c...)
add_executable(check_if_vectorized sources.c...) # if compiled, choose your own language
# Custom target that runs the checker on the built executable
add_custom_target(check_if_final_exe_is_vectorized
    COMMENT "Check if final_exe was vectorized"
    COMMAND $<TARGET_FILE:check_if_vectorized> $<TARGET_FILE:final_exe>
    DEPENDS $<TARGET_FILE:check_if_vectorized> $<TARGET_FILE:final_exe>
)
You could also add the check with add_test and have it run like a test.
Is there anything that works out of the box?
No.
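As an illustration of the "use the compiler output" variant of the steps above, here is a minimal Python sketch of a checker that scans a saved vectorization report. The message shapes it matches ("loop vectorized" for GCC's -fopt-info-vec, "remark: vectorized loop" for Clang's -Rpass=loop-vectorize) are assumptions based on what recent versions print; they differ between compiler versions, so adjust the patterns to what your compiler actually emits. The function names are hypothetical, and Windows drive-letter paths are not handled.

```python
import re

# ASSUMPTION: report line shapes; verify against your compiler version.
VECTORIZED_PATTERNS = (
    # GCC, e.g. "k.c:12:3: optimized: loop vectorized using 16 byte vectors"
    re.compile(r"^(?P<file>[^:\n]+):(?P<line>\d+):\d+: .*loop vectorized"),
    # Clang, e.g. "k.c:7:5: remark: vectorized loop (vectorization width: 4, ...)"
    re.compile(r"^(?P<file>[^:\n]+):(?P<line>\d+):\d+: remark: vectorized loop"),
)


def vectorized_lines(report: str, source: str) -> set:
    """Source line numbers in `source` that the report says were vectorized."""
    found = set()
    for text in report.splitlines():
        for pattern in VECTORIZED_PATTERNS:
            match = pattern.match(text)
            if match and match.group("file").endswith(source):
                found.add(int(match.group("line")))
    return found


def missing_loops(report: str, source: str, expected_lines) -> list:
    """Expected loop lines that were NOT reported as vectorized."""
    return sorted(set(expected_lines) - vectorized_lines(report, source))


def main(argv) -> int:
    """Usage: check_vectorized.py REPORT_FILE SOURCE_NAME LINE [LINE ...]"""
    with open(argv[1]) as handle:
        report = handle.read()
    missing = missing_loops(report, argv[2], [int(n) for n in argv[3:]])
    for line in missing:
        print("loop at %s:%d was not vectorized" % (argv[2], line))
    return 1 if missing else 0
```

A script like this would end with sys.exit(main(sys.argv)), so that a non-zero exit status makes the custom target (or the add_test test) fail. Tying the check to line numbers is fragile when the source is edited; that is a trade-off of this sketch, not a recommendation.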

Change Linux shared library (.so file) version after it was compiled

I'm compiling Linux libraries (for Android, using NDK's g++, but I bet my question makes sense for any Linux system). When delivering those libraries to partners, I need to mark them with a version number. I must also be able to access the version number programmatically (to show it in an "About" dialog or a GetVersion function for instance).
I first compile the libraries with an unversioned flag (version 0.0) and need to change this version to a real one when I'm done testing, just before sending it to the partner. I know it would be easier to modify the source and recompile, but we don't want to do that: we would then have to test everything again, we feel patching would be less error prone (see comments to this post), and our development environment already works this way. For Windows binaries we set a 0.0 resource version string (.rc) and later change it using verpatch; we'd like to work with the same kind of process when shipping Linux binaries.
What would be the best strategy here?
To summarize, requirements are:
Compile binaries with "unset" version (0.0 or anything else)
Be able to modify this "unset" version to a specific one without having to recompile the binary (ideally, run a 3rd party tool command, as we do with verpatch under Windows)
Be able to have the library code retrieve its version information at runtime
If your answer is "rename the .so", then please provide a solution for 3.: how to retrieve version name (i.e.: file name) at runtime.
I was thinking of some solutions but have no idea if they could work and how to achieve them.
Have a version variable (one string or 3 int) in the code and have a way to change it in the binary file later? Using a binary sed...?
Have a version variable within a resource and have a way to change it in the binary file later? (as we do for win32/win64)
Use a field of the .so (like SONAME) dedicated to this and have a tool allowing to change it...and make it accessible from C++ code.
Rename the lib + change SONAME (did not find how this can be achieved)...and find a way to retrieve it from C++ code.
...
Note that we use QtCreator to compile the Android .so files, but they may not rely on Qt. So using Qt resources is not an ideal solution.
I am afraid you started to solve your problem from the end. First of all, SONAME is provided at link time as a parameter to the linker, so to begin with you need to find a way to get the version from the source and pass it to the linker. One possible solution is to use the ident utility and embed a version string in your binary, for example:
const char version[] = "$Revision:1.2$";
This string should appear in the binary and the ident utility will detect it. Alternatively, you can parse the source file directly with grep or something similar. If there is a possibility of conflicts, add an extra marker that you can later use to detect this string, for example:
const char version[] = "VERSION_1.2_VERSION";
So you detect the version number either from the source file or from the .o file and just pass it to the linker. This should work.
As for having version 0.0 in debug builds, that is easy: skip the detection when you build debug and use 0.0 as the version unconditionally.
For a 3rd party build system I would recommend cmake, but this is just my personal preference. The solution can easily be implemented in a standard Makefile as well. I am not sure about qmake though.
Discussion with Slava made me realize that any const char* was actually visible in the binary file and could then be easily patched to anything else.
So here is a nice way to fix my own problem:
Create a library with:
a definition of const char version[] = "VERSIONSTRING:00000.00000.00000.00000"; (we need it long enough as we can later safely modify the binary file content but not extend it...)
a GetVersion function that would clean the version variable above (remove VERSIONSTRING: and useless 0). It would return:
0.0 if version is VERSIONSTRING:00000.00000.00000.00000
2.3 if version is VERSIONSTRING:00002.00003.00000.00000
2.3.40 if version is VERSIONSTRING:00002.00003.00040.00000
...
Compile the library, let's name it mylib.so
Load it from a program, ask its version (call GetVersion), it returns 0.0, no surprise
Create a little program (did it in C++, but could be done in Python or any other language) that will:
load a whole binary file content in memory (using std::fstream with std::ios_base::binary)
find VERSIONSTRING:00000.00000.00000.00000 in it
confirm it appears once only (to be sure we don't modify something we did not mean to; that's why I prefix the string with VERSIONSTRING, to make it more unique...)
patch it to VERSIONSTRING:00002.00003.00040.00000 if expected binary number is 2.3.40
save the binary file back from patched content
Patch mylib.so using the above tool (requesting version 2.3 for instance)
Run the same program as step 3., it now reports 2.3!
No recompilation nor linking, you patched the binary version!
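The author wrote the patching tool in C++ and notes it could equally be done in Python; the following is a minimal Python sketch of both halves under that assumption. The names encode_version, decode_version, patch_version and patch_file are hypothetical; decode_version only mirrors the clean-up rule described for GetVersion (drop the prefix, drop padding zeros, keep at least major.minor) and is not the author's actual code.

```python
PREFIX = b"VERSIONSTRING:"
FIELDS = 4  # number of dot-separated fields in the marker
WIDTH = 5   # digits per field, zero padded


def encode_version(version: str) -> bytes:
    """'2.3.40' -> b'VERSIONSTRING:00002.00003.00040.00000' (fixed length)."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) > FIELDS or any(not 0 <= p < 10 ** WIDTH for p in parts):
        raise ValueError("version does not fit the marker: " + version)
    parts += [0] * (FIELDS - len(parts))
    digits = ".".join(str(p).zfill(WIDTH) for p in parts)
    return PREFIX + digits.encode("ascii")


def decode_version(marker: bytes) -> str:
    """Inverse of encode_version; trailing zero fields beyond minor are dropped."""
    numbers = [int(p) for p in marker[len(PREFIX):].decode("ascii").split(".")]
    while len(numbers) > 2 and numbers[-1] == 0:
        numbers.pop()
    return ".".join(str(n) for n in numbers)


def patch_version(image: bytes, new_version: str) -> bytes:
    """Replace the unset marker in a binary image without changing its size."""
    unset = encode_version("0.0")
    if image.count(unset) != 1:
        raise ValueError("expected exactly one unset version marker")
    return image.replace(unset, encode_version(new_version))


def patch_file(path: str, new_version: str) -> None:
    """Read the whole binary, patch it in memory, and write it back."""
    with open(path, "rb") as handle:
        image = handle.read()
    with open(path, "wb") as handle:
        handle.write(patch_version(image, new_version))
```

Because every encoded marker has the same length, the patched file keeps its size and all offsets inside the .so stay valid. Note that a library patched once can no longer be patched again by this sketch, since it only looks for the 0.0 marker.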

How to determine which compiler was requested

My project uses SCons to manage the build process. I want to support multiple compilers, so I decided to use AddOption so the user can specify which compiler to use on the command line (with the default being whatever their current compiler is).
AddOption('--compiler', dest = 'compiler', type = 'string', action = 'store', default = DefaultEnvironment()['CXX'], help = 'Name of the compiler to use.')
I want to be able to have built-in compiler settings for various compilers (including things such as maximum warning levels for that particular compiler). This is what my first attempt at a solution currently looks like:
if is_compiler('g++'):
    from build_scripts.gcc.std import cxx_std
    from build_scripts.gcc.warnings import warnings, warnings_debug, warnings_optimized
    from build_scripts.gcc.optimizations import optimizations, preprocessor_optimizations, linker_optimizations
elif is_compiler('clang++'):
    from build_scripts.clang.std import cxx_std
    from build_scripts.clang.warnings import warnings, warnings_debug, warnings_optimized
    from build_scripts.clang.optimizations import optimizations, preprocessor_optimizations, linker_optimizations
However, I'm not sure what to make the is_compiler() function look like. My first thought was to directly compare the compiler name (such as 'clang++') against what the user passes in. However, this immediately failed when I tried to use scons --compiler=~/data/llvm-3.1-obj/Release+Asserts/bin/clang++.
So I thought I'd get a little smarter and use this function
cxx = GetOption('compiler')
def is_compiler(compiler):
    return cxx[-len(compiler):] == compiler
This only looks at the end of the compiler string, so that it ignores directories. Unfortunately, 'clang++' ends in 'g++', so my compiler was seen to be g++ instead of clang++.
My next thought was to do a backward search and look for the first occurrence of a path separator ('\' or '/'), but then I realized that this won't work for people who have multiple compiler versions. Someone compiling with 'g++-4.7' will not register as being g++.
So, is there some simple way to determine which compiler was requested?
Currently, only g++ and clang++ are supported (and only their most recently released versions) due to their c++11 support, so a solution that only works for those two would be good enough for now. However, my ultimate goal is to support at least g++, clang++, icc, and msvc++ (once they support the required c++11 features), so more general solutions are preferred.
The compiler is just one part of the build process; you also need a linker and possibly other programs. In SCons these are called Tools. The list of tools supported out of the box is in the man page; search for the statement: SCons supports the following tool specifications out of the box: ...
A Tool sets the necessary SCons environment variables; it's documented here.
SCons automatically detects the compilers installed on the OS and picks one of them by priority; autodetection only works properly if the PATH variable includes the necessary directories. For example, if you have both msvc and mingw on Windows, SCons chooses the msvc tool. To force a particular tool, use Tool('name')(env). For example:
env = Environment()
Tool('mingw')(env)
Now env is forced to use mingw.
Clang is one of the tools that SCons does not currently support out of the box. You need to implement it yourself, or set environment variables such as CC and CXX, which SCons uses to generate build commands.
You could just simply use the Python os.path.basename() or os.path.split() functions, as specified here.
You could do what people suggested in the comments by splitting this question into 2 different issues, but I think it could be a good idea to be able to specify the path with the compiler, since you could have 2 versions of g++ installed, and if the user only specifies g++, they may not get the expected version.
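Taking the basename alone still leaves the 'g++-4.7' case from the question unresolved. A minimal sketch of one way to handle it is to strip the directory and then match the remaining name against known compilers, allowing an optional cross-compiler prefix, version suffix and .exe extension. The compiler_family helper and the two-argument form of is_compiler are hypothetical, and only g++ and clang++ are covered here.

```python
import re

# Optional "<triplet>-" prefix, the compiler name, an optional "-<version>"
# suffix and an optional ".exe". clang++ is listed first for readability;
# the anchored match already prevents "clang++" from being taken as "g++".
_COMPILER = re.compile(r"(?:.*-)?(clang\+\+|g\+\+)(?:-\d[\d.]*)?(?:\.exe)?")


def compiler_family(cxx: str):
    """Return 'g++' or 'clang++' for a compiler command, or None if unknown."""
    # Split on both separators so Windows paths work on any host.
    name = re.split(r"[\\/]", cxx)[-1]
    match = _COMPILER.fullmatch(name)
    return match.group(1) if match else None


def is_compiler(cxx: str, compiler: str) -> bool:
    """True if the command `cxx` belongs to the given compiler family."""
    return compiler_family(cxx) == compiler
```

In the SConstruct from the question this would be called as is_compiler(GetOption('compiler'), 'g++'). Matching on the name is only a heuristic: a compiler installed under an unrelated name (for example c++) is not recognized.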
There seems to be some confusion about what question is asked here.
From what I can see, this asks how to determine which compiler was chosen by default, so I'll answer that one.
From what I found out, the official way to check the compiler is to look at the construction variable TOOLS, which contains a list of all tools / programs that SCons decided / was told to use in the given construction environment.
env = Environment()
is_gcc = 'g++' in env['TOOLS']
is_clang = 'clangxx' in env['TOOLS']
TOOLS lists only the currently used tools even if SCons can find more of them.
E.g. if you have both GCC and Clang installed and SCons is able to find both, default TOOLS will still contain only GCC.
You can find the full list of predefined tools here.

How could I query binary's source code version

My environment is Linux CentOS 6.2, and I have a source control system such as svn/hg/git. My source code is C/C++.
I want to check in the built binary to keep track of which binary was released to the customer.
I assume the binary's checksum will differ whenever the source code changes.
So I could trace back which binary was built from which version.
Is this possible, and what tricks must I follow?
I've seen some executables display their revision when run with the -version option.
But I wonder how to prevent a wrong -version string from being written into the executable.
If I keep an md5.txt and check that in instead of the binary,
how can I make sure I can build an executable with the same md5 again?
Sorry; to clarify my question and prevent more unexpected answers, I would prefer an answer like:
Keep an md5sum.txt in the SCM when releasing a new version to users.
Keep binaries separate from your SCM.
To rebuild a binary with the same md5sum you should make sure to:
write a version symbol into the binary at build time (e.g. with -DVERSION="1.x")
show the VERSION string to the user
remove all $Id$ keywords, which make your SCM run slower
keep the same CPU, OS, compiler and library environment
...
Create strings within a .cpp file like this:
static const char version[] = "@(#) $Id$";
where $Id$ is expanded by SVN.
Use the what command (see the manual page). It will extract these strings from the binary so you can check them.
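On systems where the what command is not installed, its search is simple to approximate. The following Python sketch looks for the "@(#)" marker that what(1) searches for and collects the text after it up to the first double quote, greater-than sign, newline, backslash or NUL, which is the termination rule given in the POSIX description of what. The function name what_strings is made up for this example.

```python
import re

# Text following "@(#)" up to the first ", >, newline, backslash or NUL.
WHAT_PATTERN = re.compile(rb'@\(#\)([^"\x00>\n\\]*)')


def what_strings(image: bytes) -> list:
    """Return the identification strings embedded in a binary image."""
    return [
        match.group(1).decode("ascii", "replace").strip()
        for match in WHAT_PATTERN.finditer(image)
    ]


def what_file(path: str) -> list:
    """Convenience wrapper that reads the binary from disk."""
    with open(path, "rb") as handle:
        return what_strings(handle.read())
```

Comparing the output against the revision you believe you shipped is a cheap sanity check, although it only proves what was embedded, not that the rest of the binary matches that revision.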
Is this an executable or a shared library? If the latter, you could export a function that would return the version (number, string, your choice). Then dlopen(), dlsym(), and execute the function.
For executable ELF binaries, you might be able to implant some data in the binary that can be queried using the 'nm' utility.
If you use Subversion, SvnRev will do most of the work for you (no md5 in the repo; the repo holds the sources, and the binary carries a resource with the revision id).
For Mercurial, you can get an idea for the version string from the VersioningWithMake wiki. To get a string like the result of git describe, instead of the simple template {node|short} for HGVERSION you can use something like {latesttag}+{latesttagdistance}:{node|short}, which shows (for example) 1.3+11:8a226f0f99aa

How to avoid symbols and source paths in iOS binary?

When I compile the release version of my iOS app (based on standard Apple supplied iOS app template), look into the resulting executable binary, I see all sorts of symbols and even local cpp source and header paths in there. I'm really stumped why this is (I haven't enabled RTTI*). Especially the source file paths make me feel uncomfortable sending this app across the globe (why should everyone be able to see the directory layout of my development machine?).
Here are two (randomly picked, moderated) excerpts:
TS/../ACTORS/CActorCanvasCharPart.cpplastMeshcapVerticesOFF BOUNDSupload VERTICES: %d
20CActorCanvasCharPartgrassscrub/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame2_grass.cppbaseShadowmowerstartmowerloopmowermowerCharcutGrassChargrassStuffgrassParticles/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame2_grass.h17CStateGame2_grasssinwriteStroke/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame2_flowers.hflowerBedsandTrailclickstart3inplace2sandDrag/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame
And here are a lot of symbols for self-defined types and structs:
CAssetMgr="_vptr$CMgrBase"^^?"pMain"^{CMain}"inited"B"curveCount"S"curveSpecs"^{CCurveSpec}"gameSpecs"[23{CGameStateSpec="header"{SpecDiskHeader="type"i"version"S}"gameID"C"backgroundColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}"clickPointColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}"clickPointIconColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}"hintColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}}]"currentFont"^{CCharset}"userCharParts"^^{CCharPart}"words"{CDataSet<CName4,CCharArray>="_vptr$CObjectBase"^^?"pMain"^{CMain}"count"i"data"*"dataSize"l}"sets"{CDataSet<CName16,CCharArray>="_vptr$CObjectBase"^^?"pMain"^{CMain}"count"i"data"*"dataSize"l
Can this be avoided, and if so, how?
*UPDATE: I just found out that RTTI is on by default. So I cleaned the target, disabled RTTI (GCC_ENABLE_CPP_RTTI = NO) and recompiled. I still see a lot of symbols and source paths in the binary.
UPDATE 2: I checked a few other apps from the app store, and many of them also have their source file paths show up. Pretty scary, if you ask me:
Joined Up Lite
/Users/lloydy/Documents/Development/iPhone/ABC Joined Up/main.m
/Users/lloydy/Documents/Development/iPhone/ABC Joined Up/Classes/SettingsView.m
Crayon Physics
/Users/smproot/Desktop/unzip/CrayonPhysics/v104/Classes/crayon/src/ceng/gameutils/killspriteslowly/killspriteslowly.cpp
/Users/smproot/Desktop/unzip/CrayonPhysics/v104/Classes/crayon/src/ceng/tasks/task/sdl/mixer/ctaskaudiosdlmixer.cpp
Wall Times
/Users/fred/_WORK/ZDNDRP/WallTimes/main.m
/Users/fred/_WORK/ZDNDRP/WallTimes/Classes/SystemCategories.m
Jumbo Calculator
/Users/Christopher/Documents/Development/JumboCalculator 1.0.3/main.m
/Users/Christopher/Documents/Development/JumboCalculator 1.0.3/Classes/CalculatorFaceViewController.m
The file paths are most likely from assert macros which stringify __FILE__ as part of their failure message. iOS's implementation of assert(3) does this, as do the NSAssert macros.
You can remove asserts in release builds by defining NDEBUG (for the C asserts) and NS_BLOCK_ASSERTIONS (for NSAsserts).
In Xcode, set Deployment Postprocessing to Yes in order to make Xcode call the strip command during the build process. Then you no longer see any source paths via nm -a.
However, I still see the source paths of some m files via the strings command :/
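To check a release binary for leftover paths in a repeatable way (for example as a post-build step), the strings-style scan can be scripted. This is a minimal Python sketch: it extracts runs of four or more printable ASCII bytes, which roughly matches the default behaviour of strings, and keeps those that start with a developer-machine prefix. The prefixes and the name leaked_paths are assumptions for illustration; adjust them to your own directory layout.

```python
import re

# Runs of at least four printable ASCII bytes, similar to strings(1).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# ASSUMPTION: where source trees live on the build machine.
DEFAULT_PREFIXES = (b"/Users/", b"/Volumes/")


def leaked_paths(image: bytes, prefixes=DEFAULT_PREFIXES) -> list:
    """Return embedded strings that look like absolute build-machine paths."""
    return [
        run.decode("ascii")
        for run in PRINTABLE_RUN.findall(image)
        if run.startswith(prefixes)
    ]


def scan_file(path: str) -> list:
    """Convenience wrapper that reads the executable from disk."""
    with open(path, "rb") as handle:
        return leaked_paths(handle.read())
```

An empty result only means that no string starts with one of the listed prefixes; paths embedded in the middle of a longer string would need a search rather than a prefix test.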
What worked for me was setting Generate Debug Symbols to No for release builds. This is under the Apple LLVM 7.0 - Code Generation in Xcode 7.2.
Have you ticked strip debug symbols in the build settings? You can do this (or not) depending on the configuration (build/release). You can also look into Objective-C code obfuscation (which is long winded). From what I gather, you cannot completely remove Objective-C information, as all method calls are done dynamically, so the library has to have information about your class and method names in order to function. A useful tip here.
If you have C++ code then you can use the gcc strip utility, although I'm not sure how well it handles Objective-C++. If it doesn't, you could compile all your cpp files into a lib, strip that and link against it in your iOS project.