How could I query binary's source code version - c++

My environment is Linux CentOS 6.2. And I've a source control system like svn/hg/git etc. My source code is C/C++.
I want to check in the build binary to keep which binary is release to customer.
And I assume build binary's checksum will different when source code changed.
So, I could reverse trace which binary is build from which version.
Is it possible, what's the tricks I must follow?
I've seen some executable display the revision when execute with -version option.
But I'm wonder how to prevent write wrong -version string into the executable.
If I keep a md5.txt and check-in it instead of check in binary.
How could I make sure I can build the same md5 executable again?
Sorry, for clearing my question and preventing another unexpected answer, I prefer a answer like:
Keep a md5sum.txt in scm when release a new version to user.
Keep binary separate from your SCM.
To rebuild the same md5sum binary you should make sure
write symbol into binary when make(eg. by -DVERSION="1.x")
show the VERSION string to user
remove all $Id, that let your SCM run slower.
keep same CPU & OS & compiler & library environment
...

Create strings within a .cpp file as thus:
static const char version[] = "#(#) $Id$";
where $Id$ is obtained from SVN
Use the what command (see the manual page). It will obtain these strings from the binary so you can check.

Is this an executable or a shared library? If the latter, you could export a function that would return the version (number, string, your choice). Then dlopen(), dlsym(), and execute the function.
For executable ELF binaries, you might be able to implant some data in the binary that can be queried using the 'nm' utility.

If you'll use Subversion, SvnRev will do most work for you (no md5 in repos, repo hold sources, binary - resource with revision-id)
For Mercurial, you can get idea for version sting from VersioningWithMake wiki, and in order to get string like result of git describe, instead of simple template {node|short} for HGVERSION you can use something as {latesttag}+{latesttagdistance}:{node|short}, showing (example) 1.3+11:8a226f0f99aa

Related

How to store and retrieve data from linux binary file

I'm using arm-none-linux-gnueabi-g++ in order to compile a c++ code that will run on an embedded Linux device.
I'm using the arm-none-linux-gnueabi-g++ under windows and get as output the binary file that will run on the Linux machine.
In order to set the embedded device with a new binary, I need to create an archive file (zip) with the binary file and with some more settings files.
till far it all OK.
I need to automate that so that the archive file will be created automatically at name of the version of the binary file.
Currently, we keep the version as just a simple constant std::string variable in the code. We use that string when printing diagnostic, logging, etc.
How can I read that from the version binary file?
Or may other methods to achieve that goal?
I thought may to store it in some constant place in the binary file and read it from there but really don't know how to do that without making the binary corrupted.
You are creating the file automatically, so I assume you are first compiling it and then making an archive with the resulting binary.
You could store the version in a text file, and #include that file in your code:
const std::string version =
#include "version.txt"
;
In the version.txt:
"version string"
And when making the archive, you can easily parse the version from the text file.
Ville is correct.
You're currently doing it backwards!
Your build system should provide the version to the executable, not the other way around. Once this is fixed, your build system can provide the same version to other elements, such as your ZIP filename.
Ideally the version would be generated from version control autonomously, but you could specify it in the build command if really necessary.
It's possible to pull some string from the binary (think nm, if there's a Windows equivalent), but that's really the reverse way to do it.

Change Linux shared library (.so file) version after it was compiled

I'm compiling Linux libraries (for Android, using NDK's g++, but I bet my question makes sense for any Linux system). When delivering those libraries to partners, I need to mark them with a version number. I must also be able to access the version number programatically (to show it in an "About" dialog or a GetVersion function for instance).
I first compile the libraries with an unversioned flag (version 0.0) and need to change this version to a real one when I'm done testing just before sending it to the partner. I know it would be easier to modify the source and recompile, but we don't want to do that (because we should then test everything again if we recompile the code, we feel like it would be less error prone, see comments to this post and finally because our development environment works this way: we do this process for Windows binaries: we set a 0.0 resources version string (.rc) and we later change it by using verpatch...we'd like to work with the same kind of process when shipping Linux binaries).
What would be the best strategy here?
To summarize, requirements are:
Compile binaries with "unset" version (0.0 or anything else)
Be able to modify this "unset" version to a specific one without having to recompile the binary (ideally, run a 3rd party tool command, as we do with verpatch under Windows)
Be able to have the library code retrieve it's version information at runtime
If your answer is "rename the .so", then please provide a solution for 3.: how to retrieve version name (i.e.: file name) at runtime.
I was thinking of some solutions but have no idea if they could work and how to achieve them.
Have a version variable (one string or 3 int) in the code and have a way to change it in the binary file later? Using a binary sed...?
Have a version variable within a resource and have a way to change it in the binary file later? (as we do for win32/win64)
Use a field of the .so (like SONAME) dedicated to this and have a tool allowing to change it...and make it accessible from C++ code.
Rename the lib + change SONAME (did not find how this can be achieved)...and find a way to retrieve it from C++ code.
...
Note that we use QtCreator to compile the Android .so files, but they may not rely on Qt. So using Qt resources is not an ideal solution.
I am afraid you started to solve your problem from the end. First of all SONAME is provided at link time as a parameter of linker, so in the beginning you need to find a way to get version from source and pass to the linker. One of the possible solutions - use ident utility and supply a version string in your binary, for example:
const char version[] = "$Revision:1.2$"
this string should appear in binary and ident utility will detect it. Or you can parse source file directly with grep or something alike instead. If there is possibility of conflicts put additional marker, that you can use later to detect this string, for example:
const char version[] = "VERSION_1.2_VERSION"
So you detect version number either from source file or from .o file and just pass it to linker. This should work.
As for debug version to have version 0.0 it is easy - just avoid detection when you build debug and just use 0.0 as version unconditionally.
For 3rd party build system I would recommend to use cmake, but this is just my personal preference. Solution can be easily implemented in standard Makefile as well. I am not sure about qmake though.
Discussion with Slava made me realize that any const char* was actually visible in the binary file and could then be easily patched to anything else.
So here is a nice way to fix my own problem:
Create a library with:
a definition of const char version[] = "VERSIONSTRING:00000.00000.00000.00000"; (we need it long enough as we can later safely modify the binary file content but not extend it...)
a GetVersion function that would clean the version variable above (remove VERSIONSTRING: and useless 0). It would return:
0.0 if version is VERSIONSTRING:00000.00000.00000.00000
2.3 if version is VERSIONSTRING:00002.00003.00000.00000
2.3.40 if version is VERSIONSTRING:00002.00003.00040.00000
...
Compile the library, let's name it mylib.so
Load it from a program, ask its version (call GetVersion), it returns 0.0, no surprise
Create a little program (did it in C++, but could be done in Python or any other languauge) that will:
load a whole binary file content in memory (using std::fstream with std::ios_base::binary)
find VERSIONSTRING:00000.00000.00000.00000 in it
confirms it appears once only (to be sure we don't modify something we did not mean to, that's why I prefix the string with VERSIONSTRING, to make it more unic...)
patch it to VERSIONSTRING:00002.00003.00040.00000 if expected binary number is 2.3.40
save the binary file back from patched content
Patch mylib.so using the above tool (requesting version 2.3 for instance)
Run the same program as step 3., it now reports 2.3!
No recompilation nor linking, you patched the binary version!

How to compile and execute a stand-alone SML-NJ executable

I have seen one other answer link but what I don't understand is what is basis.cm and what's it's use?
You are asking two questions.
What is basis.cm and what's it's use?
This is the Basis library. It allows the use of built-in functions.
How to compile and execute a stand-alone SML-NJ executable
Assuming you followed Jesper Reenberg's tutorial on how to execute a heap image, the next thing you need in order to have SML/NJ produce a stand-alone executable is to convert this heap image. One should hypothetically be able to do this using heap2exec, a tool that takes the heap image, e.g. the .x86-linux file generated on my system, and generates an .asm file that can be assembled and linked.
Unfortunately, this tool is not very well-maintained, so you have to
Go to the smlnj.org page and fix the download-link by removing 'www.' (this page and the SourceForge page don't contain the same explanations or assumptions about argument count, and neither page's download link work).
Download and extract this tool, and fix the 'build' script so it points to your ml-build tool
Fix the tool's argument use by changing [inf, outf] to [_, inf, outf]
Run ./build which generates 'heap2asm.x86-linux' on my system
For example, in order to generate an .asm file for the heap2asm program itself, run
sml #SMLload heap2asm.x86-linux heap2asm.x86-linux heap2asm.s
At this point, I have unfortunately been unable to produce an executable that works. E.g. if you run gcc -c heap2asm.s and ld heap2asm.o, you get a warning of a missing _start label. The resulting executable segfaults even if you rename the existing _sml_heap_image label to _start. That is, it seems that a piece of entry code that the runtime environment normally delivers is missing here.
At this point, discard SML/NJ and use MLton for producing stand-alone binaries.

C++ binary identification (manifest)

We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. Then we compile an executable using these libraries and deploy the binary on the front-end. It would be extremely useful to be able to identify that binary. Ideally what we would like to have is a small script that would retrieve the following information directly from the binary:
$ident binary
$binary : Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
$ dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
$ dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...
So it should display all the information for the binary itself and for all of its dependencies.
Currently our approach is:
During compilation for each product we generate Manifest.h and Manifest.cpp and then inject Manifest.o into binary
ident script parses target binary, finds generated stuff there and prints this information
However this approach is not always reliable for different versions of gcc..
I would like to ask SO community - is there better approach to solve this problem?
Thanks for any advice
One of the catches with storing data in source code (your Manifest.h and .cpp) is the size limit for literal data, which is dependent on the compiler.
My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (so does objcopy). If you prefer to write your own solution, have a look at libbfd.
Let us say we have a hello.cpp containing the usual C++ "Hello world" example. Now we have the following make file (GNUmakefile):
hello: hello.o hello.om
$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $#
%.om: %.manifest
ld -b binary -o $# $<
%.manifest:
echo "$#" > $#
What I'm doing here is to separate out the linking stage, because I want the manifest (after conversion to ELF object format) linked into the binary as well. Since I am using suffix rules this is one way to go, others are certainly possible, including a better naming scheme for the manifests where they also end up as .o files and GNU make can figure out how to create those. Here I'm being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data), created from .manifest files. The recipe states to convert the binary input into an ELF object. The recipe for creating the .manifest itself simply pipes a string into the file.
Obviously the tricky part in your case isn't storing the manifest data, but rather generating it. And frankly I know too little about your build system to even attempt to suggest a recipe for the .manifest generation.
Whatever you throw into your .manifest file should probably be some structured text that can be interpreted by the script you mention or that can even be output by the binary itself if you implement a command line switch (and disregard .so files and .so files hacked into behaving like ordinary executables when run from the shell).
The above make file doesn't take into account the dependencies - or rather it doesn't help you create the dependency list in any way. You can probably coerce GNU make into helping you with that if you express your dependencies clearly for each goal (i.e. the static libraries etc). But it may not be worth it to take that route ...
Also look at:
C/C++ with GCC: Statically add resource files to executable/library and
Is there a Linux equivalent of Windows' "resource files"?
If you want particular names for the symbols generated from the data (in your case the manifest), you need to use a slightly different route and use the method described by John Ripley here.
How to access the symbols? Easy. Declare them as external (C linkage!) data and then use them:
#include <cstdio>
extern "C" char _binary_hello_manifest_start;
extern "C" char _binary_hello_manifest_end;
int main(int argc, char** argv)
{
const ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
printf("Hello world: %*s\n", (int)len, &_binary_hello_manifest_start);
}
The symbols are the exact characters/bytes. You could also declare them as char[], but it would result in problems down the road. E.g. for the printf call.
The reason I am calculating the size myself is because a.) I don't know whether the buffer is guaranteed to be zero-terminated and b.) I didn't find any documentation on interfacing with the *_size variable.
Side-note: the * in the format string tells printf that it should read the length of the string from the argument and then pick the next argument as the string to print out.
You can insert any data you like into a .comment section in your output binary. You can do this with the linker after the fact, but it's probably easier to place it in your C++ code like this:
asm (".section .comment.manifest\n\t"
".string \"hello, this is a comment\"\n\t"
".section .text");
int main() {
....
The asm statement should go outside any function, in this instance. This should work as long as your compiler puts normal functions in the .text section. If it doesn't then you should make the obvious substitution.
The linker should gather all the .comment.manifest sections into one blob in the final binary. You can extract them from any .o or executable with this:
objdump -j .comment.manfest -s example.o
Have you thought about using standard packaging system of your distro? In our company we have thousands of packages and hundreds of them are automatically deployed every day.
We are using debian packages that contain all the neccessary information:
Full changelog that includes:
authors;
versions;
short descriptions and timestamps of changes.
Dependency information:
a list of all packages that must be installed for the current one to work correctly.
Installation scripts that set up environment for a package.
I think you may not need to create manifests in your own way as soon as ready solution already exists. You can have a look at debian package HowTo here.

How can I make LLVM display graphs in a different viewer?

LLVM can create graphs in Graphviz's "dot" format, and automatically invoke a viewer to display them. By default it uses dotty to display those graphs. I know that I can change it to use a different viewer, but I was not able to find precise instructions on how to do so.
How can I make it open the graphs with a different viewer?
I'm running on Linux but would be interested in an answer for Windows as well.
I found out I'm supposed to change the CMakeCache.txt file in my build folder. For instance, to use XDot instead of dotty, I edited the LLVM_PATH_XDOT_P property in that file to point to the full path of my xdot.py file.
It now opens the alternative viewer successfully, after rebuilding the project.
I just needed to do this.
I managed to do this with a workaround: made a backup of dotty (just in case) and created a link from dotty to XDot.
cp /usr/bin/dotty /usr/bin/dotty_copy
ln -s /usr/bin/dotty /usr/bin/xdot
I believe you could also set some variable during configuration step (possibly LLVM_PATH_DOTTY), but I never tried this as I didn't want to recompile LLVM.
You may try hacking the DisplayGraph function or fidging with the makefiles until you manage to enable one of the #ifdefs in DisplayGraph.