How to make a clean clang front-end? - c++

I'm working on a C++ source analyzer project and it seems that clang is nice candidate for
the parsing work. The problem is that clang heavily depends on the infrastructure "llvm" project,
How do I configure it to get a clean front-end without any concrete machine oriented backend?
Just like LCC does, they provide a "null" backend for people who focus on parser parts.
Any suggestion is appreciated.

I recently did this on Windows.
Download the clang and llvm source from here.
Install cmake and Python (contrary to the docs, you do need Python just to build clang; at least, cmake gives up if it can't find a Python runtime).
You also need VS2008 or VS2010.
One thing that's not entirely obvious is the required directory structure:
projectRoot
build <- intermediate build files and DLLs, etc. will go here
llvm <- contents of llvm-3.0.src from llvm-3.0.tar go here
tools
clang <- contents of clang-3.0.src from clang-3.0.tar go here
And follow the windows build instructions from step 4 onwards. Don't attempt to use the cmake GUI, it's a horror; just use the commands given in the build instructions.
Once the build is complete (which takes a while) you'll have:
projectRoot
build
bin
Release <- libclang.dll will be here
lib
Release <- libclang.lib will be here
llvm
tools
clang
include
clang-c <- Index.h is here
Index.h defines the API to access information about your source code; it contains quite a bit of documentation about the APIs.
To get started using clang you need something like:
CXIndex index = clang_createIndex(1, 1);
// Support Microsoft extensions
char *args[] = {"-fms-extensions"};
CXTranslationUnit tu = clang_parseTranslationUnit(index, "mySource.c", args, ARRAY_SIZE(args), 0, 0, 0);
if (tu)
{
CXCursor cursor = clang_getTranslationUnitCursor(tu);
// Use the cursor functions to navigate through the AST
}

Unfortunately, you cannot get "pure" front-end without machine-specific details. C/C++ are inherently machine-tied languages. Think about preprocessor and built-in defines, the sizes of the builtin types, etc. Some of these can be abstracted out, but not e.g. preprocessor.

Related

Use autotools installation prefix

I am writing a C++ program using gtkmm as the window library and autotools as my build system. In my Makefile.am, I install the icon as follows:
icondir = $(datadir)/icons/hicolor/scalable/apps
icon_DATA = $(top_srcdir)/appname.svg
EDIT: changed from prefix to datadir
This results in appname.svg being copied to $(datadir)/icons/hicolor/scalable/apps when the program is installed. In my C++ code, I would like to access the icon at runtime for a window decoration:
string iconPath = DATADIR + "/icons/hicolor/scalable/apps/appname.svg";
// do stuff with the icon
I am unsure how to go about obtaining DATADIR for this purpose. I could use relative paths, but then moving the binary would break the icon, which seems evident of hackery. I figure that there should be a special way to handle icons separate from general data, since people can install 3rd party icon packs. So, I have two questions:
What is the standard way of installing and using icons with autotools/C++/gtkmm?
Edit: gtkmm has an IconTheme class that is the standard way to use icons in gtkmm. It appears that I add_resource_path() (for which I still need the installation prefix), and then I can use the library to obtain the icon by name.
What is the general method with autotools/C++ to access the autotools installation prefix?
To convey data determined by configure to your source files, the primary methods available are to write them in a header that your sources #include or to define them as macros on the compiler command line. These are handled most conveniently via the AC_DEFINE Autoconf macro. Under some circumstances, you might also consider converting source files to templates for configure to process, but except inasmuch as Autoconf itself uses an internal version of that technique to build config.h (when that is requested), I wouldn't normally recommend it.
HOWEVER, the installation prefix and other installation directories are special cases. They are not finally set until you actually run make. Even if you set them via the configure's command-line options, you can still override that by specifying different values on the make command line. Thus, it is not safe to rely on AC_DEFINE for this particular purpose, and in fact, doing so may not work at all (will not work for prefix itself).
Instead, you should specify the appropriate macro definition in a command-line option that is evaluated at make time. You can do this for all targets being built by setting the AM_CPPFLAGS variable in your Makefile.am files, as demonstrated in another answer. That particular example sets the specified symbol to be a macro that expands to a C string literal containing the prefix. Alternatively, you could consider defining the whole icon directory as a symbol. If you need it only for one target out of several then you might prefer setting the appropriate onetarget_CPPFLAGS variable.
As an aside, do note that $(prefix)/icons/hicolor/scalable/apps is a nonstandard choice for the installation directory for your icon. That will typically resolve to something like /usr/local/icons/hicolor/scalable/apps. The conventional choice would be $(datadir)/icons/hicolor/scalable/apps, which will resolve to something like /usr/local/share/icons/hicolor/scalable/apps.
In your Makefile.am, use the following
AM_CPPFLAGS = -DPREFIX='"$(prefix)"'
See Defining Directories in autoconf's manual.

Change Linux shared library (.so file) version after it was compiled

I'm compiling Linux libraries (for Android, using NDK's g++, but I bet my question makes sense for any Linux system). When delivering those libraries to partners, I need to mark them with a version number. I must also be able to access the version number programatically (to show it in an "About" dialog or a GetVersion function for instance).
I first compile the libraries with an unversioned flag (version 0.0) and need to change this version to a real one when I'm done testing just before sending it to the partner. I know it would be easier to modify the source and recompile, but we don't want to do that (because we should then test everything again if we recompile the code, we feel like it would be less error prone, see comments to this post and finally because our development environment works this way: we do this process for Windows binaries: we set a 0.0 resources version string (.rc) and we later change it by using verpatch...we'd like to work with the same kind of process when shipping Linux binaries).
What would be the best strategy here?
To summarize, requirements are:
Compile binaries with "unset" version (0.0 or anything else)
Be able to modify this "unset" version to a specific one without having to recompile the binary (ideally, run a 3rd party tool command, as we do with verpatch under Windows)
Be able to have the library code retrieve it's version information at runtime
If your answer is "rename the .so", then please provide a solution for 3.: how to retrieve version name (i.e.: file name) at runtime.
I was thinking of some solutions but have no idea if they could work and how to achieve them.
Have a version variable (one string or 3 int) in the code and have a way to change it in the binary file later? Using a binary sed...?
Have a version variable within a resource and have a way to change it in the binary file later? (as we do for win32/win64)
Use a field of the .so (like SONAME) dedicated to this and have a tool allowing to change it...and make it accessible from C++ code.
Rename the lib + change SONAME (did not find how this can be achieved)...and find a way to retrieve it from C++ code.
...
Note that we use QtCreator to compile the Android .so files, but they may not rely on Qt. So using Qt resources is not an ideal solution.
I am afraid you started to solve your problem from the end. First of all SONAME is provided at link time as a parameter of linker, so in the beginning you need to find a way to get version from source and pass to the linker. One of the possible solutions - use ident utility and supply a version string in your binary, for example:
const char version[] = "$Revision:1.2$"
this string should appear in binary and ident utility will detect it. Or you can parse source file directly with grep or something alike instead. If there is possibility of conflicts put additional marker, that you can use later to detect this string, for example:
const char version[] = "VERSION_1.2_VERSION"
So you detect version number either from source file or from .o file and just pass it to linker. This should work.
As for debug version to have version 0.0 it is easy - just avoid detection when you build debug and just use 0.0 as version unconditionally.
For 3rd party build system I would recommend to use cmake, but this is just my personal preference. Solution can be easily implemented in standard Makefile as well. I am not sure about qmake though.
Discussion with Slava made me realize that any const char* was actually visible in the binary file and could then be easily patched to anything else.
So here is a nice way to fix my own problem:
Create a library with:
a definition of const char version[] = "VERSIONSTRING:00000.00000.00000.00000"; (we need it long enough as we can later safely modify the binary file content but not extend it...)
a GetVersion function that would clean the version variable above (remove VERSIONSTRING: and useless 0). It would return:
0.0 if version is VERSIONSTRING:00000.00000.00000.00000
2.3 if version is VERSIONSTRING:00002.00003.00000.00000
2.3.40 if version is VERSIONSTRING:00002.00003.00040.00000
...
Compile the library, let's name it mylib.so
Load it from a program, ask its version (call GetVersion), it returns 0.0, no surprise
Create a little program (did it in C++, but could be done in Python or any other languauge) that will:
load a whole binary file content in memory (using std::fstream with std::ios_base::binary)
find VERSIONSTRING:00000.00000.00000.00000 in it
confirms it appears once only (to be sure we don't modify something we did not mean to, that's why I prefix the string with VERSIONSTRING, to make it more unic...)
patch it to VERSIONSTRING:00002.00003.00040.00000 if expected binary number is 2.3.40
save the binary file back from patched content
Patch mylib.so using the above tool (requesting version 2.3 for instance)
Run the same program as step 3., it now reports 2.3!
No recompilation nor linking, you patched the binary version!

How to produce deterministic binary output with g++?

I work in a very regulated environment where we need to be able to produce identical binary input give the same source code every time be build out products. We currently use an ancient version of g++ that has been patched to not write anything like a date/time in the resulting binaries that would change from build to build, but I would like to update to g++ 4.7.2. Does anyone know of a patch, or have suggestions of what I need to look for to take two identical pieces of source code and produce identical binary outputs?
The Debian Reproducible builds project attempts to standardize Debian packages byte-by-byte, and has received a Linux Foundation grant in 2016.
While this may include more than compilation, you should have a look at it.
It also pointed me to this article, which adds the following points to what #Employed said:
put the source in a fixed folder (e.g. /tmp/build) to deal with __FILE__
for __DATE__, __TIME__, __TIMESTAMP__:
libfaketime : https://github.com/wolfcw/libfaketime
override those macros with -D
-Wdate-time or -Werror=date-time: warn or fail if either __TIME__, __DATE__ or __TIMESTAMP__ are is used. The Linux kernel 4.4 uses it by default.
use the D flag with ar, or use https://github.com/nh2/ar-timestamp-wiper/tree/master to wipe stamps
-fno-guess-branch-probability: older manual versions say it is a source of non-determinism, but not anymore. Not sure if this is covered by -frandom-seed or not.
Buildroot has a BR2_REPRODUCIBLE option which may give some ideas on the package level, but it is far from complete at this point.
Related threads:
https://superuser.com/questions/639351/does-recompiling-a-program-produce-a-bit-for-bit-identical-binary
https://www.quora.com/What-can-be-the-possible-reasons-for-the-object-code-of-an-unchanged-C-file-to-change-on-recompilation
We also depend on bit-identical rebuilds, and are using gcc-4.7.x.
Besides setting PWD=/proc/self/cwd and using -frandom-seed=<input-file-name>, there are a handful of patches, which can be found in svn://gcc.gnu.org/svn/gcc/branches/google/gcc-4_7 branch.
Use of the 'DATE' macro makes the build non-deterministic

How to determine which compiler was requested

My project uses SCons to manage the build process. I want to support multiple compilers, so I decided to use AddOption so the user can specify which compiler to use on the command line (with the default being whatever their current compiler is).
AddOption('--compiler', dest = 'compiler', type = 'string', action = 'store', default = DefaultEnvironment()['CXX'], help = 'Name of the compiler to use.')
I want to be able to have built-in compiler settings for various compilers (including things such as maximum warning levels for that particular compiler). This is what my first attempt at a solution currently looks like:
if is_compiler('g++'):
from build_scripts.gcc.std import cxx_std
from build_scripts.gcc.warnings import warnings, warnings_debug, warnings_optimized
from build_scripts.gcc.optimizations import optimizations, preprocessor_optimizations, linker_optimizations
elif is_compiler('clang++'):
from build_scripts.clang.std import cxx_std
from build_scripts.clang.warnings import warnings, warnings_debug, warnings_optimized
from build_scripts.clang.optimizations import optimizations, preprocessor_optimizations, linker_optimizations
However, I'm not sure what to make the is_compiler() function look like. My first thought was to directly compare the compiler name (such as 'clang++') against what the user passes in. However, this immediately failed when I tried to use scons --compiler=~/data/llvm-3.1-obj/Release+Asserts/bin/clang++.
So I thought I'd get a little smarter and use this function
cxx = GetOption('compiler')
def is_compiler (compiler):
return cxx[-len(compiler):] == compiler
This only looks at the end of the compiler string, so that it ignores directories. Unfortunately, 'clang++' ends in 'g++', so my compiler was seen to be g++ instead of clang++.
My next thought was to do a backward search and look for the first occurrence of a path separator ('\' or '/'), but then I realized that this won't work for people who have multiple compiler versions. Someone compiling with 'g++-4.7' will not register as being g++.
So, is there some simple way to determine which compiler was requested?
Currently, only g++ and clang++ are supported (and only their most recently released versions) due to their c++11 support, so a solution that only works for those two would be good enough for now. However, my ultimate goal is to support at least g++, clang++, icc, and msvc++ (once they support the required c++11 features), so more general solutions are preferred.
Compiler just are part of build process. Also you need linker tool and may be other additional programs. In Scons it's named - Tool. List of tools supported from box you can see in man page, search by statement: SCons supports the following tool specifications out of the box: ...
Tool set necessary scons environment variables, it's documented here.
Scons automatically detects compiler in OS and have some priority to choose one of them, of course autodetect will work properly if PATH variable set to necessary dirs. For example of you have msvc and mingw on windows, scons choose msvc tool. For force using tool use Tool('name')(env). For example:
env = Environment()
Tool('mingw')(env)
Now env force using mingw.
So, clang is one of tool which currently not supported from box by scons. You need to implement it, or set env vars such CC, CXX which using scons for generate build commands.
You could just simply use the Python os.path.basename() or os.path.split() functions, as specified here.
You could do what people suggested in the comments by splitting this question into 2 different issues, but I think it could be a good idea to be able to specify the path with the compiler, since you could have 2 versions of g++ installed, and if the user only specifies g++, they may not get the expected version.
There seems to be some confusion about what question is asked here.
For what I can see, this asks how to determine which compiler was chosen by default, so I'll answer that one.
From what I found out, the official way to check the compiler is to look at the construction variable TOOLS, which contains a list of all tools / programs that SCons decided / was told to use in the given construction environment.
env = Environment()
is_gcc = 'g++' in env['TOOLS']
is_clang = 'clangxx' in env['TOOLS']
TOOLS lists only the currently used tools even if SCons can find more of them.
E.g. if you have both GCC and Clang installed and SCons is able to find both, default TOOLS will still contain only GCC.
You can find the full list of predefined tools here.

How to avoid symbols and source paths in iOS binary?

When I compile the release version of my iOS app (based on standard Apple supplied iOS app template), look into the resulting executable binary, I see all sorts of symbols and even local cpp source and header paths in there. I'm really stumped why this is (I haven't enabled RTTI*). Especially the source file paths make me feel uncomfortable sending this app across the globe (why should everyone be able to see the directory layout of my development machine?).
Here's are two (randomly picked, moderated) excerpts:
TS/../ACTORS/CActorCanvasCharPart.cpplastMeshcapVerticesOFF BOUNDSupload VERTICES: %d
20CActorCanvasCharPartgrassscrub/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame2_grass.cppbaseShadowmowerstartmowerloopmowermowerCharcutGrassChargrassStuffgrassParticles/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame2_grass.h17CStateGame2_grasssinwriteStroke/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame2_flowers.hflowerBedsandTrailclickstart3inplace2sandDrag/Volumes/Data/iOS_projects/code/MyAppName_proj/MyAppName/source/STATES/GAMES/2/CStateGame
And here are a lot of symbols for self-defined types and structs:
CAssetMgr="_vptr$CMgrBase"^^?"pMain"^{CMain}"inited"B"curveCount"S"curveSpecs"^{CCurveSpec}"gameSpecs"[23{CGameStateSpec="header"{SpecDiskHeader="type"i"version"S}"gameID"C"backgroundColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}"clickPointColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}"clickPointIconColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}"hintColor"{CRGBAcolorf="r"f"g"f"b"f"a"f}}]"currentFont"^{CCharset}"userCharParts"^^{CCharPart}"words"{CDataSet<CName4,CCharArray>="_vptr$CObjectBase"^^?"pMain"^{CMain}"count"i"data"*"dataSize"l}"sets"{CDataSet<CName16,CCharArray>="_vptr$CObjectBase"^^?"pMain"^{CMain}"count"i"data"*"dataSize"l
Can this be avoided, how?
*UPDATE: I just found out that RTTI is on by default. So I cleaned the target, disabled RTTI (GCC_ENABLE_CPP_RTTI = NO) and recompiled. I still see a lot of symbols and source paths in the binary.
UPDATE 2: I checked a few other apps from the app store, and many of them also have their source file paths show up. Pretty scary, if you ask me:
Joined Up Lite
/Users/lloydy/Documents/Development/iPhone/ABC Joined Up/main.m
/Users/lloydy/Documents/Development/iPhone/ABC Joined Up/Classes/SettingsView.m
Crayon Physics
/Users/smproot/Desktop/unzip/CrayonPhysics/v104/Classes/crayon/src/ceng/gameutils/killspriteslowly/killspriteslowly.cpp
/Users/smproot/Desktop/unzip/CrayonPhysics/v104/Classes/crayon/src/ceng/tasks/task/sdl/mixer/ctaskaudiosdlmixer.cpp
Wall Times
/Users/fred/_WORK/ZDNDRP/WallTimes/main.m
/Users/fred/_WORK/ZDNDRP/WallTimes/Classes/SystemCategories.m
Jumbo Calculator
/Users/Christopher/Documents/Development/JumboCalculator 1.0.3/main.m
/Users/Christopher/Documents/Development/JumboCalculator 1.0.3/Classes/CalculatorFaceViewController.m
The file paths are most likely from assert macros which stringify __FILE__ as part of their failure message. iOS's implementation of assert(3) does this, as do the NSAssert macros.
You can remove asserts in release builds by defining NDEBUG (for the C asserts) and NS_BLOCK_ASSERTIONS (for NSAsserts).
In Xcode set Deployment Prostprocessing to Yes in order to trigger Xcode to call the strip command during build process. Then you don't see any source path via nm -a.
However, I still see the source paths of some m files via the strings command :/
What worked for me was setting Generate Debug Symbols to No for release builds. This is under the Apple LLVM 7.0 - Code Generation in Xcode 7.2.
Have ticked the strip debug symbols in the build settings? You can do this (or not) depending on the configuration (build/release). Also you can look into Objective-C Code Obfuscation (which is long winded). From what I gather, you cannot completely remove objective-c information as all method calls are done dynamically, so the library has to have information about your classes/method names in order to function. A useful tip here.
If you have c++ code then you can use the gcc strip utility, although I'm not sure how it like Objetive-C++, if it doesn't you could compile all you cpp into a lib, strip that and link against it in your iOS project.