llvm and install time optimization - c++

According to the official LLVM page, install-time optimization is possible. As I understand it, the code is first compiled to bytecode on the build machine before distribution, and the bytecode is then converted to native code on the target machine at install time.
Is there a real-world example of this feature? More specifically, I am wondering whether it is possible to take an arbitrary open source C/C++ project that uses autoconf (i.e. one typically built and installed with ./configure && make && make install), and:
on the build machine, run ./configure && make in a special way (e.g. by setting some environment variables, or even modifying configure.ac or other autoconf files) so that it generates executables and libraries as bytecode;
transfer the build tree to the target machine and run make install in a special way so that it installs all files as usual, but converts the bytecode to native code for the executables and libraries.

As @delnan indicated, this isn't possible in general. LLVM IR is target independent, but it is not portable.
There have been a few attempts to construct a portable IR, PNaCl among them, but these are different from LLVM.

LLVM IR is target independent, meaning that it could be generated on one machine (compile time) and converted to native code (link time) on another, and it would still produce the same native code as it would have on the first machine, provided that you were using the same version of LLVM with the same options. It does not mean that the generated IR would produce a valid binary on all machines.
The problem lies in the way that the ABI can vary between systems.
This post addresses those differences in more detail:
LLVM bitcode cross-platform
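To make the ABI point concrete, here is a minimal sketch (not part of the original answer) that prints a few C type sizes for the machine it runs on. A C front end such as clang folds values like sizeof(long) into constants before it emits IR, so the IR already reflects the machine it was compiled for. The function name abi_summary is made up for illustration.

```python
import ctypes
import platform


def abi_summary():
    """Return a few ABI-dependent facts about the machine running this script."""
    return {
        "machine": platform.machine(),
        "sizeof(long)": ctypes.sizeof(ctypes.c_long),
        "sizeof(void*)": ctypes.sizeof(ctypes.c_void_p),
        "sizeof(long double)": ctypes.sizeof(ctypes.c_longdouble),
    }


if __name__ == "__main__":
    # These values are baked into IR as plain constants by the C front end.
    for name, value in abi_summary().items():
        print(f"{name}: {value}")
```

On 64-bit Linux this reports 8 for sizeof(long), while on 64-bit Windows it reports 4, which is one reason the same C source yields different IR for the two targets.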

Related

How to apply llvm passes using CMake

We have implemented an LLVM pass and compiled it to a library (called libMyPass.so).
We want to apply this pass to a project (all its source code files) which uses cmake to build it. Is there a way to do so in cmake?
Generally, we use clang to emit LLVM bitcode from a source file, opt to apply the pass to the bitcode, llc to translate the new bitcode to assembly, and clang again to compile the assembly to an executable. Can this process be encapsulated in cmake?
You can have a look or even use this repo that implements the various steps as cmake commands.
The gist is that it creates various commands (using CMake's add_custom_command) that do exactly what you're looking for: they drive the various LLVM subtools, in conjunction with the various CMake target properties, to generate IR from source and then native binary code (i.e. .o).
For example, llvmir_attach_bc_target() attaches to a top-level CMake target and creates an (unoptimized) .bc file for each source file in its SOURCES property.
The repo contains various examples that should be enough to get you started.
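For reference, the four-step process described in the question can be sketched as a list of commands for one source file. This is only an illustration, not what the linked repo does. It assumes the pass is a new-pass-manager plugin registered under the hypothetical name my-pass; a legacy-pass-manager plugin would be loaded with opt -load and enabled with its own flag instead. The function name pass_pipeline is also made up.

```python
import os
import shlex


def pass_pipeline(source, plugin="libMyPass.so", pass_name="my-pass"):
    """Return the clang -> opt -> llc -> clang commands for one source file.

    pass_name is a placeholder; use the name your pass actually registers.
    """
    stem, _ext = os.path.splitext(source)
    bitcode = stem + ".bc"
    transformed = stem + ".opt.bc"
    assembly = stem + ".s"
    return [
        # 1. Emit LLVM bitcode instead of a native object file.
        ["clang", "-emit-llvm", "-c", source, "-o", bitcode],
        # 2. Run the custom pass over the bitcode.
        ["opt", f"-load-pass-plugin={plugin}", f"-passes={pass_name}",
         bitcode, "-o", transformed],
        # 3. Lower the transformed bitcode to assembly.
        ["llc", transformed, "-o", assembly],
        # 4. Assemble and link with clang.
        ["clang", assembly, "-o", stem],
    ]


if __name__ == "__main__":
    for argv in pass_pipeline("main.c"):
        print(shlex.join(argv))
```

Each of these commands maps naturally onto one add_custom_command in CMake, with the output of one step listed as a dependency of the next.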

Linux/gcc: Is there a way to check if a build is identical to a previous one on a different machine?

Scenario:
A build of a C++ project (under Linux with gcc) is supposed to be repeated on a new machine.
If the very same gcc version is used for the build:
Is the build output (binaries) identical? Or are time stamps, path, or other information contained in the binary that makes them different?
If the output is not identical: Can the output be compared in a way that shows whether the binaries are "functionally" identical? I.e. is there a tool that compares just the actual code inside the binary (without metadata such as debug info)?
Motivation:
I have to prepare a clean new Linux system on which a software system (that was developed by someone else, let's call them X) has to be built.
In order to build the software I will get a source tree with build scripts, makefiles etc.
Now I have to check that the build output is the same as the last build that was provided by X (and built on their system). I need to do this to make sure that the source that will be handed over is the same that was used to build the last version that we got in binary form from X.
If the very same gcc version is used for the build: Is the build output (binaries) identical?
The CPU matters more here than the version of GCC, as does the static linking GCC performs. I believe the answer you are looking for lies here.
It basically boils down to "it depends".
Quoting the linked answer by Rici:
the binaries are likely to differ unless the CPUs are similar (and even then, it's possible)
And like Oliv said, objdump is also a good way to compare the differences.
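As a starting point for the first question (are the binaries identical at all?), a byte-for-byte check is easy to script. The sketch below is not from the answers; the function names sha256_of and first_difference are made up for illustration. It only says whether and where two files differ, not whether they are functionally equivalent; for that, comparing objdump disassembly as suggested above is the next step.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a, data_b = fa.read(), fb.read()
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    if len(data_a) != len(data_b):
        # One file is a strict prefix of the other.
        return min(len(data_a), len(data_b))
    return None
```

If the digests match, the two builds are bit-identical and nothing more needs checking. If they differ, the offset from first_difference gives a place to start looking, for example to see whether the difference falls in a section that holds only timestamps or paths.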

How to generate llvm bitcode for large programs with many source code files and a huge Makefile (e.g. memcached)?

I have my pass that I tested on toy programs and now I want to run it on large programs, many of which are open source programs like memcached. Such programs have their own Makefile and a complicated compilation procedure. I want to generate a bitcode file for such programs to let my pass work on them. Help and suggestions will be appreciated!
Depending on what your pass is doing, you can:
Build with LTO: adding -flto to the CFLAGS and building your application with your own linker plugin is quite seamless from a build system point of view. However, it requires some understanding of how to set up LTO.
Build with your own clang: statically add your pass to the LLVM pipeline and use the resulting clang. Depending on the build system, exporting CC/CXX environment variables that point to your installed clang should be enough.
Build by loading your pass dynamically into clang; for example, this is what Polly is (optionally) doing.
If you add -emit-llvm to your clang flags, it will emit BC files instead of object files, or LL files instead of assembly.
You'll likely have to modify the makefile some more, but that should get you started in the right direction.
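Once the build has produced per-file bitcode, the pieces still have to be merged into one module before a whole-program pass can run. A minimal sketch of that step follows, assuming the bitcode files end in .bc (with -emit-llvm the files keep whatever name the makefile passes to -o, so the suffix may need adjusting). The function name llvm_link_command and the default output name are made up for illustration.

```python
import os


def llvm_link_command(build_dir, output="whole-program.bc", suffix=".bc"):
    """Collect bitcode files under build_dir into a single llvm-link command."""
    inputs = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if name.endswith(suffix):
                inputs.append(os.path.join(root, name))
    inputs.sort()  # stable order keeps the command reproducible
    return ["llvm-link", *inputs, "-o", output]


if __name__ == "__main__":
    print(" ".join(llvm_link_command(".")))
```

The resulting file can then be handed to opt together with your pass.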

How to install Clang from the binary distribution?

Clang has a binary distro, of sorts, but there isn't any README file or anything to tell you what's in the tarball or what to do with it.
It appears that I need to separately download and install libc++. I may need to copy only the clang binary and maybe a few others, but not all the llvm-* stuff. This is just judging by the lack of any C++ headers in the binary distro (although some environment-specific headers are included), and the lack of llvm-as and such on my existing LLVM 3.2 installation from Xcode.
I just want to run the compiler, not develop with libclang or assemble LLVM assembly files. Is there an instruction page somewhere?
The LLVM project doesn't actually expect many people to use the binary distribution they put out. LLVM does releases for the periodic validation, but it's expected that most users will get LLVM through their OS distro or will build the version they want from source.
See this email thread where clang developers discuss how the binary distribution is used.
That said, you can use their distribution if you want. What to install depends on what you want to do:
Use clang as a static compiler.
Build clang based tools.
Use LLVM as a backend for your custom language compiler.
I may need to copy only the clang binary and maybe a few others, but not all the llvm-* stuff.
If all you want to do is compile C/C++/Obj-C, then I believe all you need is the clang binary (and the 'clang++' symbolic link), the 'built-in' headers, and the runtime libraries. You'll find those headers and libs in /lib/clang/<version>/. (The clang compiler typically finds its built-in parts by their location relative to the binary.)
If you want to use LLVM as a backend, you'll need either the LLVM headers and libraries to build and link against, or you'll need some of the ll* binaries to process output of your frontend.
If you want to build clang based tools you'll need the clang headers and libraries to build and link against, either the stable C API or the unstable C++ API.
Note that the libraries are built with RTTI and exceptions disabled. This changes the ABI and so you can't link these with code built with RTTI or exceptions enabled.
It appears that I need to separately download and install libc++.
Correct, libc++ is not included as part of LLVM's distribution. Many of the nominal LLVM subprojects aren't included. LLDB is another example.
Nor does LLVM include a standard C library or the basic Objective-C frameworks.
For Ubuntu/Debian-based Linux distributions (including Linux Mint), there are also pre-built .deb files from http://llvm.org/apt/
This has the advantage that it is easier to uninstall at a later point, and it also provides Clang 3.4 nightly builds (the 3.3 version is also provided). Simply add one line to your sources.list (or use a GUI package manager to do so) and you're set.

Building autotooled software to LLVM bitcode

I would like to compile software using the autotools build system to LLVM bitcode; that is, I would like the executables obtained at the end to be LLVM bitcode, not actual machine code.
(The goal is to be able to run LLVM bitcode analysis tools on the whole program.)
I've tried specifying CC="clang -emit-llvm -use-gold-plugins" and variants to the configure script, to no avail. There is always something going wrong (e.g. the package builds .a static libraries, which are refused by the linker).
It seems to me that the correct way to do it would be for LLVM bitcode to be a cross-compilation target, set with --host=, but there is no such standard target (even though there is a target for Knuth's MMIX).
So far I've used kludges, such as compiling with CC="clang -emit-llvm -use-gold-plugins" and running linking lines (using llvm-ld or llvm-link) manually. This works for simple packages such as grep.
I would like a method that's robust and works with most, if not all, configure scripts, including when there are intermediate .a files, or intermediate targets.
There are some methods like this. But for simple builds where intermediate static libraries are not used, you can do something simpler. You will need:
llvm, configured with gold plugin support. Refer to this
clang
dragonegg, if you need a front end for Fortran, Go, etc.
The key is to enable '-flto' for either clang or dragonegg (the front end), both at compile time and at link time. It is straightforward for clang:
CC = clang
CLINKER = clang
CFLAGS = -flto -c
CLINKFLAGS = -flto -Wl,-plugin-opt=also-emit-llvm
If needed, add additional '-plugin-opt' option to specify llvm-specific codegen option:
-Wl,-plugin-opt=also-emit-llvm,-plugin-opt=-disable-fp-elim
The dumped whole-program bitcode will sit alongside your final executable.
Two additional things are needed when using dragonegg.
First, dragonegg is not aware of the location of the LLVM gold plugin, so it needs to be specified in the linker flags like this: -Wl,-plugin=/path/to/LLVMgold.so,-plugin-opt=...
Second, dragonegg is only able to dump IR rather than bitcode, so you need a wrapper script for that purpose. I created one here. Works fine for me.
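For an autotools package, the clang settings above can be passed to configure as VAR=VALUE arguments. The sketch below only assembles that command line. The function name lto_configure_command is made up, the -c from the CFLAGS line is dropped on the assumption that the generated makefiles add it themselves, and the also-emit-llvm plugin option is taken verbatim from this answer, so whether your LLVM version still accepts it is not verified here.

```python
import shlex


def lto_configure_command(extra_plugin_opts=()):
    """Build a ./configure command line that enables LTO with clang.

    extra_plugin_opts are passed through as additional -plugin-opt values,
    e.g. ["-disable-fp-elim"] as in the answer above.
    """
    plugin_opts = ["also-emit-llvm", *extra_plugin_opts]
    wl_args = ",".join(f"-plugin-opt={opt}" for opt in plugin_opts)
    ldflags = f"-flto -Wl,{wl_args}"
    return ["./configure", "CC=clang", "CFLAGS=-flto", f"LDFLAGS={ldflags}"]


if __name__ == "__main__":
    # shlex.join quotes the LDFLAGS argument, which contains a space.
    print(shlex.join(lto_configure_command(["-disable-fp-elim"])))
```

Packages that build intermediate .a libraries will likely need more than this, as noted in the question.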