Normally, after compiler optimizations we can get textual LLVM IR, so we can compare the IR before and after optimization and reason about the changes. With LTO, we usually feed IR bitcode files to the linker (lld) and get native object files. Is there any way to get a single monolithic LLVM IR module (in textual form) after the LTO passes? And is there an effective way to analyze the object code to find out which optimizations were applied, other than just reading the text section of the object file?
Thanks
Please tell me, if you need more information!
LTO optimizations are more or less the same as those applied to the code during normal compilation. The difference is that the module being optimized comes from linking all the modules of a program.
So you can compile all your sources to LLVM IR (with -flto, for instance), link the resulting object files (which are actually bitcode files) using llvm-link, and then experiment with optimizations by running opt on the resulting bitcode. The list of passes applied during the LTO stage can be seen in populateLTOPassManager() in lib/Transforms/IPO/PassManagerBuilder.cpp. There is also a handy opt option, -print-after, which emits the textual IR after a given pass has run.
I'm writing a compiler that embeds the LLVM API. By copying code from the llc tool, I can output assembly language or object files that I can turn into binaries using clang or an assembler.
But I want my compiler to be self-contained. Is it possible to turn LLVM IR into binaries using LLVM itself? This seems like the sort of thing that should be in the LLVM toolkit.
Yes, it is possible; llc does this too when given the -filetype=obj argument.
You can consult llc's compileModule function to learn how to use the programmatic API.
Note that this will only generate an object file for a given translation unit. You will also need a linker to convert it into a proper executable or library. The LLVM linker, lld, can also be embedded into client applications as a library, so in the end you will be able to create a self-hosting compiler.
I'm looking for examples of code that triggers non-determinism in GCC or Clang's compilation process.
One prominent example is the usage of the __DATE__ macro.
GCC and Clang have a plethora of compiler flags to control the outcome of non-deterministic actions within the compiler, e.g. -frandom-seed and -fno-guess-branch-probability.
Are there any small examples that are affected by these flags?
To be more precise:
$ c++ main.cpp -o main && shasum main
aabbccddee
$ c++ main.cpp -o main && shasum main
eeddccbbaa
I'm looking for macro-free code examples where multiple runs of the compiler lead to different outputs, but where the output can be made stable with e.g. -frandom-seed.
EDIT:
related: from the gcc docs:
-fno-guess-branch-probability:
Sometimes gcc will opt to use a randomized model to guess branch probabilities,
when none are available from either profiling feedback (-fprofile-arcs)
or __builtin_expect.
This means that different runs of the compiler on the same program
may produce different object code.
The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.
While old, this question is interesting for reproducible builds.
As you've stated, there are multiple sources of non-determinism when compiling C/C++ source.
Non-determinism in preprocessor
The preprocessor usually implements a number of predefined macros whose values can change between runs. There are the obvious __DATE__ and __TIME__, but also the less obvious __cplusplus, __STDC_VERSION__ or __GNUC_PATCHLEVEL__, which can change when the OS (and with it the compiler) is updated.
There's also __FILE__, which contains a path from the build environment (different from machine to machine).
Note that for the date and time macros, GCC honors the SOURCE_DATE_EPOCH environment variable to override their values. Other compilers may behave differently.
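To make the preprocessor case concrete, here is a minimal sketch that bakes a few of these predefined macros into a string. The helper name build_fingerprint is made up for illustration. Any binary containing such a string will differ between builds made at different times, from different paths, or with a different language standard.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from any library): concatenates predefined
// macros whose values depend on when, where and how the file is compiled.
std::string build_fingerprint() {
    std::string s;
    s += "date=";
    s += __DATE__;                      // "Mmm dd yyyy", changes daily
    s += " time=";
    s += __TIME__;                      // "hh:mm:ss", changes every second
    s += " file=";
    s += __FILE__;                      // path as passed to the compiler
    s += " cplusplus=";
    s += std::to_string(__cplusplus);   // depends on compiler and -std=
    return s;
}
```

Printing build_fingerprint() from two builds made a minute apart should show a different time= field, unless the compiler is told to pin it (for GCC, through SOURCE_DATE_EPOCH as mentioned above).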
Non-determinism in the compiler
The compiler might use optimization strategies that rely on a non-deterministic approach. You've cited one in GCC, but others might exist.
For MSVC, you might be interested in the /BREPRO compiler flag.
You'll have to RTFM for your compiler to know more.
Non-determinism in the linker
On some platforms, the linked object and/or library will contain a timestamp. macOS is one of them. So for the same set of .o files, you'll get a different resulting executable.
Also, if you use Link Time Optimization, many compilers will create intermediate versions of the .o files with random names. Again for GCC, you can use -frandom-seed=31415 to "fix" this randomness, but YMMV.
Non-determinism in the build-process
Sometimes repositories contain additional operations that are performed outside the compilation stage, such as generating header files based on configuration flags (or other steps).
In that case, these project-specific operations might not be deterministic either.
For a good overview of the deterministic builds, please refer to this post
I was told that clang is a driver that works like gcc to do preprocessing, compilation and linkage work. During the compilation and linkage, as far as I know, it's actually llvm that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
It seems that compiling source code doesn't even need a static library such as libLLVMCore.a. Instead, on Debian the clang package depends on another package called libllvm-3.4 (the clang version is 3.4), which contains libLLVM-3.4.so(.1). Does clang use this shared library for optimization?
I've checked clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still not aware of the mechanism.
I hope someone might give me some hints.
(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is the following:
i.e. translating C++ source code (or C, Objective C, etc.) into LLVM IR, a textual lower-level representation of what that code should do. To do this, Clang employs a number of sub-modules whose descriptions you can find in any decent compiler construction book: a lexer, a parser, a semantic analyzer (Sema), etc.
LLVM is a set of libraries whose primary task is the following: suppose we have the LLVM IR representation of the following C++ function
int double_this_number(int num) {
    int result = 0;
    result = num;
    result = result * 2;
    return result;
}
the core of the LLVM passes should optimize LLVM IR code:
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
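As a rough source-level illustration of what the optimization step achieves, the sketch below places the function from the text next to a hand-simplified equivalent. The name double_this_number_optimized is hypothetical. The real passes operate on LLVM IR rather than on C++, so this only mirrors the expected effect and is not actual compiler output.

```cpp
#include <cassert>

// The function from the text, as the front-end sees it: a local variable
// that is initialized, overwritten, and then updated again.
int double_this_number(int num) {
    int result = 0;
    result = num;
    result = result * 2;
    return result;
}

// Hand-written equivalent (hypothetical name) of what an optimizer could
// reduce the function to once the dead store and the temporary are gone.
int double_this_number_optimized(int num) {
    return num * 2;
}
```

Both functions return the same value for every input that does not overflow. The second simply has no intermediate variable left for later stages to deal with.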
The term "back-end" is often confusing, since many papers describe the LLVM libraries as a "middle end" in a compiler chain and reserve "back end" for the final module that does the code generation (LLVM IR to executable code, or to something else that no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever language you're targeting (C++, C, Objective C, Python, etc.), if you have a front-end that translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recalling that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc..), Clang also leverages many LLVM libraries during its front-ending process. You can't really tear every LLVM module away from Clang since the latter is built on the former set.
As for the reason why Clang is said to be a "compilation driver": Clang interprets the command line parameters (descriptions and many declarations are TableGen'd, so it might take a bit more than a simple grep to find them in the sources), decides which Jobs and phases are to be executed, sets up the CodeGenOptions according to the desired/possible optimization and transformation levels, and invokes the appropriate modules (clangCodeGen in BackendUtil.cpp is the one that populates a module pass manager with the optimizations to apply) and tools (e.g. the Windows ld linker). It steers the compilation process from the very beginning to the end.
Finally, I would suggest reading the Clang and LLVM documentation; it is quite thorough, and it is the first place to look for answers to most of your questions.
It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language, LLVM. What Clang does is compile C++ code to LLVM, without optimizations. Clang can then invoke the LLVM compiler to compile that LLVM code to optimized assembly.
I'm having some trouble wrapping my head around what LLVM actually does...
Am I right to assume that it could be used to parse mathematical expressions at runtime in a C++ program?
Right now, at runtime, I get the math expressions and build a C program out of them, which I compile on the fly through a system call to gcc. Then I dynamically load the .so produced by gcc and extract my eval function...
I'd like to replace this workflow with something simpler, maybe even faster...
Can LLVM help me out? Any resources out there to get me started?
You're describing using LLVM as a JIT compiler, which is absolutely possible. If you generate LLVM IR code (in memory) and hand it off to the library, it will generate machine code for you (still in memory). You can then run that code however you like.
If you want to generate LLVM IR from C code, you can also link clang as a library.
Here is a PDF I found at this answer, which has some examples of how to use LLVM as a JIT.
How does one generate executable binaries from the C++ side of LLVM?
I'm currently writing a toy compiler, and I'm not quite sure how to do the final step of creating an executable from the IR.
The only solution I currently see is to write out the bitcode and then call llc using system or the like. Is there a way to do this from the C++ interface instead?
This seems like it would be a common question, but I can't find anything on it.
LLVM does not ship the linker necessary to perform this task. It can only write out assembly, after which the system linker has to be invoked to produce the executable. You can look at the source code of llvm-ld to see how it's done.