LLVM IR bitcode file versioning - llvm

I have LLVM IR files in textual form that were compiled targeting different LLVM versions. In my LLVM IR interpreter I want to identify which parser I should select to parse such a file, since the textual format is not backward compatible. However, I cannot see a version ID in these files.
If I instead change my parsers to parse bitcode files, I get longer compatibility guarantees. Bitcode also has a version field, but the documentation mentions that it is currently always 0.
So is there any way to identify the LLVM version of a bitcode file, or do I have no other choice than letting the user specify the version?

Related

Is it possible to compile LLVM IR to binary without clang?

I'm writing a compiler that embeds the LLVM API. By copying code from the llc tool, I can output assembly language or object files that I can turn into binaries using clang or an assembler.
But I want my compiler to be self-contained. Is it possible to turn LLVM IR into binaries using LLVM alone? This seems like the sort of thing that should be in the LLVM toolkit.
Yes, it is possible, and llc does exactly this when given the -filetype=obj argument.
You can consult the compileModule function to learn how to use the programmatic API.
Note that this will only generate an object file for a given translation unit. You will also need a linker to convert it into a proper executable or library. The LLVM linker, lld, can also be embedded into client applications as a library, so in the end you will be able to create a self-contained compiler.

Is it possible to find the LLVM version from textual LLVM IR .ll file?

I have several .ll files containing LLVM IR code in textual form. I want to filter the files depending on their LLVM version; for example, I would like to find all the files that use LLVM version 3-7.
So far, I have converted a .ll file to a .bc file using the llvm-as tool and then run llvm-bcanalyzer on it, hoping to get some useful information such as the version number, but it seems that I was mistaken and llvm-bcanalyzer does not provide such information.
So is there any way to find out which version of LLVM was used to write a given .ll file?
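Actually, the version of the entire LLVM toolchain will be the same as the version of the clang used to generate the .ll file, since clang is part of the LLVM toolchain.
Just open the .ll file and search for "clang version"; you should find something like "clang version 12.0.0". Clang records this string in the !llvm.ident metadata, so it will only be there if the file was produced by clang and that metadata has not been stripped.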
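To filter many files automatically, the search described above can be scripted. The following is only a minimal sketch: the function name find_clang_version is made up for illustration, and it assumes the producer string has the form "clang version X.Y.Z". Files produced by other front ends, or files without the identification string, yield an empty result.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of LLVM): returns the dotted version number
// that follows "clang version" in the given .ll text, or an empty string if
// the text contains no such producer string.
std::string find_clang_version(const std::string& ll_text) {
    // One or more digit groups separated by dots, e.g. "3.4" or "12.0.0".
    static const std::regex pattern(R"(clang version (\d+(?:\.\d+)*))");
    std::smatch match;
    if (std::regex_search(ll_text, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

Reading each file into a std::string and passing it to this function gives a version you can compare against; an empty result means the version has to come from somewhere else, for example from the user.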

How is clang able to steer C/C++ code optimization?

I was told that clang is a driver that works like gcc to do preprocessing, compilation and linkage work. During the compilation and linkage, as far as I know, it's actually llvm that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
Compiling source code doesn't even seem to need a static library such as libLLVMCore.a. Instead, on Debian the clang package depends on another package called libllvm-3.4 (the clang version is 3.4), which contains libLLVM-3.4.so(.1). Does clang use this shared library for optimization?
I've checked clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still not aware of the mechanism.
I hope someone might give me some hints.
(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is to translate C++ source code (or C, or Objective C, etc.) to LLVM IR, a lower-level representation, with a textual form, of what that code should do. In order to do this, Clang employs a number of sub-modules whose descriptions you can find in any decent compiler construction book: a lexer, a parser, a semantic analyzer (Sema), etc.
LLVM is a set of libraries whose primary task is the following: suppose we have the LLVM IR representation of the following C++ function
    int double_this_number(int num) {
        int result = 0;
        result = num;
        result = result * 2;
        return result;
    }
the core LLVM passes should then optimize that LLVM IR code, for instance by getting rid of the redundant local variable and the stores to it.
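As a rough illustration of the effect, written at the C++ level rather than as actual IR, the two functions below compute the same result. The second one is a hand-written approximation of what the optimized code amounts to, not real output from LLVM, and its name is invented for this sketch.

```cpp
// The function from the text: a local variable with redundant stores.
int double_this_number(int num) {
    int result = 0;       // dead store, overwritten on the next line
    result = num;
    result = result * 2;
    return result;
}

// Hypothetical hand-simplified equivalent: the temporary is gone and only
// the multiplication remains.
int double_this_number_simplified(int num) {
    return num * 2;
}
```

Both functions return the same value for every input that does not overflow, which is what allows an optimizer to replace the first form with the second.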
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
The term "back-end" is often confusing, since many papers describe the LLVM libraries as a "middle end" in a compiler chain and reserve "back end" for the final module which does the code generation (LLVM IR to executable code, or to something else which no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever language you're compiling (C++, C, Objective C, Python, etc.), if you have a front-end which translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recalling that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc..), Clang also leverages many LLVM libraries during its front-ending process. You can't really tear every LLVM module away from Clang since the latter is built on the former set.
As for the reason why Clang is said to be a "compilation driver": Clang interprets the command line parameters (descriptions and many declarations are TableGen'd, so they might require a bit more than a simple grep to swim through the sources), decides which Jobs and phases are to be executed, sets up the CodeGenOptions according to the desired/possible optimization and transformation levels, and invokes the appropriate modules (clangCodeGen in BackendUtil.cpp is the one that populates a module pass manager with the optimizations to apply) and tools (e.g. the Windows ld linker). It steers the compilation process from the very beginning to the end.
Finally, I would suggest reading the Clang and LLVM documentation. Both are quite thorough, and you should look there first for answers to most of your questions.
It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language, LLVM IR. What Clang does is compile C++ code to LLVM IR, without optimizations. Clang can then invoke the LLVM compiler to compile that LLVM IR to optimized assembly.

How do I parse LLVM IR

I have LLVM IR code in text format. What I wanna do is parse it and modify that code. Is there an API which can help in parsing the LLVM IR code? What libraries should I have on my system? At the moment I have the clang compiler installed as well as LLVM, so I can use commands such as llc, opt and llvm-link.
LLVM is primarily a C++ library. It has all the tools you can imagine to parse, manipulate and produce IR in both textual and bitcode (binary) formats.
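To get started, take a look at the llvm::ParseIRFile function, declared in the header include/llvm/Support/IRReader.h. (In more recent LLVM releases the function is spelled llvm::parseIRFile and the header is include/llvm/IRReader/IRReader.h.)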
The best way to proceed would be to download the LLVM source code and build it, following these instructions. It's then easy to write your own code that uses the LLVM libraries.
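Since IR comes in both a textual and a bitcode form, it can be handy to tell the two apart before deciding what to do with a file. The sketch below does this with the standard library only, by checking for the bitcode magic number ('B', 'C', 0xC0, 0xDE) and for the magic number of the optional bitcode wrapper header (0x0B17C0DE, stored little-endian). The enum and function names are invented for this example, and anything that is not recognized as bitcode is merely assumed to be text.

```cpp
#include <string>

// Hypothetical classification used only in this sketch.
enum class IRFormat { Bitcode, WrappedBitcode, AssumedText };

// Classifies a buffer by its first four bytes. Buffers shorter than four
// bytes, or with no known magic number, are assumed to be textual IR.
IRFormat sniff_ir_format(const std::string& bytes) {
    if (bytes.size() >= 4) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        const unsigned char b2 = static_cast<unsigned char>(bytes[2]);
        const unsigned char b3 = static_cast<unsigned char>(bytes[3]);
        // Raw bitcode starts with 'B' 'C' 0xC0 0xDE.
        if (b0 == 'B' && b1 == 'C' && b2 == 0xC0 && b3 == 0xDE) {
            return IRFormat::Bitcode;
        }
        // The wrapper header starts with 0x0B17C0DE in little-endian order.
        if (b0 == 0xDE && b1 == 0xC0 && b2 == 0x17 && b3 == 0x0B) {
            return IRFormat::WrappedBitcode;
        }
    }
    return IRFormat::AssumedText;
}
```

The check only needs the first four bytes of the file, so the caller does not have to read the whole file before classifying it.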

How to embed LLVM?

The LLVM Core project consists of:
Compiler - converts source code to LLVM IR
VM - executes compiled IR code
How can I embed the VM in a C++ application?
LLVM is really a collection of libraries that you can link to, so it's pretty easy to embed. More often, LLVM takes IR that you generate and compiles it directly to machine code. There is also a library available to interpret and execute IR on platforms that do not support JIT compilation.
There's a pretty good tutorial available on the LLVM website here: http://llvm.org/docs/tutorial/. I suggest that you go through that and then ask more specific questions if you have them.
Take a look at the HowToUseJIT example in LLVM.