I need to write a LLVM backend for a language that doesn't support jumps (conditional or unconditional). The only flow control structures I can use are if-then-else and while loops (plus break and continue).
Is there any way (or utility) to convert from LLVM jump-based+phi-node based to this?
You can refer to the relooper algorithm in the Emscripten, which was targeting LLVM IR to javascript, and there's no "goto" statement.
https://kripken.github.io/emscripten-site/docs/introducing_emscripten/Talks-and-Publications.html
Related
I'm starting new research in the field of compiler optimization.
As a start, I'm looking into several different papers related and encountered a few different optimization techniques.
One main thing I'm currently looking at is the compilers' technique that converting input source code into a graph (e.g. control-flow, data-flow, linked list, etc.), then performs optimization onto the graph and produces the machine code. Code-to-Graph-to-Code. For example, JIT compilers in the JavaScript engines, i.e. V8, ChakraCore, etc.
Then, I came across LLVM IR. Because of the earlier searches, my impression of optimization on code is doing on a graph like explained above. However, I do not believe that is the case for LLVM, but I'm not sure. I found that there are tools to generate a control-flow graph from the LLVM IR, but it doesn't mean it's optimizing the graph.
So, my question is "Is LLVM IR a graph?" If not, how does it optimize the code? Code-to-Code directly?
LLVM IR (and it's backend form, Machine IR) is a traditional three-address code IR so technically is not a graph IR in the sense e.g. sea-of-nodes IR is. But it contains several graph structures in it: a graph of basic blocks (Control Flow Graph) and a graph of data dependencies (SSA def-use chains) which are used to simplify optimizations. In addition, during instruction selection phase in backend original LLVM IR is temporarily converted to a true graph IR - SelectionDAG.
I want to force LLVM to generate CMPx-, TEST- and alike instructions on x86-64 to be up to 8 bit width only, forcing e.g. 32bit-int comparisons into four separate cmp+branch pairs. This obviously requires some bit-masking and increased instruction count.
Can I achieve this by simply "disabling" certain instructions for x86-64 so LLVM auto-generates the required glue code? Do I have to write a pass and work on the IR myself?
No, there is no way of disabling certain instructions like this from a vanilla build of LLVM. Anything you do to achieve this will require modifying LLVM.
You have several options for modifying LLVM:
You can add an x86-specific pass to the LLVM backend (does not work on the IR) which directly expands the cmp and test instructions into chains of instructions on sub-registers. You would have to do this after instruction selection to preclude some target-independent pass from undoing the transformation. This is called an "MI" pass in LLVM parlance. As an example you can look in X86FixupSetCC.cpp (mirror here). This has a huge advantage in that you can put it behind a flag and otherwise control whether it occurs once you add the core functionality.
You can modify LLVM's instruction tables for LLVM in the X86 .td files to only define these instructions for 8-bit registers, and then add the def Pat<...>; patterns to the .td files that allow programs with wider comparisons to still have their instructions selected (much as Colin suggested above). This has the disadvantage of not only have you modified your LLVM but you can't easily turn those modifications on and off behind some flag.
You can't do anything to LLVM's IR that will really help here because the code generator will just optimize things back into instruction patterns you're trying to avoid.
Hope this helps!
What you're probably looking to do is redefine the lowering pattern in the x86 .td files. There's code that looks like "def Pat<...>;" that defines a translation from one graph of instructions to another. There should be a pattern for going from IR comparison instructions to the x86 32bit compare instructions. You'll want to edit this pattern and instead output your sequence of comparisons.
I have following tasks that I need to do as part of my research idea:
1. Parse the C file(s) at hand to get llvm-IR.
2. Do analysis on the IR. Possibly add and remove some instructions or BB
3. Emit either x86 executable or C (need to decide later)
I think this is quite common task for any one writing analysis on C, I want to do all these tasks in C/C++ (as most of our research code is in C/C++). I googled a lot, although lot of documentation is available on Tasks 2 and 3, but less is available on Task 1, any idea on this would be really helpful.
I want to hook these tasks as a pipe line, any suggestion for this are also welcome.
-Thanks
(1) can be done by using Clang to emit LLVM IR.
(2) can be done by writing your own LLVM pass, and later invoking it (with any other passes you're interested in) via LLVM's opt tool.
(3) (to x86) can be done by LLVM's llc tool.
All of these are also accessible as APIs, not only command-line tools, making it possible to incorporate into your pipe.
I have written an LLVM pass that modifies the Intermediate Representation (IR) code. To increase portability, I also want it to work with a gcc compiler. So I was wondering if there is any tool which can convert some Intermediate Representation (IR) of gcc to LLVM IR.
You probably want dragonegg (which is using the GCC front-end to build an LLVM IR).
And if you wanted to work on the GCC internal representations, MELT (a high level domain specific language to extend GCC) is probably the right tool.
It will probably be much easier to simply write another version of your code that works with gcc IR. What you want to do is likely not possible, and if it is possible, it's probably extremely difficult. (More so than writing the LLVM pass in the first place.)
Is it theoretically and/or practically possible to compile native c++ to some sort of intermediate language which will then be compiled at run time?
Along the same lines, is "portable" the term used to denote this?
LLVM which is a compiler infrastructure parses C++ code, transforming it to an intermediate language called LLVM IR (IR stands for Intermediate Representation) which looks like high-level assembly language. It is a machine independent language. Generating IR is one phase. In the next phase, it passes through various optimizers (called pass). which then reaches to third phase which emits machine code (i.e machine dependent code).
It is a module-based design; output of one phase (module) becomes input of another. You could save IR on your disk, so that the remaining phases can resume later, maybe on entirely different machine!
So you could generate IR and then do rest of the things on runtime? I've not done that myself, but LLVM seems really promising.
Here is the documentation of LLVM IR:
LLVM Language Reference Manual
This topic on Stackoverlow seems interesting, as it says,
LLVM advantages:
JIT - you can compile and run your code dynamically.
And these articles are good read:
The Design of LLVM (on drdobs.com)
Create a working compiler with the LLVM framework, Part 1