I'm trying to embed an LLVM IR interpreter into an OCaml program.
What I want to do is the following process:
(1) Read LLVM IR files
(2) Build a CFG (control-flow graph) from the IR
(3) Execute (interpret) the IR instructions in each basic block, taking the previous state (memory) as input and producing the next state
Steps (1) and (2) are done, but I can't find any references for step (3).
I used OCaml's llvm library and opt with -dot-cfg to draw the CFGs, but since I'm new to LLVM I'm having a hard time grasping how to interpret (execute in memory) IR code from OCaml.
If there are other ways to execute LLVM IR in memory, even from a different language, that would be helpful too.
I'm starting new research in the field of compiler optimization.
As a start, I'm reading several related papers and have encountered a few different optimization techniques.
One main thing I'm currently looking at is the technique where a compiler converts the input source code into a graph (e.g. control-flow, data-flow, linked list, etc.), performs optimizations on that graph, and then produces machine code: code-to-graph-to-code. JIT compilers in JavaScript engines such as V8 and ChakraCore work this way, for example.
Then I came across LLVM IR. Because of that earlier reading, my impression was that optimization on code is done on a graph, as explained above. However, I don't believe that is the case for LLVM, though I'm not sure. I found that there are tools to generate a control-flow graph from LLVM IR, but that doesn't mean LLVM is optimizing the graph.
So, my question is "Is LLVM IR a graph?" If not, how does it optimize the code? Code-to-Code directly?
LLVM IR (and its backend form, Machine IR) is a traditional three-address-code IR, so technically it is not a graph IR in the sense that, e.g., a sea-of-nodes IR is. But it contains several graph structures: a graph of basic blocks (the control-flow graph) and a graph of data dependencies (the SSA def-use chains), both of which are used to simplify optimizations. In addition, during the instruction selection phase in the backend, the LLVM IR is temporarily converted to a true graph IR: the SelectionDAG.
I am working on a high-performance system written in C++. The process needs to be able to understand, at runtime, some complex logic (rules) written in a simple language developed for this application. We have two options:
Interpret the logic: run an embedded interpreter and generate a dynamic function call which, when it receives data, operates on that data according to the interpreted logic
Compile the logic into a plugin.so dynamic shared library, use dlopen and dlsym to load the plugin, and call the logic function at runtime
Option 2 looks really attractive: since it would be optimized machine code, it should run much faster than an embedded interpreter inside the process.
The options I am exploring are:
write a compile method: string compile( string logic, list & errors, list & warnings )
here the input logic is a string containing code written in our custom language
it generates LLVM IR; the return value of compile is the IR string
write a link method: bool link(string ir, string filename, list & errors, list & warnings)
for the link method I searched the LLVM documentation, but I have not been able to find out whether it is possible to write such a method
If I am correct, LLVM IR is converted to LLVM bitcode or assembly code. Then either the LLVM JIT is used to run it in JIT mode, or the GNU assembler is used to generate native code.
Is it possible to find a function in LLVM which does that? It would be much nicer if it could all be done from within the code, rather than invoking "as" via a system command from C++ to generate the plugin.so file.
Please let me know if you know of any way to generate a native shared library from my process at runtime.
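On the dlopen/dlsym half of option 2, the loading side is standard POSIX and independent of LLVM. A minimal sketch, using libm's cos as a stand-in for the compiled logic function ("libm.so.6" and "cos" are placeholders for your plugin.so and whatever symbol it exports):

```cpp
#include <cassert>
#include <dlfcn.h> // POSIX dynamic loading: dlopen, dlsym, dlclose

// Resolve a double(double) function from a shared object at runtime.
// A real plugin would export its entry point with extern "C" to avoid
// C++ name mangling, so dlsym can find it by name.
typedef double (*unary_fn)(double);

unary_fn load_logic(const char *soname, const char *symbol) {
  void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
  if (!handle)
    return nullptr; // dlerror() has the details
  // dlsym returns a void*; cast it to the expected function pointer type.
  return reinterpret_cast<unary_fn>(dlsym(handle, symbol));
}
```

Usage: `unary_fn f = load_logic("libm.so.6", "cos");` then call `f(x)` like any function pointer. Keeping the handle around (and calling dlclose when the rules are replaced) lets you hot-swap recompiled logic without restarting the process.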
llc is an LLVM tool that translates LLVM IR into machine code. I think that is all you need.
Basically, you can produce your LLVM IR the way you want and then run llc over it.
You can invoke it from the command line, or you can look at llc's implementation to find out how to do the same from your own program.
Here is a useful link:
http://llvm.org/docs/CommandGuide/llc.html
I hope it helps.
I am interested in analyzing a CFG of a C/C++ program where the CFG's nodes contain LLVM IR instructions. Is there any way to leverage LLVM to extract a persistent in-memory object of this CFG? I do not want to implement a pass in the compiler; I want the CFG to undergo analysis in my own program.
The LLVM IR in-memory representation is amenable to CFG analysis because the basic blocks are already organized as a graph. Within a basic block, the instruction sequence is linear. Some interesting in-function CFG-related code in LLVM lives in lib/Analysis/CFG.cpp and lib/Analysis/CFGPrinter.cpp.
As you might know, Pin is a dynamic binary instrumentation tool. Using Pin, for example, I can instrument every load and store in my application. I was wondering if there is a similar tool that injects code at compile time (using a higher level of information, and without requiring us to write an LLVM pass), rather than at runtime like Pin. I am especially interested in such a tool for LLVM.
You could write LLVM passes of your own and apply them to your code to "instrument" it at compile time. These passes consume LLVM IR and produce LLVM IR, so for some tasks this will be a very natural thing to do, while for others it might be cumbersome or difficult (because of the differences between LLVM IR and the source language). It depends.
Is it theoretically and/or practically possible to compile native c++ to some sort of intermediate language which will then be compiled at run time?
Along the same lines, is "portable" the term used to denote this?
LLVM, which is a compiler infrastructure, parses C++ code and transforms it into an intermediate language called LLVM IR (IR stands for Intermediate Representation), which looks like a high-level assembly language. It is a machine-independent language. Generating IR is one phase. In the next phase, the IR passes through various optimizers (called passes), and then a third phase emits machine code (i.e. machine-dependent code).
It is a module-based design: the output of one phase (module) becomes the input of another. You can save the IR to disk, so that the remaining phases can resume later, possibly on an entirely different machine!
So you could generate the IR ahead of time and then do the rest at runtime. I've not done that myself, but LLVM seems really promising.
Here is the documentation of LLVM IR:
LLVM Language Reference Manual
This topic on Stack Overflow seems interesting, as it says,
LLVM advantages:
JIT - you can compile and run your code dynamically.
And these articles are good read:
The Design of LLVM (on drdobbs.com)
Create a working compiler with the LLVM framework, Part 1