Construct in-memory IR CFG of C/C++ program using LLVM - c++

I am interested in analyzing a CFG of a C/C++ program where the CFG's nodes contain LLVM IR instructions. Is there any way to leverage LLVM to extract a persistent in-memory object of this CFG? I do not want to implement a pass in the compiler; I want the CFG to undergo analysis in my own program.

The in-memory representation of LLVM IR is amenable to CFG analysis because the basic blocks of each function are already organized as a graph. Within a basic block, the instruction sequence is linear. Some interesting intra-function CFG-related code in LLVM lives in lib/Analysis/CFG.cpp and lib/Analysis/CFGPrinter.cpp.
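To make the "blocks form a graph, instructions inside a block are linear" point concrete, here is a minimal, LLVM-free sketch. All names in it (ToyBlock, ToyFunction, reachableBlocks, makeDiamond) are hypothetical and are not LLVM API. In LLVM itself the analogous pieces would be llvm::Function and llvm::BasicBlock, with a module loaded into memory via llvm::parseIRFile (llvm/IRReader/IRReader.h) and successor edges exposed through llvm/IR/CFG.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block: a linear instruction list plus outgoing edges.
struct ToyBlock {
    std::string name;
    std::vector<std::string> instructions;  // IR text, executed top to bottom
    std::vector<std::size_t> successors;    // indices into ToyFunction::blocks
};

// Toy stand-in for a function: blocks[0] is assumed to be the entry block.
struct ToyFunction {
    std::vector<ToyBlock> blocks;
};

// Depth-first walk from the entry block; returns block names in preorder.
std::vector<std::string> reachableBlocks(const ToyFunction& fn) {
    std::vector<std::string> order;
    if (fn.blocks.empty()) return order;
    std::vector<bool> seen(fn.blocks.size(), false);
    std::vector<std::size_t> stack{0};
    while (!stack.empty()) {
        std::size_t id = stack.back();
        stack.pop_back();
        if (seen[id]) continue;
        seen[id] = true;
        order.push_back(fn.blocks[id].name);
        const std::vector<std::size_t>& succ = fn.blocks[id].successors;
        // Push in reverse so that the first successor is visited first.
        for (auto it = succ.rbegin(); it != succ.rend(); ++it) {
            assert(*it < fn.blocks.size() && "successor index out of range");
            if (!seen[*it]) stack.push_back(*it);
        }
    }
    return order;
}

// Builds a diamond-shaped CFG (entry -> then/else -> merge) plus one
// block that no edge leads to.
ToyFunction makeDiamond() {
    ToyFunction fn;
    fn.blocks = {
        {"entry", {"%c = icmp slt i32 %x, 0", "br i1 %c, label %then, label %else"}, {1, 2}},
        {"then", {"br label %merge"}, {3}},
        {"else", {"br label %merge"}, {3}},
        {"merge", {"ret void"}, {}},
        {"dead", {"ret void"}, {}},
    };
    return fn;
}
```

Calling reachableBlocks(makeDiamond()) visits the four connected blocks and never reaches "dead". That is the kind of query that becomes straightforward once the CFG is an ordinary in-memory object in your own program.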

Related

Is LLVM IR a graph?

I'm starting new research in the field of compiler optimization.
As a start, I'm reading several related papers and have encountered a few different optimization techniques.
One main thing I'm currently looking at is the compiler technique of converting input source code into a graph (e.g. control-flow, data-flow, linked list, etc.), performing optimization on the graph, and then producing the machine code: Code-to-Graph-to-Code. Examples are the JIT compilers in JavaScript engines such as V8 and ChakraCore.
Then, I came across LLVM IR. Because of the earlier searches, my impression was that code optimization is done on a graph, as explained above. However, I do not believe that is the case for LLVM, but I'm not sure. I found that there are tools to generate a control-flow graph from LLVM IR, but that doesn't mean LLVM is optimizing the graph.
So, my question is "Is LLVM IR a graph?" If not, how does it optimize the code? Code-to-Code directly?
LLVM IR (and its backend form, Machine IR) is a traditional three-address code IR, so technically it is not a graph IR in the sense that, e.g., a sea-of-nodes IR is. But it contains several graph structures: a graph of basic blocks (the control flow graph) and a graph of data dependencies (SSA def-use chains), both of which are used to simplify optimizations. In addition, during the instruction selection phase in the backend, the original LLVM IR is temporarily converted to a true graph IR, the SelectionDAG.
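As a rough illustration of the second graph mentioned above, the sketch below builds def-use chains over a toy three-address instruction list. It is an LLVM-free model: ToyInst, UseMap, buildDefUse, unusedValues and sampleBody are hypothetical names, not LLVM API. It records one use per operand slot, which is an assumption chosen to resemble how LLVM counts uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy three-address instruction: result = opcode operand0, operand1
struct ToyInst {
    std::string result;
    std::string opcode;
    std::vector<std::string> operands;
};

// Maps each defined value to the results of the instructions that use it.
using UseMap = std::map<std::string, std::vector<std::string>>;

// One entry is recorded per operand slot, so "mul %a, %a" counts as two uses.
UseMap buildDefUse(const std::vector<ToyInst>& body) {
    UseMap uses;
    for (const ToyInst& inst : body) {
        assert(!inst.result.empty() && "toy model: every instruction defines a value");
        uses[inst.result];  // make sure values without users still appear
    }
    for (const ToyInst& inst : body) {
        for (const std::string& op : inst.operands) {
            auto it = uses.find(op);
            // Operands not defined in this body (arguments, constants) are skipped.
            if (it != uses.end()) it->second.push_back(inst.result);
        }
    }
    return uses;
}

// Values that nobody uses are candidates for dead-code elimination.
std::vector<std::string> unusedValues(const UseMap& uses) {
    std::vector<std::string> out;
    for (const auto& entry : uses)
        if (entry.second.empty()) out.push_back(entry.first);
    return out;
}

// Small sample: %a feeds %b twice and %c once; %c itself is never used.
std::vector<ToyInst> sampleBody() {
    return {
        {"%a", "add", {"%x", "%y"}},
        {"%b", "mul", {"%a", "%a"}},
        {"%c", "sub", {"%b", "%a"}},
    };
}
```

The code is still a flat instruction list, yet the use map gives optimizations a graph view of it: following the edges from a definition to its users answers questions such as "is this value dead?" without rescanning the whole function.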

What are transformations in LLVM? How are they related to passes in LLVM?

I'm just starting to learn about LLVM and am a bit confused about transformations and passes.
An LLVM pass is something that walks over LLVM IR, whether that IR was written by you or generated by a compiler frontend such as clang. Given the structure of that IR, a pass can do one of two things.
Analysis:
The pass derives some information about the program from the IR, for example for static analysis, without changing it. The Clang static analyzer is a tool in a similar spirit, although it works on Clang's AST rather than on LLVM IR.
Transformation:
The other option is that we change the IR as we pass through it. Usually we do this to make the resulting executable better, that is, to optimize the code. This is what the LLVM documentation calls a transform pass. Simply stated, transformations are the operations carried out by a transform pass: running the pass changes the IR into some other form.
More information about this can be found in the LLVM passes documentation.
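The difference between the two kinds of pass can be sketched without LLVM. In the toy model below (ToyOp, countOpcode and foldConstantAdds are hypothetical names, not LLVM API), the analysis takes the IR by const reference and only reports a fact. The transformation mutates the IR and returns whether it changed anything, loosely echoing the convention of LLVM's legacy pass manager, where a pass returns true if it modified the IR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction with two integer constant operands; "const" only uses lhs.
struct ToyOp {
    std::string opcode;
    int lhs;
    int rhs;
};

// Analysis: reads the IR and reports a fact about it, leaving it unchanged.
std::size_t countOpcode(const std::vector<ToyOp>& body, const std::string& opcode) {
    std::size_t n = 0;
    for (const ToyOp& op : body)
        if (op.opcode == opcode) ++n;
    return n;
}

// Transformation: rewrites the IR in place. Every "add" of two constants
// becomes a single "const" holding the sum. Returns true if anything changed.
bool foldConstantAdds(std::vector<ToyOp>& body) {
    bool changed = false;
    for (ToyOp& op : body) {
        if (op.opcode == "add") {
            op = ToyOp{"const", op.lhs + op.rhs, 0};
            changed = true;
        }
    }
    // Postcondition of this toy transform: no "add" is left behind.
    assert(countOpcode(body, "add") == 0);
    return changed;
}
```

Running countOpcode before and after foldConstantAdds shows the division of labour: the analysis result changes only because the transformation rewrote the instructions it was counting.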

How to embed LLVM IR interpreter into other languages?

I'm trying to embed an LLVM IR interpreter into OCaml.
What I want to do is the following process:
(1) Read LLVM IR files
(2) Draw the CFG (control flow graph) from the IR
(3) Execute (or interpret) the IR instructions in each basic block, taking the previous state (or memory) as input and producing the next state
(1) and (2) are done, but I cannot find any references for step (3).
I used the OCaml llvm library and opt with -dot-cfg to draw the CFGs. But I'm having a difficult time grasping how to interpret (or execute in memory) IR code from OCaml, since I'm new to LLVM.
If there are other ways to execute LLVM IR in memory from other languages, those would be helpful too.

LLVM instrumentation without IR builder

I need to do heavy instrumentation using an LLVM pass. I want to avoid the IR Builder because it is somewhat complicated and the resulting code looks really messy. Isn't there a more convenient way to create LLVM IR? I am thinking of a way to write the instrumentation code in, for example, C/C++.
I do not use the IR Builder for my instrumentation. Most of the instrumentation is accomplished with two steps:
The LLVM pass identifies the instructions of interest and inserts function calls to the requisite instrumentation routine.
The instrumentation routines are written in C files and compiled and linked into the final program. Using link-time optimization (LTO), this approach achieves good performance by removing the function calls and directly inserting the machine code for the library instrumentation routines.
Therefore, most of the instrumentation is C code that clang compiles down to necessary IR.
Other pieces of instrumentation have to be crafted dynamically, so the IR is constructed by invoking the appropriate XYZInst::Create calls, where XYZ is the specific instruction.
The first approach is closer to what you desire; however, it required writing a separate script to act as the compiler driver and managing Makefiles, libraries, et cetera.
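The first step of the two-step approach above (find the instructions of interest and insert a call in front of each) can be modelled in a few lines. This is an LLVM-free sketch: ToyMemInst, instrumentMemoryOps and the hook names trace_load and trace_store are all hypothetical. In the real setup the hooks would be functions written in C and linked into the final program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: an opcode, the pointer it touches, and a callee for calls.
struct ToyMemInst {
    std::string opcode;   // "load", "store", "add", "call", ...
    std::string pointer;  // e.g. "%p"; empty for non-memory instructions
    std::string callee;   // only meaningful when opcode == "call"
};

// Inserts a call to a hook in front of every load and store and passes the
// accessed pointer along. Returns the number of hook calls inserted.
std::size_t instrumentMemoryOps(std::vector<ToyMemInst>& body) {
    std::vector<ToyMemInst> out;
    std::size_t inserted = 0;
    for (const ToyMemInst& inst : body) {
        if (inst.opcode == "load" || inst.opcode == "store") {
            assert(!inst.pointer.empty() && "memory instruction needs a pointer");
            // Hypothetical hook names: trace_load / trace_store.
            out.push_back(ToyMemInst{"call", inst.pointer, "trace_" + inst.opcode});
            ++inserted;
        }
        out.push_back(inst);  // the original instruction is kept as-is
    }
    body.swap(out);
    return inserted;
}
```

The pass side stays small because it only decides where calls go. What the hooks actually do lives in ordinary C code, which is the part that clang compiles and that LTO can later inline.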

Pin Like Tool for compile time injection of instrumentation code

As you might know, Pin is a dynamic binary instrumentation tool. Using Pin, for example, I can instrument every load and store in my application. I was wondering if there is a similar tool that injects code at compile time (using higher-level information, and without requiring us to write an LLVM pass) rather than at runtime like Pin. I am especially interested in such a tool for LLVM.
You could write LLVM passes of your own and apply them to your code to "instrument" it at compile time. These work on LLVM IR and produce LLVM IR, so for some tasks this will be a very natural thing to do, and for other tasks it might be cumbersome or difficult (because of the differences between LLVM IR and the source language). It depends.