Is there a way to make a pass over two LLVM IR files?

I want to compare two LLVM IR programs function by function. I thought it would be helpful to do this as an LLVM pass, where I have access to the program's CFG. However, all the pass types (Module, Function, ...) seem to work on a single program. How can I run a pass over two programs simultaneously?

I would just run llvm-link (a command-line tool bundled with LLVM) to merge the IR files together first, then use a regular module pass.
I think the renaming rule in llvm-link is something like renaming f to f.llvm.X, where X is the module ID, so your pass could identify pairs of functions by their shared name prefix before the module ID.
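For orientation, a minimal sketch of that approach might look as follows. First merge the two files (the file names are placeholders):

$ llvm-link prog1.ll prog2.ll -S -o merged.ll

Then a legacy module pass can group the functions of the merged module by the name prefix before any suffix llvm-link appended on a clash (the exact suffix format is an assumption to verify against your merged IR):

#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"

namespace {
struct PairFunctions : public llvm::ModulePass {
  static char ID;
  PairFunctions() : llvm::ModulePass(ID) {}

  bool runOnModule(llvm::Module &M) override {
    // Group function definitions by the part of the name before the first '.',
    // so that "f" and a renamed "f.llvm.1" end up in the same bucket.
    llvm::StringMap<llvm::SmallVector<llvm::Function *, 2>> Groups;
    for (llvm::Function &F : M) {
      if (F.isDeclaration())
        continue;
      Groups[F.getName().split('.').first].push_back(&F);
    }
    for (auto &Entry : Groups) {
      if (Entry.second.size() == 2) {
        // Entry.second[0] and Entry.second[1] are the two versions to compare.
      }
    }
    return false; // analysis only, the module is not modified
  }
};
} // namespace

char PairFunctions::ID = 0;
static llvm::RegisterPass<PairFunctions> X("pair-functions",
                                           "Group functions by base name");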

Related

Apply LLVM pass to a specific basic block

Is it possible to apply an LLVM transformation pass to a specific basic block, instead of the whole IR?
I know how to apply a pass to the whole IR:
$ opt -S -instcombine test.ll -o out.ll
But there might be several basic blocks inside test.ll, and I want to apply -instcombine to just one of them.
Generally, no. Some LLVM passes are written to work on whole modules, others on whole functions. Some are also safe to apply to single basic blocks (more by chance than by design), but LLVM's pass interface only deals in the unit a pass was designed for (functions for function passes, modules for module passes). That is, function passes are given a function by the pass manager, and nothing else.
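To make that concrete, here is a minimal sketch (assuming the legacy pass manager; all names are made up) of what a function pass actually receives. The pass manager hands it a whole function, so any restriction to one basic block has to be implemented inside the pass body:

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"

namespace {
struct BlockAware : public llvm::FunctionPass {
  static char ID;
  BlockAware() : llvm::FunctionPass(ID) {}

  bool runOnFunction(llvm::Function &F) override {
    // The whole function arrives here; the pass itself must decide which
    // basic blocks (if any) it wants to touch.
    for (llvm::BasicBlock &BB : F)
      llvm::errs() << "visiting block " << BB.getName() << "\n";
    return false;
  }
};
} // namespace

char BlockAware::ID = 0;
static llvm::RegisterPass<BlockAware> X("block-aware",
                                        "Illustrates function-pass granularity");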

Pass arguments to a pass

I need to tell the pass to look out for a specific function in the file, and I want to specify which function to look out for 'on the go', i.e. when I run the pass. Any idea how I can do that? It's sort of like passing arguments to a function, in theory.
Add a command line option using cl::opt<string> and set it when running your pass.
Alternatively, if you are producing IR from C or C++ using clang, you can use __attribute__((annotate("foo"))) to mark the functions you are interested in.
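A minimal sketch of the cl::opt approach (the pass name, option name, and registration are illustrative, assuming the legacy pass manager):

#include <string>
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"

// Hypothetical option: the function name to look out for.
static llvm::cl::opt<std::string> TargetFunc(
    "target-func",
    llvm::cl::desc("Name of the function the pass should look for"),
    llvm::cl::init(""));

namespace {
struct FindFunc : public llvm::FunctionPass {
  static char ID;
  FindFunc() : llvm::FunctionPass(ID) {}

  bool runOnFunction(llvm::Function &F) override {
    if (F.getName() == TargetFunc)
      llvm::errs() << "found " << F.getName() << "\n";
    return false;
  }
};
} // namespace

char FindFunc::ID = 0;
static llvm::RegisterPass<FindFunc> X("findfunc", "Find a named function");

You would then run it with something like (paths and names are placeholders):
opt -load ./libFindFunc.so -findfunc -target-func=foo test.ll -o out.ll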

How to use the RandomNumberGenerator within llvm?

I was hoping someone could give me an example of how to use the RandomNumberGenerator class within LLVM. All of the examples I am able to find seem to use outdated methods.
I would like to be able to create a RNG within a pass that can be overridden with the '-rng-seed' parameter.
How can this value be accessed if it was provided as a parameter, and how can the value be created if it was not?
Also, I understand that a single RNG is not meant to be shared between threads for a single module. If I am running multiple passes on a module, can they share the same generated RNG?
The RandomNumberGenerator class has a private constructor (check its doc and the source file under llvm/lib/Support/RandomNumberGenerator.cpp), so the only way (that I know of, at least) to get a hold of an instance is via Module's createRNG method.
So, assuming that you have an llvm::FunctionPass (and are using C++11):
bool runOnFunction(llvm::Function &CurFunc) override {
  // Ask the enclosing module for an RNG salted for this pass.
  auto rng = CurFunc.getParent()->createRNG(this);
  // Print one pseudo-random number per visited function.
  llvm::errs() << (*rng)() << '\n';
  return false; // the IR is not modified
}
Now you can run this on a module like this (assuming you modified the hello world pass from the documentation):
opt -load ./libLLVMHelloPass.so -hello foo.bc -o bar.bc
Rerunning this will give you the same pseudo-random numbers each time.
The -rng-seed option becomes available to your pass once you include the header (and link against the LLVM support library, i.e. llvm-config --libfiles support). So, changing the above execution line to something like:
opt -load ./libLLVMHelloPass.so -hello -rng-seed 42 foo.bc -o bar.bc
should give a different sequence.
Lastly, AFAIK, LLVM passes run via opt are executed sequentially in the context of a PassManager (certainly for the legacy one). I believe the advice about not sharing an RNG between threads matters when you build a custom standalone LLVM tool that uses multi-threading (in other words, one not intended to be run by opt). For relevant examples of standalone apps using the LLVM API, have a look in the unit tests source subdirectory (one hint is to look for .cpp files that have a main(), although they are not always set up like that).
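For orientation, a minimal standalone driver might look like the sketch below (the file name and RNG name are placeholders; note that recent LLVM versions have Module::createRNG take a name string rather than the Pass pointer used in the snippet above):

#include <memory>
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

int main(int argc, char **argv) {
  // Parsing the command line here is what makes options such as -rng-seed usable.
  llvm::cl::ParseCommandLineOptions(argc, argv);

  llvm::LLVMContext Ctx;
  llvm::SMDiagnostic Err;
  std::unique_ptr<llvm::Module> M = llvm::parseIRFile("foo.bc", Err, Ctx);
  if (!M) {
    Err.print(argv[0], llvm::errs());
    return 1;
  }

  // Each module hands out its own RNG stream; a single RandomNumberGenerator
  // instance should not be shared between threads.
  auto RNG = M->createRNG("my-standalone-tool");
  llvm::errs() << (*RNG)() << '\n';
  return 0;
}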

How can I make the LLVM IR of a function available to my program?

I'm working on a library in which I'd like certain introspection features to be available. Let's say I'm compiling with clang, so I have access to libtooling or whatever.
What I'd like specifically is for someone to be able to view the LLVM IR of an already-compiled function as part of the program. I know that, when compiling, I can use -emit-llvm to get the IR. But that saves it to a file. What I'd like is for the LLVM IR to be embedded in and retrievable from the program itself -- e.g. my_function_object.llvm_ir()
Is such a thing possible? Thanks!
You're basically trying to add reflection to your program. Reflection requires the existence of metadata in your binary, and this doesn't exist out of the box in LLVM, as far as I know.
To achieve an effect like this, you could create a global key-value dictionary in your program, exposed via an exported function - something like IRInstruction* retrieve_llvm_ir_stream(char* name).
This dictionary would map some kind of identifier (for example, the exported name) of a given function to an in-memory array that represents the IR stream of that function (each instruction represented as a custom IRInstruction struct, for example). The types and functions of the representation format (like the custom IRInstruction struct) will have to be included in your source.
At the step of the IR generation, this dictionary will be empty. Immediately after the IR generation step, you'll need to add a custom build step: open the IR file and populate the dictionary with the data - for each exported function of your program, inject its name as a key to the dictionary and its IR stream as a value. The IR stream would be generated from the definitions of your functions, as read by your custom build tool (which would leverage the LLVM API to read the generated IR and convert it to your format).
Then, proceed to the assembler and linker as before.
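A rough sketch of the runtime side of such a dictionary, using the hypothetical IRInstruction struct and lookup function named above (the registration calls would come from a source file emitted by the custom build step):

#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one IR instruction; a real tool would store
// whatever it needs (opcode, operands, types, ...), or simply the textual form.
struct IRInstruction {
  std::string text; // e.g. "%1 = add i32 %a, %b"
};

// Global dictionary: exported function name -> its IR stream.
static std::map<std::string, std::vector<IRInstruction>> &ir_table() {
  static std::map<std::string, std::vector<IRInstruction>> table;
  return table;
}

// Called from the generated source file, once per exported function.
void register_llvm_ir_stream(const std::string &name,
                             std::vector<IRInstruction> stream) {
  ir_table()[name] = std::move(stream);
}

// Lookup entry point as suggested above; returns the first instruction of the
// stream, or nullptr if the function is unknown.
IRInstruction *retrieve_llvm_ir_stream(const char *name) {
  auto it = ir_table().find(name);
  if (it == ir_table().end() || it->second.empty())
    return nullptr;
  return it->second.data();
}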

Using C++ classes in LLVM Modules

Based on the Kaleidoscope and Kaleidoscope with MCJIT tutorials, I have code to create a Module and function and call it using MCJIT. The function needs a prototype:
auto ft = llvm::FunctionType::get(llvm::Type::getInt32Ty(Context), argTypes, false);
However, the example only covers double for parameters and return values (the code above uses an int). To do anything advanced, you need to pass things like classes and containers.
How do you use existing C++ classes in the module?
Sure, you can link to any library you want, but you need to declare function prototypes to use them. If the library API has classes, how do you declare them?
What I want is something like this:
auto ft = llvm::FunctionType::get(llvm::Type::getStructTy("class.std::string"), argTypes, false);
where class.std::string has been imported from string.h.
The LLVM API only has primitive types. You can define structs to represent the classes, but this is way too hard to do manually (and not portable).
A way to do it might be to compile the class to bitcode and read it into a module, but I want to avoid temporary files if possible. Also, I'm not sure how to extract the type from the module, but it should be possible. I tried this on a header file for one of my classes (I renamed the header to a .cpp file, otherwise clang would turn it into a .gch precompiled header) and the result was just a constant... maybe it was optimised out? I tried it on the .cpp file and it resulted in 36000 lines of code...
Then I found this page. Instead of using the LLVM API, I should use the Clang API because Clang, as a compiler, can compile the code into a Module. Then I can use the LLVM API with the imported Modules. Is this the right way to go? Any working source code is appreciated because it took forever just to get function calling working (the tutorials are out of date and documentation is scarce).
The way I would do it is to compile the class to LLVM IR and then link the two modules. Then there are two options to extract the type from the module:
First, you can use the llvm::TypeFinder. The way you use it is by creating it, and then calling run() on it with the module as an argument. This code snippet will print out all of the types in the module:
// Collect the named struct types in the module
// (llvm::TypeFinder lives in llvm/IR/TypeFinder.h).
llvm::TypeFinder type_finder;
type_finder.run(module, /*onlyNamed=*/true);
for (auto *t : type_finder) {
  std::cout << t->getName().str() << std::endl;
}
Alternatively, it's possible to use Module's getIdentifiedStructTypes() method and iterate over the resulting vector in the same way as above.
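For example, a short sketch of that second option (assuming module is an llvm::Module as above):

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

void dumpNamedStructs(llvm::Module &module) {
  // getIdentifiedStructTypes() returns the named struct types in the module,
  // e.g. something like "class.std::basic_string" once the class's IR has
  // been linked in.
  for (llvm::StructType *t : module.getIdentifiedStructTypes())
    llvm::errs() << t->getName() << '\n';
}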