Apply LLVM pass to a specific basic block - llvm

Is it possible to apply an LLVM transformation pass to a specific basic block, instead of the whole IR?
I know how to apply a pass to the whole IR:
$ opt -S -instcombine test.ll -o out.ll
But there might be several basic blocks inside test.ll and I want to apply -instcombine to just one of them.

Generally, no. Some LLVM passes are written to work on whole modules, others on whole functions. Some are also safe to use for single basic blocks (more by chance than by design), but LLVM's pass interface deals with only the design unit (functions in case of function passes, modules in case of module passes). That is, function passes are given a function by the pass manager, and nothing else.

Related

How to use the RandomNumberGenerator within llvm?

I was hoping someone can give me an example as to how to use the RandomNumberGenerator class within LLVM. All of the examples I am able to find seem to use outdated methods.
I would like to be able to create a RNG within a pass that can be overridden with the '-rng-seed' parameter.
How can this value be accessed if it was provided as a parameter, and how to create the value if it was not provided as a parameter?
Also, I understand that a single RNG is not meant to be shared between threads for a single module. If I am running multiple passes on a module, can they share the same generated RNG?
The RandomNumberGenerator class has a private constructor (check its doc and the source file under llvm/lib/Support/RandomNumberGenerator.cpp), so the only way (that I know of, at least) to get a hold of an instance is via Module's createRNG method.
So, assuming that you have an llvm::FunctionPass (and are using C++11):
bool runOnFunction(llvm::Function &CurFunc) override {
  auto rng = CurFunc.getParent()->createRNG(this);
  llvm::errs() << (*rng)() << '\n';
  return false;
}
Now you can run this on a module like this (assuming you modified the hello world pass from the documentation):
opt -load ./libLLVMHelloPass.so -hello foo.bc -o bar.bc
Rerunning this, it will give you the same pseudo-random number.
The -rng-seed option becomes available to your pass once you include the header (and link against the LLVM support library, i.e. llvm-config --libfiles support). So, changing the above execution line to something like:
opt -load ./libLLVMHelloPass.so -hello -rng-seed 42 foo.bc -o bar.bc
should give a different sequence.
Lastly, AFAIK, LLVM passes via opt are run sequentially in the context of a PassManager (certainly for the legacy one). I believe one should adhere to that advice when building a custom standalone LLVM tool using multi-threading (in other words, not intended to be run by opt). For relevant examples of standalone apps using the LLVM API have a look into the unit tests source subdir (one hint is to look for .cpp files that have a main(), although they are not always set up like that).
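The seeding behaviour described above can be modelled without LLVM. The sketch below is not LLVM's RandomNumberGenerator class; it is a stand-in built on std::mt19937_64, and the names make_rng, seed and salt are hypothetical. It assumes, purely for illustration, that a generator is seeded from a numeric seed combined with a per-pass salt string, so the same seed reproduces the same sequence and a different seed changes it.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-pass generator; NOT the LLVM class.
// The seed plays the role of -rng-seed, the salt identifies the pass.
std::mt19937_64 make_rng(std::uint64_t seed, const std::string& salt) {
    assert(!salt.empty() && "each pass should supply its own salt");
    std::vector<std::uint32_t> data;
    data.push_back(static_cast<std::uint32_t>(seed));        // low 32 bits
    data.push_back(static_cast<std::uint32_t>(seed >> 32));  // high 32 bits
    for (unsigned char c : salt) data.push_back(c);          // mix in the salt
    std::seed_seq seq(data.begin(), data.end());
    return std::mt19937_64(seq);
}
```

Calling make_rng(0, "hello") twice yields generators that produce identical sequences, which mirrors rerunning opt without changing the seed. Giving each pass its own generator, instead of sharing one, keeps one pass's draws from shifting another's.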

Available Analysis and Transform passes for LLVM

Is there any document listing the Analysis and Transform passes available for use in the AnalysisUsage::addRequired<> and Pass::getAnalysis<> functions?
I can get a list of passes in http://llvm.org/docs/Passes.html, but it only shows the command line names for the passes. How can I know the underlying pass classes?
Not really, no. Just look at the source. The header files in include/llvm/Analysis/ and include/llvm/Transforms/ will tell you everything you need to know.
Moreover, grepping over the source for getAnalysis< will tell you which passes are used as analyses inside the LLVM source code.

LLVM traverse CFG

I want to apply a DFS traversal algorithm to the CFG of a function, so I need the internal representation of the CFG. I need directed edges, and I spotted MachineBasicBlock::const_succ_iterator. Is there a way to get the CFG with directed edges by using a FunctionPass instead of a MachineFunctionPass? The reason I ask is that I have problems using MachineFunctionPass. I have written several complex passes so far, but I cannot run a MachineFunctionPass.
I found this: "A MachineFunctionPass is a part of the LLVM code generator that executes on the machine-dependent representation of each LLVM function in the program. Code generator passes are registered and initialized specially by TargetMachine::addPassesToEmitFile and similar routines, so they cannot generally be run from the opt or bugpoint commands." So how can I run a MachineFunctionPass?
When I tried to run a simple MachineFunctionPass with opt, I got the error:
Pass 'mycfg' is not initialized.
Verify if there is a pass dependency cycle.
Required Passes:
opt: PassManager.cpp:638: void llvm::PMTopLevelManager::schedulePass(llvm::Pass*): Assertion `PI && "Expected required passes to be initialized"' failed.
So I have to initialize the pass. But in all my other passes I did not do any initialization, and I don't want to use INITIALIZE_PASS, since I would have to recompile the LLVM file that keeps the pass registration... Is there a way to keep using static RegisterPass for a MachineFunctionPass? Note that if I change to FunctionPass I have no problems, so it might indeed be an opt problem.
I have started another pass for CallGraph, where CallGraph &CG = getAnalysis<CallGraph>(); works well. Is there a similar way of getting CFGs? What I have found so far are succ_iterator/succ_begin/succ_end from CFG.h, but I think I still have to get the CFG analysis somehow.
Thank you in advance !
I think you may have some terms mixed up. Basic blocks within each function are already arranged in a kind-of CFG, and LLVM provides you the tools to traverse that. See my answer to this question, for example.
MachineFunction lives on a different level, and unless you're doing something very special, this is not the level you should operate on. It's too low-level, and too target specific. There's some overview of the levels here
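As a sketch of the traversal itself: a function's CFG is a directed graph whose edges run from each block to its successors, so a DFS only needs a successor list per block. The example below does not use LLVM. The Cfg type and dfs_preorder function are hypothetical, and blocks are plain integer ids. In a FunctionPass the successor list would come from succ_begin/succ_end in CFG.h, as mentioned in the question, with no separate CFG analysis required.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model: succs[b] lists the successor ids of block b.
using Cfg = std::vector<std::vector<std::size_t>>;

// Iterative depth-first traversal; returns blocks in preorder.
std::vector<std::size_t> dfs_preorder(const Cfg& succs, std::size_t entry) {
    assert(entry < succs.size());
    std::vector<std::size_t> order;
    std::vector<bool> visited(succs.size(), false);
    std::vector<std::size_t> stack{entry};
    while (!stack.empty()) {
        std::size_t b = stack.back();
        stack.pop_back();
        if (visited[b]) continue;  // reached earlier via another edge
        visited[b] = true;
        order.push_back(b);
        // Push in reverse so the first successor is visited first.
        for (auto it = succs[b].rbegin(); it != succs[b].rend(); ++it)
            if (!visited[*it]) stack.push_back(*it);
    }
    return order;
}
```

The visited set is what makes loops in the CFG safe: a back edge to an already visited block is simply skipped.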

Specify dependency of my LLVM pass on the mem2reg pass

I am writing a ModulePass and invoke it using opt -load. I would require that alloca has been promoted to registers when my pass runs, using the -mem2reg switch for opt.
There is a link which indicates that the PromoteMemoryToRegister pass is a transform pass and as such should not be required by my pass. That's a statement from 2010. Does that still hold?
One of the posts I found suggested something like
AU.addRequiredID(PromoteMemoryToRegister::MemoryToRegisterID);
but that contradicted the post I linked above.
So my question is, how do I express this dependency for my pass, if possible? How do I express, in general, such pass dependencies? And what's the difference between a transform pass and, well, another pass?
What's the difference between a transform pass and another pass?
A transform pass is a pass that may invalidate the results of other passes.
How do I express this dependency for my pass?
First of all, I recommend reading the pass-dependency section of the official "how to write a pass" guide. In any case, the correct way to add a dependency between transformation passes is to add one before the other in your pass manager (see the guide section on the pass manager), or, if you just invoke opt, then add all the passes you want in the order you want them to occur, e.g.:
opt -load mypass.so -mem2reg -mypass

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
    return yada;
}
into this
int Yada (int yada)
{
    SOME_HEIDEGGER_QUOTE;
    return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a few lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the Linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
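A minimal sketch of the two hooks, assuming GCC or Clang and a build with -finstrument-functions. The hook names and signatures are the ones described above; the counters and the trace_depth/trace_calls helpers are hypothetical additions for illustration. The hooks are marked no_instrument_function so that they are not themselves instrumented.

```cpp
#include <cassert>
#include <cstdio>

static int g_depth = 0;   // current call nesting (hypothetical helper state)
static long g_calls = 0;  // total number of function entries seen

extern "C" {
// Called on entry to every instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    ++g_calls;
    std::fprintf(stderr, "%*senter %p (from %p)\n",
                 2 * g_depth, "", this_fn, call_site);
    ++g_depth;
}

// Called on exit from every instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)call_site;
    assert(g_depth > 0);
    --g_depth;
    std::fprintf(stderr, "%*sexit  %p\n", 2 * g_depth, "", this_fn);
}
}

__attribute__((no_instrument_function)) int trace_depth() { return g_depth; }
__attribute__((no_instrument_function)) long trace_calls() { return g_calls; }
```

The hooks receive raw addresses, so the trace shows pointers, not names; mapping them back to function names is a separate step using the program's symbol information.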
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
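With that regex I also get the function name (group 1). Note that, as written, group 2 captures only the optional const qualifier; to capture the arguments as well, wrap the argument character class in its own group.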
Note: you must filter out names like, "while", "do", "for", "switch".
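A rough sketch of this approach in C++, under some assumptions: std::regex has no lookbehind, so the pattern below is a simplified variant and not the regex quoted above, and the insert_trace function and its keyword list are hypothetical. It is a heuristic only; it will miss or mis-handle many legal constructs (nested parentheses in arguments, macros, function-try-blocks, text inside comments or strings).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <set>
#include <string>

// Inserts trace_line after the opening brace of anything that looks like
// a function definition: name(args) [const] {
std::string insert_trace(const std::string& src, const std::string& trace_line) {
    assert(!trace_line.empty());
    // Group 1: name, group 2: arguments, group 3: optional const.
    static const std::regex fn_re(R"re((\w+)\s*\(([^()]*)\)\s*(const)?\s*\{)re");
    // Control-flow keywords that have the same shape as a definition.
    static const std::set<std::string> keywords =
        {"if", "while", "for", "switch", "catch"};
    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(src.begin(), src.end(), fn_re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        std::size_t after_brace =
            static_cast<std::size_t>(m.position(0) + m.length(0));
        out.append(src, last, after_brace - last);  // copy up to and including '{'
        last = after_brace;
        if (keywords.count(m[1].str()) == 0)
            out += "\n" + trace_line;
    }
    out.append(src, last, std::string::npos);       // copy the remainder
    return out;
}
```

Applied to the Yada example from the question with "SOME_HEIDEGGER_QUOTE;" as the trace line, this puts the statement on its own line directly after the opening brace, while a while (x) { ... } block is left untouched.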
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using a full language front end, in this case for C++), builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C++ program into another with whatever properties you wish. The transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.