What is LLVM metadata - llvm

These might be very basic questions..
1) What is LLVM metadata and how do I use it in my program? I've read all the documentation, but I don't understand how to use it.
2) How do I add my own metadata to a file?
Thanks in advance!

The best source of information would be the blog post from 2010 that introduced metadata into LLVM IR - Extensible Metadata in LLVM IR. The first paragraph mentions the motivation:
This metadata could be used to influence language-specific
optimization passes (for example, Type Based Alias Analysis in C), tag
information for a custom code generator, or pass through information
to link time optimization.
But really, read all of it for the historical details.
The main "client" of metadata in LLVM is currently debug info. It's used by the front-end (e.g. Clang) to tag the LLVM IR it generates with debug information that correlates IR to the source code it came from. This same metadata is later translated to platform specific debug info such as DWARF by the code emitters.

Here is a simple example:
// LLMod is the llvm::Module being built.
llvm::LLVMContext &Ctx = LLMod.getContext();
llvm::IntegerType *Int32Ty = llvm::Type::getInt32Ty(Ctx);
// Metadata operands: the constants 0 and 1 wrapped as metadata.
llvm::Metadata *MapleVerElts[] = {
    llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(Int32Ty, 0)),
    llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(Int32Ty, 1)),
};
// Create (or look up) the module-level named metadata and append a tuple to it.
llvm::NamedMDNode *MapleVerMD =
    LLMod.getOrInsertNamedMetadata("maple-as.version");
MapleVerMD->addOperand(llvm::MDNode::get(Ctx, MapleVerElts));
LLMod.dump(); // print the module, including the new metadata
And you may get the following output:
!maple-as.version = !{!0}
!0 = !{i32 0, i32 1}
Hopefully this gives you a hint on how to use metadata.

Related

LLVM 3.4 tag modified instructions

I would like to be able to detect which parts of a program have been modified by a previous LLVM pass.
How can I tag instructions, basic blocks and functions so that a later pass can tell that a pass P1 has previously modified that part of the code?
I would like to achieve something like:
// First pass
...
tag<bool>(instruction, "modified");
// Second pass
if(has_tag<bool>(instruction, "modified"))
do_something...
Is there a feature in LLVM that allows building such a tag system?
You may wish to look at the LLVM diff engine in its toolset:
https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-diff/llvm-diff.cpp
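Another way to get the tag / has_tag shape from the question is a side table keyed by object address. The sketch below is plain C++ with no LLVM dependency: TagTable, tag, has_tag and forget are hypothetical names chosen to mirror the question's pseudocode, and FakeInst is a stand-in for an instruction. It assumes both passes run in the same process and that tagged objects are not deleted or moved in between. Within LLVM itself, Instruction::setMetadata and Instruction::getMetadata attach and query a named metadata node on an instruction, which is closer to what the question asks for.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical side table: maps an object's address to the set of tag
// names attached to it. Nothing here is part of the LLVM API.
class TagTable {
public:
    // Attach a named tag to an object (first pass).
    void tag(const void *obj, const std::string &name) {
        tags_[obj].insert(name);
    }

    // Query whether an object carries a named tag (second pass).
    bool has_tag(const void *obj, const std::string &name) const {
        auto it = tags_.find(obj);
        return it != tags_.end() && it->second.count(name) != 0;
    }

    // Drop all tags for an object, e.g. just before it is erased,
    // so a stale address is never reported as tagged.
    void forget(const void *obj) { tags_.erase(obj); }

private:
    std::unordered_map<const void *, std::unordered_set<std::string>> tags_;
};

// Stand-in for an instruction, used only for illustration.
struct FakeInst {
    int opcode;
};
```

The first pass would call table.tag(&inst, "modified") and the second would check table.has_tag(&inst, "modified"). Because the table lives outside the IR, it is lost once the module is written out.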

What is the speculative semantic model in the Roslyn API?

What does the extension function SemanticModel.TryGetSpeculativeSemanticModel return? What is it good for?
I could not find any meaningful documentation on the subject.
The documentation for TryGetSpeculativeSemanticModel says:
Get a SemanticModel object that is associated with X that did not appear in this source code. This can be used to get detailed semantic information about sub-parts of X that did not appear in source code.
The analyzer code for the StyleCop SX1101 diagnostic offers a great example of this API's usage. SX1101 tells you that it's safe to remove a this. qualifier from your code.
Let's step through a slightly simplified version of the analyzer code:
var memberAccessExpression = (MemberAccessExpressionSyntax)context.Node.Parent;
var originalSymbolInfo = context.SemanticModel.GetSymbolInfo(
    memberAccessExpression,
    context.CancellationToken);
var statement = context.Node.FirstAncestorOrSelf<StatementSyntax>();
var annotation = new SyntaxAnnotation();
var speculationRoot = statement.ReplaceNode(
    memberAccessExpression,
    memberAccessExpression.Name.WithAdditionalAnnotations(annotation));
context.Node is a ThisExpressionSyntax. speculationRoot is a copy of the enclosing statement in which this.SomeMember has been replaced with SomeMember. The annotation is used for quick lookup later.
Now that we have generated a modified version of the code, we want to check if it would a) still compile, and b) still refer to the same thing. Since the sources in Roslyn are immutable, we'd need to compile the entire project again (which would be doable, but more expensive) to get a new SemanticModel for the changed code.
Here is where TryGetSpeculativeSemanticModel steps in:
if (!context.SemanticModel.TryGetSpeculativeSemanticModel(
        statement.SpanStart,
        speculationRoot,
        out var speculativeModel)) return;
var mappedNode = speculationRoot.GetAnnotatedNodes(annotation).Single();
var newSymbolInfo = speculativeModel.GetSymbolInfo(mappedNode, context.CancellationToken);
So now, if we manage to get a speculative semantic model, it means SomeMember was valid at the same position in the code as this.SomeMember. We've used the annotation to quickly look up the SomeMember syntax node and then get its semantic info from the speculativeModel.
All that's left to do now is to check whether the modified statement means the same thing as the original one:
if (!Equals(originalSymbolInfo.Symbol, newSymbolInfo.Symbol)) return;
context.ReportDiagnostic(Diagnostic.Create(Descriptor, context.Node.GetLocation()));

How to customise pragma in C?

I want to ask what the simplest way is to build a parser that recognises my customised pragmas in C/C++ code. Yes, a simple bash script can do it, but I am wondering if there is a formal way to do this through Clang or LLVM. I tried to check the Clang AST but I cannot find any pragmas.
For instance:
int foo1(){
    #pragma hi k=1
    ...
}
int foo2(){
    #pragma hello k=7
    ...
}
I want the pass to return the following:
function foo1 has hi and k=1
function foo2 has hello and k=7
Thanks.
Pragma handling needs to be done in the following places:
Add a handler in the ParsePragma.cpp file (it contains examples of how to do that). In this step you can parse the pragma token by token and store the corresponding information in the AST, which is probably the best way to carry data from this stage to the later ones.
If passes working on LLVM IR need this information, attach what you stored in the AST to the corresponding IR class; in your case llvm::Function looks like the right place to keep it. For this you need to update the 'lib/AsmParser/LLParser.cpp', 'lib/AsmParser/LLLexer.cpp' and 'lib/IR/AsmWriter.cpp' files, so that the information stored in the IR can be read and written.
Finally, if you need to write the extra information kept in the IR into the assembler file, update the 'lib/CodeGen/AsmPrinter/AsmPrinter.cpp' file accordingly.
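If changing Clang is more than you need, the "simple script" route mentioned in the question can also be sketched in C++. This is a naive line-based scanner, not a Clang pragma handler: scan_pragmas is a hypothetical helper, and it assumes each function header sits on one line together with its opening brace, pragmas have the form "#pragma name key=value", and braces do not appear inside strings or comments.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Naive scanner (hypothetical helper): reports "#pragma <name> <key>=<value>"
// lines together with the function whose body contains them.
std::vector<std::string> scan_pragmas(const std::string &source) {
    // Function header on a single line: return type, name, parameters, '{'.
    static const std::regex func_re(
        R"(^\s*[A-Za-z_][A-Za-z0-9_ \t*]*\s+([A-Za-z_][A-Za-z0-9_]*)\s*\([^;{}]*\)\s*\{)");
    // Pragma of the form "#pragma <name> <key>=<value>".
    static const std::regex pragma_re(
        R"(^\s*#\s*pragma\s+([A-Za-z_][A-Za-z0-9_]*)\s+([A-Za-z_][A-Za-z0-9_]*)\s*=\s*([A-Za-z0-9_]+))");

    std::vector<std::string> report;
    std::string current_function; // empty while outside any function body
    int depth = 0;                // current brace nesting depth
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_search(line, m, pragma_re)) {
            if (!current_function.empty()) {
                report.push_back("function " + current_function + " has " +
                                 m[1].str() + " and " + m[2].str() + "=" +
                                 m[3].str());
            }
            continue;
        }
        // Only look for function headers at file scope.
        if (depth == 0 && std::regex_search(line, m, func_re)) {
            current_function = m[1].str();
        }
        // Track nesting so we know when the function body ends.
        for (char c : line) {
            if (c == '{') {
                ++depth;
            } else if (c == '}' && depth > 0) {
                --depth;
            }
        }
        if (depth == 0) {
            current_function.clear();
        }
    }
    return report;
}
```

Unlike the ParsePragma.cpp route, this never sees the preprocessed or parsed program, so macros, multi-line headers and conditional compilation will defeat it.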

Machine Code based Control Flow Graph in LLVM

LLVM generally gives Control Flow Graphs (CFGs) for its intermediate representation (IR) language. You can also get high-level source-code-based CFGs with little effort. I want to get CFGs at the level of Machine Code. Is there any way to get this?
I did a little bit of digging around. In LLVM's back-end code generation phase, there's a stage called SSA-based Machine Code Optimizations. There's not much information on this stage. However, I guess LLVM generates an SSA-based machine code at some intermediate stage. If such a stage exists, then we can have basic blocks based on the code at that stage, and a CFG could be created from those basic blocks. Can anybody give me a clue about which source file in the LLVM source tree (possibly in lib\CodeGen) I should look at to find information about this? Or which class would give me a walk-through of the SSA-based machine code and its basic blocks? I would appreciate any pointer.
I figured it out.
You need to write a MachineFunctionPass for some target in the lib\Target\<target architecture> folder.
Then, in the runOnMachineFunction(MachineFunction &MF) function, you can view the CFG by calling MF.viewCFG(). This works in debug mode; getting the CFG in release mode as well requires some tweaking inside viewCFG.
You can access each MachineBasicBlock and MachineInstr by iterating over MF. Here is an example:
int i = 0;
for (auto &MBB : MF) {
    errs() << "Basic Block: " << i++ << "\n\n";
    for (auto &MI : MBB) {
        MI.print(errs(), true, false);
        errs() << "\n";
    }
}

LLVM (3.5+) PassManager vs LegacyPassManager

I'm working on a new language using the LLVM C++ API and would like to take advantage of optimization passes. (Note: I'm currently using the latest from source LLVM which I believe equates to 3.8)
I have yet to find any examples that use the new PassManager and even Clang is still utilizing the LegacyPassManager.
I have come across posts such as this that are several years old now that mention the new PassManager, but they all still use the legacy system.
Are there any examples/tutorials on how to use this new(ish) PassManager? Should new LLVM projects prefer PassManager to LegacyPassManager? Does Clang plan on migrating, or is this why the legacy system has stuck around?
From what I've gathered with help from the #llvm IRC:
FunctionPassManager FPM;
//Use the PassInfoMixin types
FPM.addPass(InstCombinePass());
//Register any analysis passes that the transform passes might need
FunctionAnalysisManager FAM;
//Use the AnalysisInfoMixin types
FAM.registerPass([&] { return AssumptionAnalysis(); });
FAM.registerPass([&] { return DominatorTreeAnalysis(); });
FAM.registerPass([&] { return BasicAA(); });
FAM.registerPass([&] { return TargetLibraryAnalysis(); });
FPM.run(*myFunction, FAM);
But to avoid the hassle of manually registering each pass, you can use PassBuilder to register the analysis passes:
FunctionPassManager FPM;
FPM.addPass(InstCombinePass());
FunctionAnalysisManager FAM;
PassBuilder PB;
PB.registerFunctionAnalyses(FAM);
FPM.run(*myFunction, FAM);
Extending Luke's answer: with PassBuilder you can build predefined, out-of-the-box simplification pipelines for different optimization levels:
llvm::FunctionAnalysisManager FAManager;
llvm::PassBuilder passBuilder;
passBuilder.registerFunctionAnalyses(FAManager);
// The pipeline is returned as a FunctionPassManager; keep the result.
llvm::FunctionPassManager FPManager =
    passBuilder.buildFunctionSimplificationPipeline(
        llvm::PassBuilder::OptimizationLevel::O2,
        llvm::PassBuilder::ThinLTOPhase::None);
This gives you a FunctionPassManager already populated with a bunch of passes (the passes go into the returned pass manager, not into the FunctionAnalysisManager). This may simplify your life. The best place to see the full set of passes added for each OptimizationLevel is the original sources.