Why does my llvm function jit-evaluate to 0? - c++

I am playing with llvm (and antlr), working vaguely along the lines of the Kaleidoscope tutorial. I successfully created LLVM-IR code from basic arithmetic expressions both on top-level and as function definitions, which corresponds to the tutorial chapters up to 3.
Now I would like to incrementally add JIT support, starting with the top-level arithmetic expressions. Here is my problem:
Basic comparison makes it seem as if I follow the same sequence of function calls as the tutorial, only with a simpler code organization.
The generated IR code looks good.
The function definition is apparently found, since otherwise the code would exit (I verified this by intentionally looking up a misspelled function name).
However, the call through the function pointer created by JIT evaluation always returns zero.
These snippets (excerpt) are executed as part of the antlr visitor of the main/entry-node of my grammar:
//Top node main -- top level expression
antlrcpp::Any visitMain(ExprParser::MainContext *ctx)
{
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();
llvm::InitializeNativeTargetAsmParser();
TheJIT = ExitOnErr( llvm::orc::KaleidoscopeJIT::Create() );
InitializeModuleAndPassManager();
// ... Code which visits the child nodes ...
}
InitializeModuleAndPassManager() is the same as in the tutorial:
static void InitializeModuleAndPassManager()
{
// Open a new context and module.
TheContext = std::make_unique<llvm::LLVMContext>();
TheModule = std::make_unique<llvm::Module>("commandline", *TheContext);
TheModule->setDataLayout(TheJIT->getDataLayout());
// Create a new builder for the module.
Builder = std::make_unique<llvm::IRBuilder<>>(*TheContext);
// Create a new pass manager attached to it.
TheFPM = std::make_unique<llvm::legacy::FunctionPassManager>(TheModule.get());
// Do simple "peephole" optimizations and bit-twiddling optzns.
TheFPM->add(llvm::createInstructionCombiningPass());
// Reassociate expressions.
TheFPM->add(llvm::createReassociatePass());
// Eliminate Common SubExpressions.
TheFPM->add(llvm::createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
TheFPM->add(llvm::createCFGSimplificationPass());
TheFPM->doInitialization();
}
This is the function which handles the top-level expression and which is also supposed to do JIT evaluation:
//Bare expression without function definition -- create anonymous function
antlrcpp::Any visitBareExpr(ExprParser::BareExprContext *ctx)
{
string fName = "__anon_expr";
llvm::FunctionType *FT = llvm::FunctionType::get(llvm::Type::getDoubleTy(*TheContext), false);
llvm::Function *F = llvm::Function::Create(FT, llvm::Function::ExternalLinkage, fName, TheModule.get());
llvm::BasicBlock *BB = llvm::BasicBlock::Create(*TheContext, "entry", F);
Builder->SetInsertPoint(BB);
llvm::Value* Expression=visit(ctx->expr()).as<llvm::Value* >();
Builder->CreateRet(Expression);
llvm::verifyFunction(*F);
//TheFPM->run(*F); // commented out because I wanted to try JIT before optimization;
// it causes a compile error right now, probably because I lack some related code.
// However, I do not expect a missing optimization run to cause the problem that I have.
F->print(llvm::errs());
// Create a ResourceTracker to track JIT'd memory allocated to our
// anonymous expression -- that way we can free it after executing.
auto RT = TheJIT->getMainJITDylib().createResourceTracker();
auto TSM = llvm::orc::ThreadSafeModule(move(TheModule), move(TheContext));
ExitOnErr(TheJIT->addModule(move(TSM), RT));
InitializeModuleAndPassManager();
// Search the JIT for the __anon_expr symbol.
auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));
// Get the symbol's address and cast it to the right type (takes no
// arguments, returns a double) so we can call it as a native function.
double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
double ret = FP();
fprintf(stderr, "Evaluated to %f\n", ret);
// Delete the anonymous expression module from the JIT.
ExitOnErr(RT->remove());
return F;
}
Now this is what happens as an example:
[robert@robert-ux330uak test4_expr_llvm_2]$ ./testmain '3*4'
define double @__anon_expr() {
entry:
ret float 1.200000e+01
}
Evaluated to 0.000000
I would be thankful for any ideas about what I might be doing wrong.
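One detail in the printed IR above may be worth checking: the function is declared as returning double, yet the ret operand has type float. llvm::verifyFunction returns true when it finds a problem, and the snippet above ignores that return value. The sketch below is not LLVM code. It is a minimal stand-alone model of what a caller could observe if a callee leaves only a 32-bit float in the return register and the caller reads 64 bits as a double. The helper names readFloatBitsAsDouble and formatFixed are made up for illustration, and the assumption that the upper 32 bits are zero depends on the target and calling convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: place the bit pattern of a float in the low 32 bits
// of a 64-bit value, leave the high 32 bits zero, and read it as a double.
// Assumption: this models a caller that expects a double where the callee
// only wrote a float.
double readFloatBitsAsDouble(float value) {
    static_assert(sizeof(float) == 4 && sizeof(double) == 8,
                  "sketch assumes 32-bit float and 64-bit double");
    std::uint32_t lowBits = 0;
    std::memcpy(&lowBits, &value, sizeof lowBits);
    std::uint64_t wideBits = lowBits;  // high 32 bits stay zero
    double result = 0.0;
    std::memcpy(&result, &wideBits, sizeof result);
    return result;
}

// Format a double the same way the question does ("%f").
std::string formatFixed(double value) {
    char buffer[512];
    int written = std::snprintf(buffer, sizeof buffer, "%f", value);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    return std::string(buffer);
}
```

Under these assumptions, formatFixed(readFloatBitsAsDouble(12.0f)) yields 0.000000, because the reinterpreted value is a tiny subnormal number rather than 12. If that is what is happening here, making the visited expression produce a double constant and checking the result of llvm::verifyFunction would be the first things to try.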

Related

How to Create a Load Instruction in LLVM, given Pointer to do the Load from?

I'm writing a LLVM pass where I create a function with integer pointer arguments. I need to implement the body of the function as well, and in order to do so, I need the integer values that are being pointed to. I am trying to create a load instruction that I can insert at the end of the basic block that contains the function body, but I am getting an error when doing so. Things compile fine, but when I run the pass, I get a generic error.
From what I've found about LLVM, there are usually ways to create instructions that don't involve using constructors, but I couldn't find a way to do that for load instructions, so I just used one of the constructors instead. After creating the function with arguments of the integer pointer type, this is the code that I'm using to do the load:
llvm::LoadInst operandLoad(llvm::Type::getInt32Ty(ctx), argList[0], "test", basicBlock);
If I comment the above line out, my pass runs fine. I'm unsure whether this is enough to diagnose the issue, so I'll include all the code to create the function (it's slightly simplified):
void createFunction(std::string functionName, llvm::LLVMContext &ctx, llvm::Module *module) {
std::vector<llvm::Type*> typeList = {llvm::Type::getInt32PtrTy(ctx)};
// Create function type
llvm::FunctionType *functionType = llvm::FunctionType::get(llvm::Type::getVoidTy(ctx), typeList, false);
// Create function
llvm::Function *function = llvm::Function::Create(functionType, llvm::Function::ExternalLinkage, functionName, module);
std::vector<llvm::Value*> argList;
for (llvm::Function::arg_iterator it = function->arg_begin(); it != function->arg_end(); ++it) {
argList.push_back(it);
}
llvm::BasicBlock *basicBlock = llvm::BasicBlock::Create(ctx, "entry", function);
llvm::LoadInst operandLoad(llvm::Type::getInt32Ty(ctx), argList[0], "test", basicBlock);
llvm::IRBuilder<> builder(ctx);
builder.SetInsertPoint(basicBlock);
builder.CreateRet(nullptr);
}
I'm sure this is a stupid question, so sorry about that. And thanks in advance!
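A note on ownership that may be relevant: in LLVM, an instruction inserted into a basic block is owned by that block, so instructions are normally heap-allocated (with new, or through IRBuilder methods such as CreateLoad) rather than declared as local objects that are destroyed when the enclosing scope ends. The following is a toy model of that ownership pattern, not LLVM code; ToyInstruction, ToyBlock, ToyBuilder and toyBodyOpcodes are all hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an instruction (hypothetical, not llvm::Instruction).
struct ToyInstruction {
    std::string opcode;
    std::string name;
};

// Toy stand-in for a basic block that owns its instructions.
class ToyBlock {
public:
    // Takes ownership of a heap-allocated instruction and appends it.
    ToyInstruction *append(std::unique_ptr<ToyInstruction> inst) {
        assert(inst && "cannot append a null instruction");
        instructions_.push_back(std::move(inst));
        return instructions_.back().get();
    }
    std::size_t size() const { return instructions_.size(); }
    const ToyInstruction &at(std::size_t i) const { return *instructions_.at(i); }

private:
    std::vector<std::unique_ptr<ToyInstruction>> instructions_;
};

// Toy builder: every create* call allocates on the heap and hands the
// instruction to the block, so nothing is destroyed at scope exit.
class ToyBuilder {
public:
    explicit ToyBuilder(ToyBlock &block) : block_(block) {}
    ToyInstruction *createLoad(const std::string &name) {
        return block_.append(
            std::make_unique<ToyInstruction>(ToyInstruction{"load", name}));
    }
    ToyInstruction *createRetVoid() {
        return block_.append(
            std::make_unique<ToyInstruction>(ToyInstruction{"ret", ""}));
    }

private:
    ToyBlock &block_;
};

// Usage sketch: build "load; ret" and report the opcodes in order.
std::string toyBodyOpcodes() {
    ToyBlock block;
    ToyBuilder builder(block);
    builder.createLoad("test");
    builder.createRetVoid();
    std::string result;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (i != 0) result += ",";
        result += block.at(i).opcode;
    }
    return result;
}
```

In the real code, the analogous step would be to set the builder's insert point to the entry block first and then create the load through the builder before the return, instead of constructing a LoadInst as a local variable.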

LLVM asserts "Resolving symbol outside this responsibility set"

Why does my call to
jit->lookup("test");
hit a failed assert: "Resolving symbol outside this responsibility set"?
It does this when I create my function as:
define double @test() {
begin:
ret double 1.343000e+01
}
But it works fine (i.e., finds it without an assert) when I create the function as
define void @test() {
begin:
ret void
}
It is not a case of failing to find the function "test": the behavior is different if I look up a name that doesn't exist.
Here's the code that hits the assert:
ThreadSafeModule Create_M()
{
auto pCtx = make_unique<LLVMContext>();
LLVMContext& ctx = *pCtx;
auto pM = make_unique<Module>("myModule", ctx);
Module& M = *pM;
IRBuilder<> builder(ctx);
FunctionType* FT = FunctionType::get(Type::getDoubleTy(ctx),false);
Function* testFn = Function::Create(FT,
GlobalValue::LinkageTypes::ExternalLinkage, "test", M);
auto BB = BasicBlock::Create(ctx,"begin",testFn);
builder.SetInsertPoint(BB);
builder.CreateRet(ConstantFP::get(ctx,APFloat(13.43)));
outs() << M; // For debugging
return ThreadSafeModule(std::move(pM), std::move(pCtx));
}
int main()
{
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
// Create an LLJIT instance.
auto jit = ExitOnErr(LLJITBuilder().create());
auto M1 = Create_M();
ExitOnErr(jit->addIRModule(std::move(M1)));
auto testSym = ExitOnErr(jit->lookup("test"));
}
Replace the function creation with these lines and it doesn't have the problem:
FunctionType* FT = FunctionType::get(Type::getVoidTy(ctx),false);
Function* testFn = Function::Create(FT,
GlobalValue::LinkageTypes::ExternalLinkage, "test", M);
auto BB = BasicBlock::Create(ctx,"begin",testFn);
builder.SetInsertPoint(BB);
builder.CreateRetVoid();
I'd like to understand what the assert means, why it fires in the one case and not the other, and what I need to do to get the double (*)() case to work. I did a lot of searching for documentation on LLVM responsibility sets and found almost nothing. There is some mention at https://llvm.org/docs/ORCv2.html, but not enough for me to interpret what this assert is telling me.
I'm using the SVN repository version of LLVM as of 20-Aug-2019, building on Visual Studio 2017 15.9.6.
To fix this error, call setAutoClaimResponsibilityForObjectSymbols(true) on the ObjectLinkingLayer, for example:
auto jit = ExitOnErr(LLJITBuilder()
.setJITTargetMachineBuilder(std::move(JTMB))
.setObjectLinkingLayerCreator([&](ExecutionSession &ES, const Triple &TT) {
auto ll = make_unique<ObjectLinkingLayer>(ES,
make_unique<jitlink::InProcessMemoryManager>());
ll->setAutoClaimResponsibilityForObjectSymbols(true);
return move(ll);
})
.create());
This was indeed a bug in ORC LLJIT on the Windows platform.
See bug record here:
https://bugs.llvm.org/show_bug.cgi?id=44337
Fix commit reference:
https://github.com/llvm/llvm-project/commit/84217ad66115cc31b184374a03c8333e4578996f
For anyone building a custom JIT / compiler-layer stack by hand (not using LLJIT), all you need to do is force weak-symbol auto-claim when emitting COFF images:
if (JTMB.getTargetTriple().isOSBinFormatCOFF())
{
ObjectLayer.setAutoClaimResponsibilityForObjectSymbols(true);
}
http://llvm.org/doxygen/classllvm_1_1orc_1_1ObjectLinkingLayer.html#aa30bc825696d7254aef0fe76015d10ff
If set, this ObjectLinkingLayer instance will claim responsibility for
any symbols provided by a given object file that were not already in
the MaterializationResponsibility instance.
Setting this flag allows higher-level program representations (e.g.
LLVM IR) to be added based on only a subset of the symbols they
provide, without having to write intervening layers to scan and add
the additional symbols. This trades diagnostic quality for convenience
however: If all symbols are enumerated up-front then clashes can be
detected and reported early (and usually deterministically). If this
option is set, clashes for the additional symbols may not be detected
until late, and detection may depend on the flow of control through
JIT'd code. Use with care.

Compiler: how to check a user function returns properly?

I am writing a very simple compiler where users are allowed to define functions that return void, int or char. However, a user's function may be malformed: it may fail to return a value from a function that does not return void, or return a value from a function declared to return void. Currently my compiler is unable to detect this kind of error, and it fails to generate proper code for functions that return void, since such functions can return without an explicit return; (they return implicitly). It took me quite some time to phrase these two problems clearly. See the example code below:
// Problem A: detect implicit return.
void Foo(int Arg) {
if (Arg)
return;
else {
Arg = 1;
// Foo returns here! How can I know!
}
}
// Problem B: detect "forgotten return".
int Bar(int Arg) {
if (Arg > 1) {
return 1;
}
// this is an error: control flow reaches end at non-void function!
// How can I know!
}
I think the more general question may be: how can I tell that control flow reaches the end of a function at some point? By "reaches the end" I mean that it reaches a point after which the function has no more code to execute. If I can detect the end of control flow, I can look for a return at that point and either report an error if the function ought to return something, or generate an explicit return for a void function. If I enumerate all such points in a function, I can ensure that the function is fully checked or completed.
I see this as a well-solved problem in compiler engineering, since modern C/C++ compilers do it pretty well. Does LLVM offer any API to do this? Or is there a simple algorithm to achieve it? Thanks very much.
Edit: I am currently using LLVM and have the BasicBlocks emitted already. I am hoping for a guide to doing this in LLVM specifically.
Edit: In this question we assume that the type of each return statement always matches the return type declared in the function prototype. I am primarily concerned with the absence of a required return.
The answer is simple. After all BBs of a function are emitted, loop over them and pick out those that end without a terminator (see the LLVM documentation for what a terminator instruction is). Assuming the emission of all kinds of control-flow statements (while, for, etc.) follows the rule that each BB is ended by one and only one terminator, the only possible explanation for these rule-breakers is that they are missing a return at the end. If the current function returns void, append a ret void to them. Otherwise, this is an error; report it.
The reasoning is largely sound, as it relies on the well-formedness property of LLVM's BBs, and it is easy to implement and cheap to run. Here is the code:
/// Generate body for a Function.
void visitFuncDef(FuncDef *FD) {
// Unrelated code omitted...
/// Generate the body
for (Stmt *S : FD->stmts) {
visitStmt(S);
}
/// Check for well-formedness of all BBs. In particular, look for
/// any unterminated BB and try to add a Return to it.
for (BasicBlock &BB : *Fn) {
Instruction *Terminator = BB.getTerminator();
if (Terminator != nullptr) continue; /// Well-formed
if (Fn->getReturnType()->isVoidTy()) {
/// Make implicit return of void Function explicit.
Builder.SetInsertPoint(&BB);
Builder.CreateRetVoid();
} else {
// How to attach source location?
EM.Error("control flow reaches end of non-void function");
// No source location, make errors short
return;
}
}
/// Verify the function body
String ErrorMsg;
llvm::raw_string_ostream OS(ErrorMsg);
if (llvm::verifyFunction(*Fn, &OS)) {
EM.Error(OS.str()); // str() flushes the stream into ErrorMsg before it is read
}
}

How can I get the list of function calls that are performed in each function of a program, from the intermediate representation of LLVM?

I am trying to build a simple version of a code analysis tool with LLVM.
I have a few .ll files which contain the intermediate LLVM representation of certain programs.
How can I get the list of function calls that are performed in each function of a program, from the intermediate representation of LLVM?
The input I have is an instance of the llvm::Module class, which represents the program. I then get the list of functions present in the program with getFunctionList().
void getFunctionCalls(const Module *M)
{
// Iterate functions in program
for (auto curFref = M->getFunctionList().begin(), endFref = M->getFunctionList().end();
curFref != endFref; ++curFref) {
// For each function
// Get list of function calls
}
}
This is a fragment from our working code here:
for (auto &module : Ctx.getModules()) {
auto &functionList = module->getModule()->getFunctionList();
for (auto &function : functionList) {
for (auto &bb : function) {
for (auto &instruction : bb) {
if (CallInst *callInst = dyn_cast<CallInst>(&instruction)) {
if (Function *calledFunction = callInst->getCalledFunction()) {
if (calledFunction->getName().startswith("llvm.dbg.declare")) {
Also keep in mind that there are invoke instructions (InvokeInst), which can be handled in a similar way.
Look up CallInst vs. InvokeInst, and also learn about call instructions with and without a called function. If getCalledFunction() returns null, the call is indirect. Indirect calls appear in LLVM IR when the source code calls through a function pointer instead of calling a function directly. In C++ this often happens when a class operates through an abstract interface (polymorphism). So keep in mind that it is not always possible to determine the called function even though you have a call instruction in hand.
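To make the direct/indirect distinction concrete, here is a small stand-alone C++ sketch; the names square, callDirect, callThroughPointer, Shape, Square and callVirtual are made up for illustration. Compiled to LLVM IR without optimization, the first kind of call would typically have a known callee, while the function-pointer call and the virtual call would typically show up as indirect calls. An optimizer may inline or devirtualize some of them, so the exact IR is an assumption here.

```cpp
#include <cassert>

int square(int x) { return x * x; }

// Direct call: the callee is named at the call site.
int callDirect(int x) { return square(x); }

// Indirect call: the callee is only known through a function pointer.
int callThroughPointer(int (*fn)(int), int x) {
    assert(fn != nullptr && "function pointer must not be null");
    return fn(x);
}

// Abstract interface: calls to area() go through the vtable.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    explicit Square(int side) : side_(side) {}
    int area() const override { return side_ * side_; }
    int side_;
};

// Indirect call via virtual dispatch: the concrete callee depends on the
// dynamic type of `shape`.
int callVirtual(const Shape &shape) { return shape.area(); }
```

All three calls compute the same kind of result at run time; the difference is only in whether a static analysis of the call site can name the callee.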

What're all the LLVM layers for?

I'm playing with LLVM 3.7 and wanted to use the new ORC stuff. But I've been going at this for a few hours now and still don't get what each layer is for, when to use them, how to compose them, or at the very least the minimum set of things I need in place.
I've been through the Kaleidoscope tutorial, but it doesn't explain what the constituent parts are; it just says put this here and this here (plus the parsing etc. distracts from the core LLVM bits). While that's great to get started, it leaves a lot of gaps. There are lots of docs on various things in LLVM, but there's so much that it's actually bordering on overwhelming. There is stuff like http://llvm.org/releases/3.7.0/docs/ProgrammersManual.html, but I can't find anything that explains how all the pieces fit together. Even more confusing, there seem to be multiple APIs for doing the same thing; I'm thinking of MCJIT and the newer ORC API. I saw Lang Hames' post explaining this, but a fair few things seem to have changed since the patch he posted in that link.
So for a specific question, how do all these layers fit together?
When I previously used LLVM I could link to C functions fairly easily. Using the "How to use JIT" example as a base, I tried linking to an externed function extern "C" double doIt, but ended up with LLVM ERROR: Tried to execute an unknown external function: doIt.
Having a look at this ORC example, it seems I need to configure where it searches for the symbols. But TBH, while I'm still swinging at this, it's largely guesswork. Here's what I got:
#include "llvm/ADT/STLExtras.h"
#include "llvm/ExecutionEngine/GenericValue.h"
#include "llvm/ExecutionEngine/Interpreter.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "std.hpp"
using namespace llvm;
int main() {
InitializeNativeTarget();
LLVMContext Context;
// Create some module to put our function into it.
std::unique_ptr<Module> Owner = make_unique<Module>("test", Context);
Module *M = Owner.get();
// Create the add1 function entry and insert this entry into module M. The
// function will have a return type of "int" and take an argument of "int".
// The '0' terminates the list of argument types.
Function *Add1F = cast<Function>(M->getOrInsertFunction("add1", Type::getInt32Ty(Context), Type::getInt32Ty(Context), (Type *) 0));
// Add a basic block to the function. As before, it automatically inserts
// because of the last argument.
BasicBlock *BB = BasicBlock::Create(Context, "EntryBlock", Add1F);
// Create a basic block builder with default parameters. The builder will
// automatically append instructions to the basic block `BB'.
IRBuilder<> builder(BB);
// Get pointers to the constant `1'.
Value *One = builder.getInt32(1);
// Get pointers to the integer argument of the add1 function...
assert(Add1F->arg_begin() != Add1F->arg_end()); // Make sure there's an arg
Argument *ArgX = Add1F->arg_begin(); // Get the arg
ArgX->setName("AnArg"); // Give it a nice symbolic name for fun.
// Create the add instruction, inserting it into the end of BB.
Value *Add = builder.CreateAdd(One, ArgX);
// Create the return instruction and add it to the basic block
builder.CreateRet(Add);
// Now, function add1 is ready.
// Now we're going to create function `foo', which returns an int and takes no
// arguments.
Function *FooF = cast<Function>(M->getOrInsertFunction("foo", Type::getInt32Ty(Context), (Type *) 0));
// Add a basic block to the FooF function.
BB = BasicBlock::Create(Context, "EntryBlock", FooF);
// Tell the basic block builder to attach itself to the new basic block
builder.SetInsertPoint(BB);
// Get pointer to the constant `10'.
Value *Ten = builder.getInt32(10);
// Pass Ten to the call to Add1F
CallInst *Add1CallRes = builder.CreateCall(Add1F, Ten);
Add1CallRes->setTailCall(true);
// Create the return instruction and add it to the basic block.
builder.CreateRet(Add1CallRes);
std::vector<Type *> args;
args.push_back(Type::getDoubleTy(getGlobalContext()));
FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), args, false);
Function *F = Function::Create(FT, Function::ExternalLinkage, "doIt", Owner.get());
// Now we create the JIT.
ExecutionEngine *EE = EngineBuilder(std::move(Owner)).create();
outs() << "We just constructed this LLVM module:\n\n" << *M;
outs() << "\n\nRunning foo: ";
outs().flush();
// Call the `foo' function with no arguments:
std::vector<GenericValue> noargs;
GenericValue gv = EE->runFunction(FooF, noargs);
auto ax = EE->runFunction(F, noargs);
// Import result of execution:
outs() << "Result: " << gv.IntVal << "\n";
outs() << "Result 2: " << ax.IntVal << "\n";
delete EE;
llvm_shutdown();
return 0;
}
doIt is declared in std.hpp.
Your question is very vague, but maybe I can help a bit. This code sample is a simple JIT built with Orc - it's well commented so it should be easy to follow.
Put simply, Orc builds on top of the same building blocks used by MCJIT (MC for compiling LLVM modules down to object files, RuntimeDyld for the dynamic linking at runtime), but provides more flexibility with its concept of layers. It can thus support things like "lazy" JIT compilation, which MCJIT doesn't support. This is important for the LLVM community because the "old JIT" that was removed not very long ago supported these things. Orc JIT lets us gain back these advanced JIT capabilities while still building on top of MC and thus not duplicating the code emission logic.
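As a rough illustration of what "lazy" compilation means, here is a toy model in plain C++. It is not the ORC API: LazyFunctionTable, addFunction, call and compilationsAfterTwoCalls are hypothetical names, and "compiling" is simulated by invoking a factory that returns a callable. The point is only that the work of producing the callable is deferred until the first call and not repeated afterwards.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model of lazy compilation (hypothetical, not an LLVM class).
class LazyFunctionTable {
public:
    using Compiled = std::function<double(double)>;
    using Compiler = std::function<Compiled()>;

    // Register a function by name together with the factory that
    // "compiles" it. Nothing is compiled at this point.
    void addFunction(const std::string &name, Compiler compiler) {
        entries_[name] = Entry{std::move(compiler), nullptr};
    }

    // Call a function by name, compiling it on first use only.
    double call(const std::string &name, double arg) {
        auto it = entries_.find(name);
        assert(it != entries_.end() && "unknown function");
        Entry &entry = it->second;
        if (!entry.compiled) {
            entry.compiled = entry.compiler();  // simulated compilation
            ++compileCount_;
        }
        return entry.compiled(arg);
    }

    int compileCount() const { return compileCount_; }

private:
    struct Entry {
        Compiler compiler;
        Compiled compiled;
    };
    std::map<std::string, Entry> entries_;
    int compileCount_ = 0;
};

// Usage sketch: two calls to the same function, one simulated compilation.
int compilationsAfterTwoCalls() {
    LazyFunctionTable table;
    table.addFunction("twice", [] {
        return LazyFunctionTable::Compiled([](double x) { return 2.0 * x; });
    });
    double first = table.call("twice", 3.0);
    double second = table.call("twice", 4.0);
    assert(first == 6.0 && second == 8.0);
    return table.compileCount();
}
```

In a real JIT the factory would correspond to running code generation and linking for a module, and the layered design is what lets that step be postponed or customized.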
To get better answers, I suggest you ask more specific questions.