Let x be the address of a global variable g in a program at run-time. LLVM IR produces a store instruction as shown below:
store i32 30, i32* @g, align 4
I am writing an LLVM pass that instruments the program so that x is passed to an instrumentation function func(int addr) at run-time. I can successfully insert a call to func using IRBuilder. What I cannot do is insert the instrumentation that collects x.
if (StoreInst *store_inst = dyn_cast<StoreInst>(&I)) {
  Value *po = store_inst->getPointerOperand();
  if (isa<GlobalVariable>(po)) {
    errs() << "store [pointer]: " << *po << '\n';
    Constant *instrument_func = F.getParent()->getOrInsertFunction("func", Type::getVoidTy(Ctx), Type::getInt32Ty(Ctx), NULL);
    IRBuilder<> builder(&I);
    builder.SetInsertPoint(&B, ++builder.GetInsertPoint());
    Value *args[] = {po};
    builder.CreateCall(instrument_func, args);
  }
}
The result of running opt is :
Call parameter type does not match function signature!
i32* @a
i32 call void bitcast (void (i64)* @func to void (i32)*)(i32* @a)
LLVM ERROR: Broken function found, compilation aborted!
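There are two possibilities: the argument you pass differs from the declared parameter either in type or in size. Here po has type i32*, while func is declared to take an integer, and the bitcast in the error message shows that the module already contains func as void (i64).
I am not quite sure, but try converting the pointer to an integer first (this assumes a 64-bit target):
if (StoreInst *store_inst = dyn_cast<StoreInst>(&I)) {
  Value *po = store_inst->getPointerOperand();
  if (isa<GlobalVariable>(po)) {
    errs() << "store [pointer]: " << *po << '\n';
    // Declare func as void (i64) so that it matches the existing declaration.
    Constant *instrument_func = F.getParent()->getOrInsertFunction("func", Type::getVoidTy(Ctx), Type::getInt64Ty(Ctx), NULL);
    IRBuilder<> builder(&I);
    builder.SetInsertPoint(&B, ++builder.GetInsertPoint());
    // ptrtoint: turn the i32* address into an i64 value before passing it.
    std::vector<Value *> args;
    args.push_back(builder.CreatePtrToInt(po, Type::getInt64Ty(Ctx)));
    builder.CreateCall(instrument_func, args);
  }
}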
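For completeness, the runtime side of the instrumentation could look like the sketch below. It assumes that func receives the address as a 64-bit integer (which is what the void (i64) in the error message suggests) and that the instrumented program is linked against this file. The recorded_addresses log and the message printed to stderr are illustrative assumptions, not part of the original question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical in-memory log of every address passed to func.
static std::vector<std::uint64_t> recorded_addresses;

// extern "C" keeps the symbol name unmangled, so a call to "func"
// emitted by the pass can resolve to this definition at link time.
extern "C" void func(std::uint64_t addr) {
  // A store to a global variable should never target address 0.
  assert(addr != 0 && "unexpected null address");
  recorded_addresses.push_back(addr);
  std::fprintf(stderr, "store to global at 0x%llx\n",
               static_cast<unsigned long long>(addr));
}
```

Compiling this file separately and linking it with the instrumented object should be enough for the inserted calls to resolve.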
I am new to LLVM, and I am trying to write an LLVM transformation pass that injects a delay at the beginning of each called function at run time.
I found the following code, which injects a printf call at the beginning of each function.
How can I change the code to inject a delay instead of the printf? (I am using LLVM 10.)
Below is the code:
bool InjectFuncCall::runOnModule(Module &M) {
bool InsertedAtLeastOnePrintf = false;
auto &CTX = M.getContext();
PointerType *PrintfArgTy = PointerType::getUnqual(Type::getInt8Ty(CTX));
// STEP 1: Inject the declaration of printf
// ----------------------------------------
// Create (or _get_ in cases where it's already available) the following
// declaration in the IR module:
// declare i32 @printf(i8*, ...)
// It corresponds to the following C declaration:
// int printf(char *, ...)
FunctionType *PrintfTy = FunctionType::get(
IntegerType::getInt32Ty(CTX),
PrintfArgTy,
/*IsVarArgs=*/true);
FunctionCallee Printf = M.getOrInsertFunction("printf", PrintfTy);
// Set attributes as per inferLibFuncAttributes in BuildLibCalls.cpp
Function *PrintfF = dyn_cast<Function>(Printf.getCallee());
PrintfF->setDoesNotThrow();
PrintfF->addParamAttr(0, Attribute::NoCapture);
PrintfF->addParamAttr(0, Attribute::ReadOnly);
// STEP 2: Inject a global variable that will hold the printf format string
// ------------------------------------------------------------------------
llvm::Constant *PrintfFormatStr = llvm::ConstantDataArray::getString(
CTX, "(llvm-tutor) Hello from: %s\n(llvm-tutor) number of arguments: %d\n");
Constant *PrintfFormatStrVar =
M.getOrInsertGlobal("PrintfFormatStr", PrintfFormatStr->getType());
dyn_cast<GlobalVariable>(PrintfFormatStrVar)->setInitializer(PrintfFormatStr);
// STEP 3: For each function in the module, inject a call to printf
// ----------------------------------------------------------------
for (auto &F : M) {
if (F.isDeclaration())
continue;
// Get an IR builder. Sets the insertion point to the top of the function
IRBuilder<> Builder(&*F.getEntryBlock().getFirstInsertionPt());
// Inject a global variable that contains the function name
auto FuncName = Builder.CreateGlobalStringPtr(F.getName());
// Printf requires i8*, but PrintfFormatStrVar is an array: [n x i8]. Add
// a cast: [n x i8] -> i8*
llvm::Value *FormatStrPtr =
Builder.CreatePointerCast(PrintfFormatStrVar, PrintfArgTy, "formatStr");
// The following is visible only if you pass -debug on the command line
// *and* you have an assert build.
LLVM_DEBUG(dbgs() << " Injecting call to printf inside " << F.getName()
<< "\n");
// Finally, inject a call to printf
Builder.CreateCall(
Printf, {FormatStrPtr, FuncName, Builder.getInt32(F.arg_size())});
InsertedAtLeastOnePrintf = true;
}
return InsertedAtLeastOnePrintf;
}
Also it would be great if there are links for good LLVM tutorials for beginners.
You'll have to declare the delay function the same way you declared printf, except that you'll want to change the argument type from i8* to i32. For the tutorials, you could check these out:
https://anoopsarkar.github.io/compilers-class/llvm-practice.html
https://www.usna.edu/Users/cs/wcbrown/courses/F19SI413/lab/l13/lab.html
https://osterlund.xyz/posts/2017-11-28-LLVM-pass.html
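The delay function mentioned above also needs a definition that the instrumented program links against. A minimal sketch, assuming the function is called delay and takes the duration in milliseconds as a 32-bit integer (both the name and the unit are assumptions, not something the original answer specifies):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical runtime helper. The pass would declare it in the module as
//   declare void @delay(i32)
// and insert a call to it at the top of each function.
extern "C" void delay(std::int32_t milliseconds) {
  assert(milliseconds >= 0 && "delay expects a non-negative duration");
  std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
}
```

On the pass side, the declaration would then presumably use a non-variadic FunctionType with a void return type and a single i32 parameter, and the injected call would pass a constant such as Builder.getInt32(100).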
I am trying to retrieve the name of the pointer passed to a cudaMalloc call.
CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
Value *Ptr = CUMallocCI->getOperand(0);
if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
errs() << AI->getName() << "\n";
}
The above, however, just prints an empty line. Is it possible to get the pointer name out of this alloca?
This is the relevant IR:
%28 = alloca i8*, align 8
...
...
call void @llvm.dbg.declare(metadata i8** %28, metadata !926, metadata !DIExpression()), !dbg !927
%257 = call i32 @cudaMalloc(i8** %28, i64 1), !dbg !928
...
...
!926 = !DILocalVariable(name: "d_over", scope: !677, file: !3, line: 191, type: !22)
!927 = !DILocation(line: 191, column: 10, scope: !677)
Answering my own question. It turns out that there is an llvm.dbg.declare call (DbgDeclareInst) corresponding to the alloca, but it may appear anywhere in the caller function's basic blocks. Perhaps it comes after the first use of the alloca value? I am not sure. In any case, my solution is to search for DbgDeclareInst instructions, check whether each one refers to an AllocaInst, and if so compare that alloca with the alloca of interest; if they are equal, I take the variable name from the debug metadata. Something like this:
CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
Value *Ptr = CUMallocCI->getOperand(0);
if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
  if (!AI->hasName()) {
    // Function this AllocaInst belongs to
    Function *Caller = AI->getParent()->getParent();
    // Search for llvm.dbg.declare
    for (BasicBlock &BB : *Caller)
      for (Instruction &I : BB)
        // Found one. Is it for an AllocaInst?
        if (DbgDeclareInst *dbg = dyn_cast<DbgDeclareInst>(&I))
          // getAddress() may return null, hence dyn_cast_or_null
          if (AllocaInst *dbgAI = dyn_cast_or_null<AllocaInst>(dbg->getAddress()))
            // Is it for our AllocaInst?
            if (dbgAI == AI)
              if (DILocalVariable *varMD = dbg->getVariable()) // probably not needed?
                errs() << varMD->getName() << "\n";
  } else {
    errs() << AI->getName() << "\n";
  }
}
I am writing an LLVM pass and I need to find every instruction that could have defined the memory read by a load instruction. E.g.:
%x = alloca i32, align 4
store i32 123, i32* %x, align 4
%0 = load i32, i32* %x, align 4
In this example I want to get from the load instruction to every instruction that could have initialized/altered %x. In this case just the previous store instruction. I tried to use the use-def chain, but this gives me the instruction for the definition of the memory, which is the alloca instruction.
bool runOnModule(Module &M) override {
  for (Function &fun : M) {
    for (BasicBlock &bb : fun) {
      for (Instruction &instr : bb) {
        if (isa<LoadInst>(instr)) {
          for (Use &U : instr.operands()) {
            if (Instruction *Inst = dyn_cast<Instruction>(U)) {
              errs() << *Inst << "\n";
            }
          }
        }
      }
    }
  }
  return false;
}
};
How can I get every store instruction that could have defined the memory read by a load instruction?
You can cast the AllocaInst to Value and then check its uses to see whether they are loads or stores.
As a side note:
Value is a superclass of AllocaInst: Value <-- User <-- Instruction <-- UnaryInstruction <-- AllocaInst. You can also look at the inheritance diagram at http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html#details
Value *val = cast<Value>(alloca_x);
// users() yields every User (instruction) that refers to the alloca
for (User *U : val->users()) {
  if (isa<LoadInst>(U)) {
    // load inst
  } else if (isa<StoreInst>(U)) {
    // store inst
  }
}
There is also the memory dependence analysis pass, which in turn uses alias analysis. You can query it for an instruction (for example a store), and it returns the instructions that load from or store to that memory. See http://llvm.org/docs/doxygen/html/classllvm_1_1MemoryDependenceAnalysis.html for more information.
I have written a pass to detect and print the labels of the basic blocks in a function, because I want to use splitBasicBlock() later. I wrote it like this:
virtual bool runOnModule(Module &M)
{
for(Module::iterator F = M.begin(), E = M.end(); F!= E; ++F)
{
errs()<<"Function:"<<F->getName()<<"\n";
//for(Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
for (iplist<BasicBlock>::iterator iter = F->getBasicBlockList().begin();
iter != F->getBasicBlockList().end();
iter++)
{
BasicBlock* currBB = iter;
errs() << "BasicBlock: " << currBB->getName() << "\n";
}
}
return true;
}
IR file looks like this:
; <label>:63 ; preds = %43
%64 = load i32* %j, align 4
%65 = sext i32 %64 to i64
%66 = load i8** %tempdst, align 8
%67 = getelementptr inbounds i8* %66, i64 %65
store i8 -1, i8* %67, align 1
br label %73
; <label>:68 ; preds = %43
%69 = load i32* %j, align 4
%70 = sext i32 %69 to i64
%71 = load i8** %tempdst, align 8
%72 = getelementptr inbounds i8* %71, i64 %70
store i8 0, i8* %72, align 1
br label %73
; <label>:73 ; preds = %68, %63
br label %74
However, I got nothing about the label:
Function:main
BasicBlock:
BasicBlock:
BasicBlock:
What's wrong with these "unnamed" basic blocks? What should I do?
Although a BasicBlock may have no name (in which case hasName() returns false), you can print a unique identifier for it by calling currBB->printAsOperand(errs(), false) instead of streaming currBB->getName() into errs(). For an unnamed BasicBlock this prints its numeric representation, such as %68.
Values in LLVM IR are not required to have a name; and indeed, those basic blocks don't have names, which is why you get an empty string from currBB->getName().
The reason that they have names in the LLVM IR printout is because when you print to the textual format of LLVM IR (as it appears in .ll files), you have to assign a name to them to make them referable, so the printer assigns sequential numeric names to basic blocks (and other values). Those numeric names are only created by the printer, though, and don't actually exist in the module.
When compiling the source code to bitcode with clang, pass the following flag:
-fno-discard-value-names
Each basic block then has a name, which you get as a unique string.
I think the behavior of LLVM now is different.
I use similar lines of code and can get the label's name on LLVM-4.0
for (auto &funct : m) {
for (auto &basic_block : funct) {
StringRef bbName(basic_block.getName());
errs() << "BasicBlock: " << bbName << "\n";
}
}
As ElazarR said, currBB->printAsOperand(errs(), false) will print such ID in the error stream, but it is possible to store it in a string as well if this is more interesting to your logic.
In the LLVM CFG generation pass -dot-cfg, they always name the basic block using the BB's name (if any) or its representation as a string. This logic is present in the CFGPrinter.h header (http://llvm.org/doxygen/CFGPrinter_8h_source.html#l00063):
static std::string getSimpleNodeLabel(const BasicBlock *Node,
const Function *) {
if (!Node->getName().empty())
return Node->getName().str();
std::string Str;
raw_string_ostream OS(Str);
Node->printAsOperand(OS, false);
return OS.str();
}
You can use this logic to always return a valid name for the basic block.
I'm trying to create an exception handler inside JIT-compiled LLVM code. The current documentation on exception handling in LLVM is very hand-wavy, so I've been trying to reuse most of the snippets I get from http://llvm.org/demo in order to get a working example, but I'm not sure whether those are up to date with LLVM 2.9 (the version I am using).
This is what the module looks after Module::dump();
; ModuleID = 'testModule'
declare i32 @myfunc()
define i32 @test_function_that_invokes_another() {
entryBlock:
  %0 = alloca i8*
  %1 = alloca i32
  %someName = invoke i32 @myfunc()
          to label %exitBlock unwind label %unwindBlock
exitBlock: ; preds = %entryBlock
  ret i32 1
unwindBlock: ; preds = %entryBlock
  %2 = call i8* @llvm.eh.exception()
  store i8* %2, i8** %0
  %3 = call i32 (i8*, i8*, ...)* @llvm.eh.selector(i8* %2, i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*), i8* null)
  store i32 1, i32* %1
  %4 = load i8** %0
  %5 = call i32 (...)* @__cxa_begin_catch(i8* %4) nounwind
  %cleanup_call = call i32 @myCleanup()
  %6 = call i32 (...)* @__cxa_end_catch()
  ret i32 1
}
declare i32 @__gxx_personality_v0(...)
declare i32 @__cxa_begin_catch(...)
declare i32 @__cxa_end_catch(...)
declare i8* @llvm.eh.exception() nounwind readonly
declare i32 @llvm.eh.selector(i8*, i8*, ...) nounwind
declare i32 @myCleanup()
And this is what happens when I try to execute the function:
inside JIT calling C/C++ call
terminate called after throwing an instance of 'int'
Aborted
This shows that the function that throws gets called and that it throws, but I never land in the cleanup call. (My cleanup call should have printed 'inside JIT calling C/C++ Cleanup'.)
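For reference, the behaviour the IR above is meant to have corresponds roughly to the plain C++ below: a catch-all handler (the null type info passed to the selector) that runs the cleanup and returns 1. This is only a sketch of the intent, not the JIT code itself; the cleanup_calls counter is a hypothetical addition so that the behaviour can be checked.

```cpp
#include <iostream>

// Hypothetical counter, added only so the behaviour can be observed.
static int cleanup_calls = 0;

int myfunc() {
  std::cout << " inside JIT calling C/C++ call" << std::endl;
  throw 0;
}

int myCleanup() {
  std::cout << " inside JIT calling C/C++ Cleanup" << std::endl;
  ++cleanup_calls;
  return 18;
}

// Rough C++ counterpart of the IR: invoke myfunc and, on unwind, run a
// catch-all handler that calls myCleanup before returning 1.
int test_function_that_invokes_another() {
  try {
    myfunc();
  } catch (...) {
    myCleanup();
    return 1;
  }
  return 1;
}
```

Compiling such a file with clang -S -emit-llvm, using the same LLVM version as the JIT, should give IR to compare the hand-built module against.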
The function that invokes and (attempts) to catch a thrown exception is:
const inline llvm::FunctionType* getTestFunctionSignature(llvm::LLVMContext& context) {
return llvm::TypeBuilder< unsigned int(), false > ::get(context);
}
llvm::Function* createFunctionThatInvokesAnother( llvm::LLVMContext& ctx, llvm::Module* mod , llvm::Function* another ) {
llvm::Function* result = llvm::Function::Create(getTestFunctionSignature(ctx),
llvm::GlobalValue::ExternalLinkage,
"test_function_that_invokes_another",
mod);
llvm::BasicBlock* entry_block = llvm::BasicBlock::Create(ctx, "entryBlock", result);
llvm::BasicBlock* exit_block = llvm::BasicBlock::Create(ctx, "exitBlock", result);
llvm::BasicBlock* unwind_block = llvm::BasicBlock::Create(ctx, "unwindBlock", result);
llvm::IRBuilder<> builder(entry_block);
llvm::ConstantInt* ci = llvm::ConstantInt::get( mod->getContext() , llvm::APInt( 32 , llvm::StringRef("1"), 10));
llvm::PointerType* pty3 = llvm::PointerType::get(llvm::IntegerType::get(mod->getContext(), 8), 0);
llvm::AllocaInst* ptr_24 = new llvm::AllocaInst(pty3, "", entry_block);
llvm::AllocaInst* ptr_25 = new llvm::AllocaInst(llvm::IntegerType::get(mod->getContext(), 32), "", entry_block);
llvm::Twine name("someName");
builder.CreateInvoke( another , exit_block , unwind_block , "someName" );
builder.SetInsertPoint( exit_block );
builder.CreateRet(ci);
builder.SetInsertPoint( unwind_block );
llvm::Function* func___gxx_personality_v0 = func__gxx_personality_v0(mod);
llvm::Function* func___cxa_begin_catch = func__cxa_begin_catch(mod);
llvm::Function* func___cxa_end_catch = func__cxa_end_catch(mod);
llvm::Function* func_eh_ex = func_llvm_eh_exception(mod);
llvm::Function* func_eh_sel = func__llvm_eh_selector(mod);
llvm::Constant* const_ptr_17 = llvm::ConstantExpr::getCast(llvm::Instruction::BitCast, func___gxx_personality_v0, pty3);
llvm::ConstantPointerNull* const_ptr_18 = llvm::ConstantPointerNull::get(pty3);
llvm::CallInst* get_ex = llvm::CallInst::Create(func_eh_ex, "", unwind_block);
get_ex->setCallingConv(llvm::CallingConv::C);
get_ex->setTailCall(false);
new llvm::StoreInst(get_ex, ptr_24, false, unwind_block);
std::vector<llvm::Value*> int32_37_params;
int32_37_params.push_back(get_ex);
int32_37_params.push_back(const_ptr_17);
int32_37_params.push_back(const_ptr_18);
llvm::CallInst* eh_sel = llvm::CallInst::Create(func_eh_sel, int32_37_params.begin(), int32_37_params.end(), "", unwind_block);
eh_sel->setCallingConv(llvm::CallingConv::C);
eh_sel->setTailCall(false);
new llvm::StoreInst(ci, ptr_25, false, unwind_block);
llvm::LoadInst* ptr_29 = new llvm::LoadInst(ptr_24, "", false, unwind_block);
llvm::CallInst* ptr_30 = llvm::CallInst::Create(func___cxa_begin_catch, ptr_29, "", unwind_block);
ptr_30->setCallingConv(llvm::CallingConv::C);
ptr_30->setTailCall(false);
llvm::AttrListPtr ptr_30_PAL;
{
llvm::SmallVector<llvm::AttributeWithIndex, 4 > Attrs;
llvm::AttributeWithIndex PAWI;
PAWI.Index = 4294967295U;
PAWI.Attrs = 0 | llvm::Attribute::NoUnwind;
Attrs.push_back(PAWI);
ptr_30_PAL = llvm::AttrListPtr::get(Attrs.begin(), Attrs.end());
}
ptr_30->setAttributes(ptr_30_PAL);
llvm::Function* cleanup = call_myCleanup( mod );
builder.CreateCall( cleanup , "cleanup_call");
llvm::CallInst* end_catch = llvm::CallInst::Create(func___cxa_end_catch, "", unwind_block);
builder.CreateRet(ci);
//createCatchHandler( mod , unwind_block );
return result;
}
This gets called in the usual way:
testMain() {
llvm::LLVMContext ctx;
llvm::InitializeNativeTarget();
llvm::StringRef idRef("testModule");
llvm::Module* module = new llvm::Module(idRef, ctx);
std::string jitErrorString;
llvm::ExecutionEngine* execEngine = executionEngine( module , jitErrorString );
llvm::FunctionPassManager* OurFPM = new llvm::FunctionPassManager(module);
llvm::Function *thr = call_my_func_that_throws( module );
llvm::Function* result = createFunctionThatInvokesAnother(ctx, module ,thr);
std::string errorInfo;
llvm::verifyModule(* module, llvm::PrintMessageAction, & errorInfo);
module->dump();
void *fptr = execEngine->getPointerToFunction(result);
unsigned int (*fp)() = (unsigned int (*)())fptr;
try {
unsigned int value = fp();
} catch (...) {
std::cout << " handled a throw from JIT function" << std::endl;
}
}
where my function that throws is:
int myfunc() {
std::cout << " inside JIT calling C/C++ call" << std::endl;
throw 0;
};
llvm::Function* call_my_func_that_throws (llvm::Module* mod) {
std::vector< const llvm::Type* > FuncTy_ex_args;
llvm::FunctionType* FuncTy_ex = llvm::FunctionType::get( llvm::IntegerType::get( mod->getContext() , 32) , FuncTy_ex_args , false);
llvm::Function* result = llvm::Function::Create(FuncTy_ex, llvm::GlobalValue::ExternalLinkage, "myfunc", mod);
result->setCallingConv( llvm::CallingConv::C );
llvm::AttrListPtr PAL;
result->setAttributes( PAL );
llvm::sys::DynamicLibrary::AddSymbol( "myfunc" , (void*) &myfunc );
return result;
}
and my cleanup function is defined in a similar way:
int myCleanup() {
std::cout << " inside JIT calling C/C++ Cleanup" << std::endl;
return 18;
};
llvm::Function* call_myCleanup (llvm::Module* mod) {
std::vector< const llvm::Type* > FuncTy_ex_args;
llvm::FunctionType* FuncTy_ex = llvm::FunctionType::get( llvm::IntegerType::get( mod->getContext() , 32) , FuncTy_ex_args , false);
llvm::Function* result = llvm::Function::Create(FuncTy_ex, llvm::GlobalValue::ExternalLinkage, "myCleanup", mod);
result->setCallingConv( llvm::CallingConv::C );
llvm::AttrListPtr PAL;
result->setAttributes( PAL );
llvm::sys::DynamicLibrary::AddSymbol( "myCleanup" , (void*) &myCleanup );
return result;
}
I've also read this document regarding recent exception handling changes in LLVM, but it is not clear how those changes translate to actual, you know, code.
Right now the EH code is undergoing a large amount of revision. The demo, if I recall correctly, does not run version 2.9 but current development sources, which means that reproducing its output with 2.9 is going to be a world of hurt.
That said, the EH representation is much better now and numerous patches have gone in to improve the documentation just this week. If you are trying to write a language that uses exceptions via llvm I highly suggest you migrate your code to current development sources.
All of that said, I'm not sure how well exception handling works in the JIT at all right now. It's nominally supported, but you may need to debug the unwind tables that are put into memory to make sure they're correct.