I am writing an LLVM pass and I need to find every instruction that could have defined the memory read by a load instruction. For example:
%x = alloca i32, align 4
store i32 123, i32* %x, align 4
%0 = load i32, i32* %x, align 4
In this example I want to get from the load instruction to every instruction that could have initialized or altered %x; in this case, that is just the preceding store instruction. I tried the use-def chain, but that gives me the instruction that defines the memory, which is the alloca instruction.
bool runOnModule(Module &M) override {
  for (Function &fun : M) {
    for (BasicBlock &bb : fun) {
      for (Instruction &instr : bb) {
        if (isa<LoadInst>(instr)) {
          for (Use &U : instr.operands()) {
            if (Instruction *Inst = dyn_cast<Instruction>(U)) {
              errs() << *Inst << "\n";
            }
          }
        }
      }
    }
  }
  return false;
}
};
How can I get every store instruction that could have defined the memory read by a load instruction?
You can treat the AllocaInst as a Value and then check whether its uses are loads or stores.
As a side note:
The class hierarchy is Value <-- User <-- Instruction <-- UnaryInstruction <-- AllocaInst. You can also look at the inheritance diagram at http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html#details
Value *val = cast<Value>(alloca_x);
// users() iterates over everything that uses val as an operand
for (User *U : val->users()) {
  if (isa<LoadInst>(U)) {
    // load inst
  } else if (isa<StoreInst>(U)) {
    // store inst
  }
}
There is also the memory dependence analysis pass, which in turn uses alias analysis. You can query it for a store instruction, and it will return the instructions that load from or store to that memory. See http://llvm.org/docs/doxygen/html/classllvm_1_1MemoryDependenceAnalysis.html for more information.
Related
I am building a lifter that translates assembly code into LLVM IR, and I was wondering whether there is a way to check the data stored inside an LLVM variable. In the code below, I create a dummy LLVM function with a single basic block, in which I allocate memory for a single variable SRC and then store the immediate value 31 into that memory. As the last step, I load from that memory into a variable called loaded.
Is there a way to check that the value of the %loaded variable is in fact 31?
int main()
{
    llvm::LLVMContext context;
    llvm::Type* type = llvm::Type::getVoidTy(context);
    Module* modu = new Module("test", context);
    modu->getOrInsertFunction("dummy", type);
    Function* dummy = modu->getFunction("dummy");
    BasicBlock* block = BasicBlock::Create(context, "entry", dummy);
    IRBuilder<> builder(block);
    llvm::Value* SRC = builder.CreateAlloca(Type::getInt32Ty(context), nullptr);
    llvm::Value* s = builder.CreateStore(llvm::ConstantInt::get(context, llvm::APInt(/*nbits*/32, 31, true)), SRC, /*isVolatile=*/false);
    llvm::Value* loaded = builder.CreateLoad(SRC, "loaded");
    builder.CreateRetVoid();
    PassManager<llvm::Module> PM;
    llvm::AnalysisManager<llvm::Module> AM;
    verifyFunction(*(modu->getFunction("dummy")), &llvm::errs());
    verifyModule(*modu, &llvm::errs());
    PassBuilder PB;
    PB.registerModuleAnalyses(AM);
    PM.addPass(PrintModulePass());
    PM.run(*modu, AM);
}
The output of my code looks like this:
; ModuleID = 'test'
source_filename = "test"
define void @dummy() {
entry:
%0 = alloca i32, align 4
store i32 31, i32* %0, align 4
%loaded = load i32, i32* %0, align 4
ret void
}
You can insert a call to printf and compile this IR into a native executable. Running it will print out the variable value.
Alternatively, you can run lli on this IR under a debugger and break on the load handler.
I am trying to retrieve the name of the pointer passed to a cudaMalloc call.
CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
Value *Ptr = CUMallocCI->getOperand(0);
if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
  errs() << AI->getName() << "\n";
}
The above, however, just prints an empty line. Is it possible to get the pointer name out of this alloca?
This is the relevant IR:
%28 = alloca i8*, align 8
...
...
call void @llvm.dbg.declare(metadata i8** %28, metadata !926, metadata !DIExpression()), !dbg !927
%257 = call i32 @cudaMalloc(i8** %28, i64 1), !dbg !928
...
...
!926 = !DILocalVariable(name: "d_over", scope: !677, file: !3, line: 191, type: !22)
!927 = !DILocation(line: 191, column: 10, scope: !677)
Answering my own question. It turns out that there is an llvm.dbg.declare call (DbgDeclareInst) corresponding to the alloca but it may appear anywhere in the caller function's basic blocks. Probably it comes after the first use of this Alloca value? Not sure. In any case, my solution is to search for DbgDeclareInst instructions, check if it is for an AllocaInst and if so compare that alloca with the alloca of interest and if equal get the variable name. Something like this:
CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
Value *Ptr = CUMallocCI->getOperand(0);
if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
  if (!AI->hasName()) {
    // Function this AllocaInst belongs to
    Function *Caller = AI->getParent()->getParent();
    // Search for llvm.dbg.declare
    for (BasicBlock &BB : *Caller)
      for (Instruction &I : BB) {
        if (DbgDeclareInst *dbg = dyn_cast<DbgDeclareInst>(&I))
          // found. is it for an AllocaInst?
          if (AllocaInst *dbgAI = dyn_cast<AllocaInst>(dbg->getAddress()))
            // is it for our AllocaInst?
            if (dbgAI == AI)
              if (DILocalVariable *varMD = dbg->getVariable()) // probably not needed?
                errs() << varMD->getName() << "\n";
      }
  } else {
    errs() << AI->getName() << "\n";
  }
}
I want to print the exact metadata number of a piece of debug information in the IR. How can I do it?
For example, consider an IR chunk as below,
call void @llvm.dbg.declare(metadata i32* %a, metadata !10, metadata !11), !dbg !12
!12 = !DILocation(line: 19, column: 7, scope: !6)
I want to print the !12 as a string for debugging purposes. I can acquire the DILocation object by doing
Instruction::getDebugLoc()->get()
but all I get is a pointer, and there is no interface for acquiring the number. I assume that LLVM assigns the number when it actually generates the bitcode, since dumping the DILocation gives something like this:
<0x7342628> = !DILocation(line: 23, column: 3, scope: <0x733e5f8>)
But when I use Instruction::dump(), it gives me something that looks like this:
call void @llvm.dbg.declare(metadata i32* %a, metadata !10, metadata !11), !dbg !12
So I am confused about whether the numbering information of a debug-info node exists at runtime.
Does it have the numbering information or not? If so, how can I acquire that info? If not, where should I inspect to look for the generation of the bitcode in LLVM?
Are you talking about line/column numbers? If so, then you can easily access them directly from the debugLoc:
instruction->getDebugLoc()->getLine()
instruction->getDebugLoc()->getColumn()
See the definitions in DebugInfoMetadata.h:
unsigned getLine() const { return SubclassData32; }
unsigned getColumn() const { return SubclassData16; }
It is probably not too late to answer this question.
if (instruction->hasMetadata()) {
  instruction->dump();
  // one way: walk every metadata attachment on the instruction
  SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;
  instruction->getAllMetadata(MDs);
  for (auto &MD : MDs) {
    if (MDNode *N = MD.second) {
      N->printAsOperand(errs(), instruction->getModule());
      errs() << "\n";
    }
  }
  // second way: go through the debug location
  instruction->getDebugLoc()->printAsOperand(errs(), instruction->getModule());
  errs() << "\n";
  // third way: look up the !dbg attachment by kind ID
  int debugInfoKindID = 0; // same value as LLVMContext::MD_dbg
  MDNode *debug = instruction->getMetadata(debugInfoKindID);
  debug->printAsOperand(errs(), instruction->getModule());
  errs() << "\n";
}
Output is:
%11 = add nsw i32 %9, %10, !dbg !29
!29
!29
!29
I found this by looking at llvm/unittests/IR/MetadataTest.cpp, in the test TEST_F(MDNodeTest, PrintFromMetadataAsValue).
I have written a pass to detect and print the labels of the basic blocks in a function, because I want to use splitBasicBlock() later. I wrote it like this:
virtual bool runOnModule(Module &M)
{
    for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
    {
        errs() << "Function:" << F->getName() << "\n";
        //for(Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
        for (iplist<BasicBlock>::iterator iter = F->getBasicBlockList().begin();
             iter != F->getBasicBlockList().end();
             iter++)
        {
            BasicBlock* currBB = iter;
            errs() << "BasicBlock: " << currBB->getName() << "\n";
        }
    }
    return true;
}
The IR file looks like this:
; <label>:63 ; preds = %43
%64 = load i32* %j, align 4
%65 = sext i32 %64 to i64
%66 = load i8** %tempdst, align 8
%67 = getelementptr inbounds i8* %66, i64 %65
store i8 -1, i8* %67, align 1
br label %73
; <label>:68 ; preds = %43
%69 = load i32* %j, align 4
%70 = sext i32 %69 to i64
%71 = load i8** %tempdst, align 8
%72 = getelementptr inbounds i8* %71, i64 %70
store i8 0, i8* %72, align 1
br label %73
; <label>:73 ; preds = %68, %63
br label %74
However, I got nothing about the label:
Function:main
BasicBlock:
BasicBlock:
BasicBlock:
What's wrong with these "unnamed" basic blocks? What should I do?
Although a BasicBlock may have no name (as the hasName() method reports), you can print a unique identifier for it by calling currBB->printAsOperand(errs(), false) instead of streaming currBB->getName() into errs(). For an unnamed BasicBlock this prints its numeric representation, such as %68.
Values in LLVM IR are not required to have a name; and indeed, those basic blocks don't have names, which is why you get an empty string from currBB->getName().
They have names in the LLVM IR printout because the textual format (as it appears in .ll files) needs some way to refer to them, so the printer assigns sequential numeric names to basic blocks (and other values). Those numeric names are created only by the printer, though, and don't actually exist in the module.
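To make the distinction concrete, here is a toy model of printer-side numbering in plain C++. It is not LLVM's implementation: ToyValue and ToySlotTracker are hypothetical names invented for this illustration, and the numbering scheme is simplified. The point it shows is that the numbers live in a table owned by the printer, while the value's own name stays empty.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for an IR value: the name may be empty.
struct ToyValue {
  std::string name;
};

// Toy printer-side numbering: each unnamed value gets the next sequential
// slot when the tracker walks the list. Nothing is written back into
// ToyValue, so a getName-style query still sees an empty string.
class ToySlotTracker {
public:
  explicit ToySlotTracker(const std::vector<ToyValue> &values) {
    unsigned next = 0;
    for (const ToyValue &v : values)
      if (v.name.empty())
        slots_[&v] = next++;
  }

  // Returns "%name" for named values and "%<slot>" for unnamed ones.
  // The value must be one of those passed to the constructor.
  std::string operandName(const ToyValue &v) const {
    if (!v.name.empty())
      return "%" + v.name;
    return "%" + std::to_string(slots_.at(&v));
  }

private:
  std::unordered_map<const ToyValue *, unsigned> slots_;
};
```

In this model, asking the value for its name corresponds to getName(), and asking the tracker corresponds to what printAsOperand does for you.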
When compiling source code to bitcode with clang, pass the following flag:
-fno-discard-value-names
Each basic block will then keep its name as a unique string.
I think LLVM's behavior is different now.
I use similar code and can get the label's name on LLVM 4.0:
for (auto &funct : m) {
  for (auto &basic_block : funct) {
    StringRef bbName(basic_block.getName());
    errs() << "BasicBlock: " << bbName << "\n";
  }
}
As ElazarR said, currBB->printAsOperand(errs(), false) prints that ID to the error stream, but you can also store it in a string if that suits your logic better.
In the LLVM CFG generation pass -dot-cfg, they always name the basic block using the BB's name (if any) or its representation as a string. This logic is present in the CFGPrinter.h header (http://llvm.org/doxygen/CFGPrinter_8h_source.html#l00063):
static std::string getSimpleNodeLabel(const BasicBlock *Node,
                                      const Function *) {
  if (!Node->getName().empty())
    return Node->getName().str();
  std::string Str;
  raw_string_ostream OS(Str);
  Node->printAsOperand(OS, false);
  return OS.str();
}
You can use this logic to always return a valid name for the basic block.
I have a small example in C++:
struct RecordTest
{
    int value1;
    int value2;
};
void test()
{
    RecordTest rt;
    rt.value1 = 15;
    rt.value2 = 75;
}
and LLVM 3.4 IR for it:
%struct.RecordTest = type { i32, i32 }
; Function Attrs: nounwind
define void @_Z4testv() #0 {
entry:
%rt = alloca %struct.RecordTest, align 4
%value1 = getelementptr inbounds %struct.RecordTest* %rt, i32 0, i32 0
store i32 15, i32* %value1, align 4
%value2 = getelementptr inbounds %struct.RecordTest* %rt, i32 0, i32 1
store i32 75, i32* %value2, align 4
ret void
}
and a pretty easy question: how can I access the RecordTest fields (when parsing the .cpp) by name only (value1 and value2), without their indexes?
I know only one way (from llc -march=cpp), which uses indexes:
AllocaInst* ptr_rt = new AllocaInst(StructTy_struct_RecordTest, "rt", label_entry);
ptr_rt->setAlignment(4);
std::vector<Value*> ptr_value1_indices;
ptr_value1_indices.push_back(const_int32_6);
ptr_value1_indices.push_back(const_int32_6);
Instruction* ptr_value1 = GetElementPtrInst::Create(ptr_rt, ptr_value1_indices, "value1", label_entry);
StoreInst* void_9 = new StoreInst(const_int32_7, ptr_value1, false, label_entry);
void_9->setAlignment(4);
std::vector<Value*> ptr_value2_indices;
ptr_value2_indices.push_back(const_int32_6);
ptr_value2_indices.push_back(const_int32_5);
Instruction* ptr_value2 = GetElementPtrInst::Create(ptr_rt, ptr_value2_indices, "value2", label_entry);
StoreInst* void_10 = new StoreInst(const_int32_8, ptr_value2, false, label_entry);
void_10->setAlignment(4);
So, can I translate from C++ to LLVM IR if I don't know the indexes of the fields (const_int32_5 and const_int32_6 in the code above)?
UPD:
So, we can't access fields by name. If we need that (and we do, when parsing a .cpp), we can write something like this:
// It can be some kind of singleton
static std::map<std::string, std::vector<std::string>> mymap;
// Some function where we first meet RecordTest
std::vector<std::string> fieldNames;
fieldNames.push_back("value1");
fieldNames.push_back("value2");
mymap["RecordTest"] = fieldNames;
// Some function where we need to access a RecordTest field
std::vector<std::string> fieldNamesAgain = mymap.find("RecordTest")->second;
std::string fieldName = "value1";
int idxValue1 = -1;
for (int i = 0, e = fieldNamesAgain.size(); i < e; i++) // slightly ugly search
{
    if (fieldName == fieldNamesAgain[i])
    {
        // we have the field index, and now we can build code
        // as in the example above (llc -march=cpp)
        idxValue1 = i;
        break;
    }
}
Is this right?
You cannot access the fields of the struct by name, only by index. The names are normally just not there when you compile with Clang.
There is one exception to this, and this is if you compiled with debug information. In that case, you'll have ample data about the type; specifically, you'll get the order of the fields, along with a metadata entry for each field which contains its name (and other useful stuff, such as its offset from the beginning of the type).
Read more about this in the Source Level Debugging guide; in particular, see the section about struct encoding, with its very nice example.
Take a look at DebugInfo.h for classes that help with querying debug info, though I think you're going to have to do some manual digging anyway.
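If you are writing the front end yourself and do not want to depend on debug info, the side table from the question's update is a workable approach. Below is a slightly tidier sketch of it in plain C++. FieldTable, declare and indexOf are hypothetical names made up for this example, not part of LLVM; the returned index is what you would then turn into the second GEP index.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical front-end side table: struct name -> (field name -> index).
class FieldTable {
public:
  // Record the fields of a struct in declaration order.
  void declare(const std::string &structName,
               std::vector<std::string> fields) {
    auto &byName = index_[structName];
    byName.clear();
    for (std::size_t i = 0; i < fields.size(); ++i)
      byName[fields[i]] = static_cast<int>(i);
  }

  // Returns the field's index, or -1 if the struct or field is unknown.
  int indexOf(const std::string &structName,
              const std::string &field) const {
    auto s = index_.find(structName);
    if (s == index_.end())
      return -1;
    auto f = s->second.find(field);
    if (f == s->second.end())
      return -1;
    return f->second;
  }

private:
  std::unordered_map<std::string, std::unordered_map<std::string, int>> index_;
};
```

Compared with the linear search over a vector of names, the nested map makes each lookup a hash lookup and handles unknown structs without dereferencing an end iterator.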