llvm: How to get the label of Basic Blocks

I have written a pass to detect and print the labels of the basic blocks in a function, because I want to use splitBasicBlock() later. I wrote it like this:
virtual bool runOnModule(Module &M)
{
    for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
    {
        errs() << "Function:" << F->getName() << "\n";
        //for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
        for (iplist<BasicBlock>::iterator iter = F->getBasicBlockList().begin();
             iter != F->getBasicBlockList().end();
             iter++)
        {
            BasicBlock* currBB = iter;
            errs() << "BasicBlock: " << currBB->getName() << "\n";
        }
    }
    return true;
}
The IR file looks like this:
; <label>:63 ; preds = %43
%64 = load i32* %j, align 4
%65 = sext i32 %64 to i64
%66 = load i8** %tempdst, align 8
%67 = getelementptr inbounds i8* %66, i64 %65
store i8 -1, i8* %67, align 1
br label %73
; <label>:68 ; preds = %43
%69 = load i32* %j, align 4
%70 = sext i32 %69 to i64
%71 = load i8** %tempdst, align 8
%72 = getelementptr inbounds i8* %71, i64 %70
store i8 0, i8* %72, align 1
br label %73
; <label>:73 ; preds = %68, %63
br label %74
However, I got nothing for the labels:
Function:main
BasicBlock:
BasicBlock:
BasicBlock:
What's wrong with these "unnamed" basic blocks? What should I do?

While a BasicBlock may have no name (as indicated by the hasName() method), one may print a unique BasicBlock identifier by using currBB->printAsOperand(errs(), false) instead of streaming the value of currBB->getName() into errs(). For an unnamed BasicBlock this provides the numerical basic block representation, such as %68.
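A minimal sketch of that loop, assuming a pass context with range-based iteration over the module (the false argument to printAsOperand suppresses printing the type):
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Print an identifier for every basic block, whether it is named or not.
static void printBlockIdentifiers(Module &M) {
  for (Function &F : M) {
    errs() << "Function:" << F.getName() << "\n";
    for (BasicBlock &BB : F) {
      errs() << "BasicBlock: ";
      BB.printAsOperand(errs(), /*PrintType=*/false); // prints e.g. %68 for an unnamed block
      errs() << "\n";
    }
  }
}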

Values in LLVM IR are not required to have a name; and indeed, those basic blocks don't have names, which is why you get an empty string from currBB->getName().
The reason that they have names in the LLVM IR printout is because when you print to the textual format of LLVM IR (as it appears in .ll files), you have to assign a name to them to make them referable, so the printer assigns sequential numeric names to basic blocks (and other values). Those numeric names are only created by the printer, though, and don't actually exist in the module.
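If you need those printer-assigned numbers programmatically rather than just in a printout, ModuleSlotTracker can recover them. A small sketch (a hedged example; header locations and exact behavior can vary across LLVM versions):
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/ModuleSlotTracker.h"
using namespace llvm;

// Return the slot number the IR printer would assign to a block,
// or -1 if no slot applies (for example, because the block has a name).
// Building a ModuleSlotTracker per query is slow; cache it in real code.
static int blockSlot(const Function &F, const BasicBlock &BB) {
  ModuleSlotTracker MST(F.getParent());
  MST.incorporateFunction(F);
  return MST.getLocalSlot(&BB);
}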

While compiling source code to bitcode with clang, use the flag below:
-fno-discard-value-names
You will then get each basic block's name as a unique string.
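For example, a typical invocation (the file names here are only placeholders):
clang -S -emit-llvm -fno-discard-value-names input.c -o input.ll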

I think the behavior of LLVM is different now.
I use similar lines of code and can get the labels' names on LLVM 4.0:
for (auto &funct : m) {
  for (auto &basic_block : funct) {
    StringRef bbName(basic_block.getName());
    errs() << "BasicBlock: " << bbName << "\n";
  }
}

As ElazarR said, currBB->printAsOperand(errs(), false) will print such an ID to the error stream, but it is possible to store it in a string as well if that is more useful to your logic.
In the LLVM CFG generation pass -dot-cfg, the basic block is always named using the BB's name (if any) or its string representation. That logic is in the CFGPrinter.h header (http://llvm.org/doxygen/CFGPrinter_8h_source.html#l00063):
static std::string getSimpleNodeLabel(const BasicBlock *Node,
                                      const Function *) {
  if (!Node->getName().empty())
    return Node->getName().str();

  std::string Str;
  raw_string_ostream OS(Str);

  Node->printAsOperand(OS, false);
  return OS.str();
}
You can use this logic to always return a valid name for the basic block.
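A short usage sketch, assuming getSimpleNodeLabel from the snippet above is in scope:
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Print a usable label for every block, falling back to the %N form for unnamed blocks.
static void printLabels(Function &F) {
  for (BasicBlock &BB : F)
    errs() << "BasicBlock: " << getSimpleNodeLabel(&BB, &F) << "\n";
}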

Related

LLVM retrieve name of AllocaInst

I am trying to retrieve the name of the pointer passed to a cudaMalloc call.
CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
Value *Ptr = CUMallocCI->getOperand(0);
if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
  errs() << AI->getName() << "\n";
}
The above, however, just prints an empty line. Is it possible to get the pointer name out of this alloca?
This is the relevant IR:
%28 = alloca i8*, align 8
...
...
call void @llvm.dbg.declare(metadata i8** %28, metadata !926, metadata !DIExpression()), !dbg !927
%257 = call i32 @cudaMalloc(i8** %28, i64 1), !dbg !928
...
...
!926 = !DILocalVariable(name: "d_over", scope: !677, file: !3, line: 191, type: !22)
!927 = !DILocation(line: 191, column: 10, scope: !677)
Answering my own question. It turns out that there is an llvm.dbg.declare call (DbgDeclareInst) corresponding to the alloca, but it may appear anywhere in the caller function's basic blocks (probably after the first use of this alloca value? I'm not sure). In any case, my solution is to search for DbgDeclareInst instructions, check whether each one refers to an AllocaInst, and if so compare that alloca with the alloca of interest; if they are equal, get the variable name. Something like this:
CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
Value *Ptr = CUMallocCI->getOperand(0);
if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
  if (!AI->hasName()) {
    // Function this AllocaInst belongs to
    Function *Caller = AI->getParent()->getParent();
    // Search for llvm.dbg.declare
    for (BasicBlock &BB : *Caller) {
      for (Instruction &I : BB) {
        if (DbgDeclareInst *dbg = dyn_cast<DbgDeclareInst>(&I)) {
          // Found one. Is it for an AllocaInst?
          if (AllocaInst *dbgAI = dyn_cast<AllocaInst>(dbg->getAddress())) {
            // Is it for our AllocaInst?
            if (dbgAI == AI) {
              if (DILocalVariable *varMD = dbg->getVariable()) // probably not needed?
                errs() << varMD->getName() << "\n";
            }
          }
        }
      }
    }
  } else {
    errs() << AI->getName() << "\n";
  }
}

Getting the use and def of an llvm instruction

I am attempting to carry out liveness analysis and in order to do so I need to obtain the def and use sets for all of my nodes, where they are defined as follows:
def[n] = set of all variables defined at node n
use[n] = set of all variables used at node n
So for example in the line:
a = b + c
def[n] = {a}
use[n] = {b,c}
How can I do this?
http://llvm.org/docs/ProgrammersManual.html#iterating-over-def-use-use-def-chains
I hope this page can help you. The User objects reached through a Value's user_begin(), user_end(), and users() methods are the objects that use that Value, and the Instruction class is a subclass of the User class.
I may be wrong, but to get def-use sets at the IR level, the elements of the sets should be Value objects, and each node should be an Instruction object.
Therefore, the def set for each node is the Value returned by the instruction (if the instruction returns nothing, the def set is empty), and the uses are the User objects reached through the instruction's user_iterator.
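A hedged sketch of the question's def[n]/use[n] computed per instruction (collectDefUse, Defs, and Uses are my own names, not an LLVM API). def[n] is the instruction's own result when it produces one, while use[n] is gathered from the instruction's operands; users()/user_iterator walks the opposite direction, from a definition to the instructions that use it:
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Compute def[n] and use[n] for every instruction n in F, as needed for liveness analysis.
static void collectDefUse(Function &F,
                          DenseMap<Instruction *, Value *> &Defs,
                          DenseMap<Instruction *, SmallPtrSet<Value *, 4>> &Uses) {
  for (BasicBlock &BB : F) {
    for (Instruction &I : BB) {
      if (!I.getType()->isVoidTy())
        Defs[&I] = &I;               // the instruction defines its own result value
      for (Use &U : I.operands()) {  // the operands are the values the instruction reads
        Value *V = U.get();
        if (isa<Instruction>(V) || isa<Argument>(V))
          Uses[&I].insert(V);
      }
    }
  }
}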
Basically, when we define a local variable, LLVM uses an AllocaInst. Here's the example you put up:
a=b+c
in C code:
int a;
int b=10;
int c=10;
a=b+c;
In LLVM IR compiled with -g (debug mode):
%a = alloca i32, align 4
%b = alloca i32, align 4
%c = alloca i32, align 4
call void @llvm.dbg.declare(metadata i32* %a, metadata !14, metadata !16), !dbg !17
call void @llvm.dbg.declare(metadata i32* %b, metadata !18, metadata !16), !dbg !19
store i32 10, i32* %b, align 4, !dbg !19
call void @llvm.dbg.declare(metadata i32* %c, metadata !20, metadata !16), !dbg !21
store i32 10, i32* %c, align 4, !dbg !21
%0 = load i32, i32* %b, align 4, !dbg !22
%1 = load i32, i32* %c, align 4, !dbg !23
%add = add nsw i32 %0, %1, !dbg !24
store i32 %add, i32* %a, align 4, !dbg !25
Let's see how to use the LLVM API to collect def-use chains. It is easy, as LLVM has a built-in function-level API for this:
bool runOnFunction(Function &F) {
  errs() << "digraph " + F.getName() + "{\n";
  errs() << "\n";
  for (auto block = F.getBasicBlockList().begin(); block != F.getBasicBlockList().end(); block++) {
    for (auto inst = block->begin(); inst != block->end(); inst++) {
      for (Use &U : inst->operands()) {
        Value *v = U.get();
        if (dyn_cast<Instruction>(v)) {
          errs() << "\"" << *dyn_cast<Instruction>(v) << "\"" << " -> " << "\"" << *inst << "\"" << ";\n";
        }
        if (v->getName() != "") {
          errs() << "\"" << v->getName() << "\"" << " -> " << "\"" << *inst << "\"" << ";\n";
          errs() << "\"" << v->getName() << "\"" << " [ color = red ]\n";
        }
      }
    }
  }
  errs() << "\n}\n";
  return false; // the pass only prints; it does not modify the IR
}
For def-use chains, simply track the allocas and their users.
This will generate a PDG (program dependence graph).
The code is taken from https://github.com/DengMinghua/LLVM-Program-Dependency-Graph-Generator
Hope this helps.

Acquiring the name (or the number) of the debug info in LLVM IR

I want to print the exact number of the debug information in the IR; how could I do it?
For example, consider the IR chunk below:
call void @llvm.dbg.declare(metadata i32* %a, metadata !10, metadata !11), !dbg !12
!12 = !DILocation(line: 19, column: 7, scope: !6)
I want to print the !12 as a string for debugging purposes. I can acquire the DILocation object by doing
Instruction::getDebugLoc()->get()
but all I get is a pointer, and there is no interface for acquiring the number. I assume that LLVM assigns the number when it actually generates the bitcode, since dumping the DILocation gives a result something like
<0x7342628> = !DILocation(line: 23, column: 3, scope: <0x733e5f8>)
this. But when I use Instruction::dump(), it gives me something that looks like
call void @llvm.dbg.declare(metadata i32* %a, metadata !10, metadata !11), !dbg !12
this, so I am confused about whether it has the numbering information for the debug info at runtime.
Does it have the numbering information or not? If so, how can I acquire that info? If not, where in LLVM's bitcode generation should I look?
Are you talking about line/column numbers? If so, then you can easily access them directly from the debugLoc:
instruction->getDebugLoc()->getLine()
instruction->getDebugLoc()->getColumn()
See the definition at DebugInfoMetadata:
unsigned getLine() const { return SubclassData32; }
unsigned getColumn() const { return SubclassData16; }
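A small sketch with the null check that is easy to forget here; an instruction without a !dbg attachment returns an empty DebugLoc, and dereferencing it would crash:
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Print "line:column" for an instruction, but only if it carries a debug location.
static void printLineCol(const Instruction &I) {
  if (const DebugLoc &DL = I.getDebugLoc())
    errs() << DL->getLine() << ":" << DL->getColumn() << "\n";
}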
It is probably not too late to answer this question.
if (instruction->hasMetadata()) {
  instruction->dump();

  // one way
  SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;
  instruction->getAllMetadata(MDs);
  for (auto &MD : MDs) {
    if (MDNode *N = MD.second) {
      N->printAsOperand(errs(), instruction->getModule());
      errs() << "\n";
    }
  }

  // second way
  instruction->getDebugLoc()->printAsOperand(errs(), instruction->getModule());
  errs() << "\n";

  // third way
  int debugInfoKindID = 0; // metadata kind 0 is the !dbg kind (LLVMContext::MD_dbg)
  MDNode *debug = instruction->getMetadata(debugInfoKindID);
  debug->printAsOperand(errs(), instruction->getModule());
  errs() << "\n";
}
Output is:
%11 = add nsw i32 %9, %10, !dbg !29
!29
!29
!29
I found this by looking at llvm/unittests/IR/MetadataTest.cpp, specifically its test TEST_F(MDNodeTest, PrintFromMetadataAsValue).

LLVM IR - Can someone explain this behavior?

I'm trying to build a compiler for my language at the moment. In my language, I want to have implicit pointer usage for objects/structs, just like in Java. In the program below, I am testing out this feature. However, the program does not run as I had expected. I do not expect you to read through my entire compiler code, because that would be a waste of time; instead, I was hoping I could explain what I intended the program to do, and you could spot in the LLVM IR what went wrong. That way, I can adjust the compiler to generate proper LLVM IR.
Flow:
[Function] Main - [Return: Int] {
-> Allocates space for structure of one i32
-> Calls createObj function and stores the returning value inside previous allocated space
-> Returns the i32 of the structure
}
[Function] createObj - [Return: struct { i32 }] {
-> Allocates space for structure of one i32
-> Calls Object function on this space (pointer really)
-> Returns this space (pointer really)
}
[Function] Object - [Return: void] {
-> Stores the i32 value of 5 inside of the struct pointer argument
}
The problem is that main keeps returning some random number instead of 5. One such number is 159383856. I'm guessing that this is the decimal representation of a pointer address, but I'm not sure why it is printing out the pointer address.
; ModuleID = 'main'
%Object = type { i32 }

define i32 @main() {
entry:
  %0 = call %Object* @createObj()
  %o = alloca %Object*
  store %Object* %0, %Object** %o
  %1 = load %Object** %o
  %2 = getelementptr inbounds %Object* %1, i32 0, i32 0
  %3 = load i32* %2
  ret i32 %3
}

define %Object* @createObj() {
entry:
  %0 = alloca %Object
  call void @-Object(%Object* %0)
  %o = alloca %Object*
  store %Object* %0, %Object** %o
  %1 = load %Object** %o
  ret %Object* %1
}

define void @-Object(%Object* %this) {
entry:
  %0 = getelementptr inbounds %Object* %this, i32 0, i32 0
  store i32 5, i32* %0
  ret void
}
This LLVM IR is generated from the following syntax:
func () > main > (int) {
Object o = createObj();
return o.id;
}
// Create an object and returns it
func () > createObj > (Object) {
Object o = make Object < ();
return o;
}
// Object decl
tmpl Object {
int id; // Property
// This is run every time an object is created.
constructor < () {
this.id = 5;
}
}
It seems like in createObj you're returning a pointer to a stack variable which will no longer be valid after function return.
If you're doing implicit object pointers like Java, at a minimum you're going to need a call to a heap allocator such as malloc, which I don't think you have; a sketch of what the compiler could emit instead is below.
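A hedged sketch of how the compiler could emit a heap allocation through the C++ API instead of the alloca in createObj. ObjectTy, the builder's insertion point, and the helper name emitHeapObject are my assumptions, not part of the question's code, and the exact calls vary a bit across LLVM versions (this is written for the typed-pointer era the question's IR uses):
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Emit roughly:  %raw = call i8* @malloc(i64 <sizeof %Object>)
//                %obj = bitcast i8* %raw to %Object*
static Value *emitHeapObject(Module &M, IRBuilder<> &B, StructType *ObjectTy) {
  LLVMContext &Ctx = M.getContext();
  // Declare i8* @malloc(i64) if it is not already in the module.
  auto MallocFn = M.getOrInsertFunction(
      "malloc", PointerType::getUnqual(Type::getInt8Ty(Ctx)), Type::getInt64Ty(Ctx));
  uint64_t Size = M.getDataLayout().getTypeAllocSize(ObjectTy);
  Value *SizeV = B.getInt64(Size);
  Value *Raw = B.CreateCall(MallocFn, {SizeV}, "raw");
  return B.CreateBitCast(Raw, ObjectTy->getPointerTo(), "obj");
}
The object then outlives createObj; freeing it (or adding garbage collection) is a separate problem the language design has to address.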

LLVM extract i8* out of structure value

I'm writing a compiler using LLVM as a backend, I've written the front-end (parser, etc.) and now I've come to a crossroads.
I have a structure (%Primitive) which contains a single field, an i8* value, a pointer to a character array.
%Primitive = type { i8* }
In the compiler, instances of Primitive are passed around on the stack. I'm trying to write this character array to standard output using the puts function, but it isn't working quite like I was hoping.
declare i32 @puts(i8*)  ; Declare the libc function 'puts'

define void @WritePrimitive(%Primitive) {
entry:
  %1 = extractvalue %Primitive %0, 0  ; Extract the character array from the primitive.
  %2 = call i32 @puts(i8* %1)         ; Write it
  ret void
}
When I try to run the code (either using an ExecutionEngine or the LLVM interpreter program lli), I get the same error: a segmentation fault.
The error lies in the fact that the address passed to puts is somehow the ASCII character code of the first character in the array. It seems the value passed, rather than being a pointer to an array of 8-bit chars, is the first character of the string itself reinterpreted as a pointer.
For example, if I call @WritePrimitive with a primitive whose i8* member points to the string "hello", puts is called with the address 0x68 (the character code of 'h').
Any ideas?
Thanks
EDIT: You were right; I was initializing my Primitive incorrectly. My new initialization function is:
llvm::Value* PrimitiveHelper::getConstantPrimitive(const std::string& str, llvm::BasicBlock* bb)
{
    ConstantInt* int0 = ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 0);
    Constant* strConstant = ConstantDataArray::getString(getGlobalContext(), str, true);
    GlobalVariable* global = new GlobalVariable(module,
                                                strConstant->getType(),
                                                true, // Constant
                                                GlobalValue::ExternalLinkage,
                                                strConstant,
                                                "str");

    Value* allocated = new AllocaInst(m_primitiveType, "allocated", bb);
    LoadInst* onStack1 = new LoadInst(allocated, "onStack1", bb);
    GetElementPtrInst* ptr = GetElementPtrInst::Create(global, std::vector<Value*>(2, int0), "", bb);
    InsertValueInst* onStack2 = InsertValueInst::Create(onStack1, ptr, std::vector<unsigned>(1, 0), "", bb);

    return onStack2;
}
I missed that, Thank You!
There's nothing wrong with the code you pasted above; I just tried it myself and it worked fine. I'm guessing the issue is that you did not initialize the pointer properly, or did not set it properly into the struct.
The full code I used is:
@str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"

; Your code
%Primitive = type { i8* }

declare i32 @puts(i8*)  ; Declare the libc function 'puts'

define void @WritePrimitive(%Primitive) {
entry:
  %1 = extractvalue %Primitive %0, 0  ; Extract the character array from the primitive.
  %2 = call i32 @puts(i8* %1)         ; Write it
  ret void
}
; /Your code

define void @main() {
  %allocated = alloca %Primitive
  %onstack1 = load %Primitive* %allocated
  %onstack2 = insertvalue %Primitive %onstack1, i8* getelementptr ([13 x i8]* @str, i64 0, i64 0), 0
  call void @WritePrimitive(%Primitive %onstack2)
  ret void
}