How to check if the llvm callInst contains a bitcast? - c++

My LLVM IR looks something like this:
call void bitcast (void (%struct.type1*, %opencl.image2d_t addrspace(1)*, i32, %struct.type1*)* @_Z36functype1 to void (%struct.type2*, %opencl.image2d_t addrspace(1)*, i32, %struct.type1*)*)(%struct.type2* sret %19, %opencl.image2d_t addrspace(1)* %237, i32 %238, %struct.type1* byval %sic_payload)
I want to check whether the call is a direct function call or one that goes through a bitcast. Does anyone know how to do this?
I tried :
const CallInst *pInstCall = dyn_cast<CallInst>(&*it);
if (!pInstCall) continue;
dyn_cast<BitCastInst >(pInstCall->getCalledFunction());
But that doesn't seem to work.

You're looking for
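// Ask the call for its callee directly rather than relying on an operand index
// (getCalledOperand() in recent LLVM releases; older releases call it getCalledValue()).
if (auto *CstExpr = dyn_cast<ConstantExpr>(pInstCall->getCalledOperand())) {
// BitCastInst is an *Instruction*; here you have a *ConstantExpr* bitcast
if (CstExpr->isCast()) {
// do something...
}
}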

Related

how to implement virtual table by using llvm

I'm writing a toy compiler and want my language to support virtual methods, but I have no idea how to do it. It is not as straightforward as other statements, which can easily be turned into IR without a second thought. The v-table concept exists in my mind only as boxes and arrows, like a high-level illustration. That may be enough for using an OOP language, but it is not enough for writing one.
I tried writing some C++ code and compiling it to IR, but sadly I still cannot understand the output. I checked the Clang source code and couldn't figure out where this part lives. (It seems to be located in lib/CodeGen/CGClass.cpp, but Clang is a complicated project and I still cannot understand how it implements the v-table.)
So, any idea how to do this, or is there some LLVM API that would help me implement it?
A vtable is an array of function pointers. In a single-inheritance context, you'd have one such array per class where the elements of the array are the class's virtual methods. Each object would then contain a pointer to its class's vtable and each virtual method call would simply invoke the corresponding pointer in the vtable (after casting it to the needed type).
So let's say you're compiling a program that looks like this:
class A {
int x,y;
virtual int foo() { return x+y; }
virtual int bar() { return x*y; }
}
class B inherits A {
int z;
override int bar() { return x*y+z; }
}
int f(A a) {
return a.foo() + a.bar();
}
Then you could define functions named A_foo, A_bar and B_bar taking an A or B pointer and containing the code for A.foo, A.bar and B.bar respectively (the exact naming would depend on your name mangling scheme of course). Then you'd generate two globals A_vtable and B_vtable that'd look like this:
@A_vtable = global [2 x void (...)*] [
void (...)* bitcast (i32 (%struct.A*)* @A_foo to void (...)*),
void (...)* bitcast (i32 (%struct.A*)* @A_bar to void (...)*)
]
@B_vtable = global [2 x void (...)*] [
void (...)* bitcast (i32 (%struct.A*)* @A_foo to void (...)*),
void (...)* bitcast (i32 (%struct.B*)* @B_bar to void (...)*)
]
Which would correspond to this C code (which is hopefully more readable):
typedef void (*fpointer_t)();
fpointer_t A_vtable[] = {(fpointer_t) A_foo, (fpointer_t) A_bar};
fpointer_t B_vtable[] = {(fpointer_t) A_foo, (fpointer_t) B_bar};
f could then be translated like this:
define i32 @f(%struct.A*) {
%2 = getelementptr inbounds %struct.A, %struct.A* %0, i64 0, i32 0
%3 = bitcast %struct.A* %0 to i32 (%struct.A*)***
%4 = load i32 (%struct.A*)**, i32 (%struct.A*)*** %3
%5 = load i32 (%struct.A*)*, i32 (%struct.A*)** %4
%6 = call i32 %5(%struct.A* %0)
%7 = load void (...)**, void (...)*** %2
%8 = getelementptr inbounds void (...)*, void (...)** %7, i64 1
%9 = bitcast void (...)** %8 to i32 (%struct.A*)**
%10 = load i32 (%struct.A*)*, i32 (%struct.A*)** %9
%11 = call i32 %10(%struct.A* %0)
%12 = add nsw i32 %11, %6
ret i32 %12
}
Or in C:
typedef int (*A_int_method_t)(struct A*);
int f(struct A* a) {
return ((A_int_method_t) a->vtable[0])(a) + ((A_int_method_t) a->vtable[1])(a);
}
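The same layout can also be written out by hand in C++, which may make the generated IR easier to follow. This is only a sketch under simplifying assumptions: every method shares one signature (so no casts through a generic function pointer type are needed), B embeds A as its first member to model single inheritance, and the helpers make_A and make_B are hypothetical names introduced for illustration.

```cpp
struct A;
// Signature shared by every virtual method in this sketch.
using Method = int (*)(A *);

// Every object starts with a pointer to its class's vtable, followed by its fields.
struct A {
    const Method *vtable;
    int x, y;
};

// B's layout begins with A's, so a pointer to a B can be passed where an A* is expected.
struct B {
    A base;
    int z;
};

int A_foo(A *self) { return self->x + self->y; }
int A_bar(A *self) { return self->x * self->y; }
int B_bar(A *self) {
    // Valid because `base` is the first member of the standard-layout struct B.
    B *b = reinterpret_cast<B *>(self);
    return b->base.x * b->base.y + b->z;
}

// One table per class; slot 0 is foo, slot 1 is bar.
const Method A_vtable[] = {A_foo, A_bar};
const Method B_vtable[] = {A_foo, B_bar};  // B inherits foo and overrides bar

// Hypothetical helpers that install the right vtable pointer on construction.
A make_A(int x, int y) { return A{A_vtable, x, y}; }
B make_B(int x, int y, int z) { return B{{B_vtable, x, y}, z}; }

// A virtual call loads the vtable pointer, indexes the slot, and calls through it.
int f(A *a) { return a->vtable[0](a) + a->vtable[1](a); }
```

Calling f on an object built with make_B(2, 3, 4) dispatches bar to B_bar, because the object's vtable pointer refers to B_vtable even though f only sees an A*.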

LLVM IR - Can someone explain this behavior?

I'm trying to build a compiler for my language at the moment. In my language, I want to have implicit pointer usage for objects/structs just like in Java. In the program below, I am testing out this feature. However, the program does not run as I had expected. I do not expect you guys to read through my entire compiler code because that would be a waste of time. Instead I was hoping I could explain what I intended for the program to do and you guys could spot in the llvm ir what went wrong. That way, I can adjust the compiler to generate proper llvm ir.
Flow:
[Function] Main - [Return: Int] {
-> Allocates space for structure of one i32
-> Calls createObj function and stores the returning value inside previous allocated space
-> Returns the i32 of the structure
}
[Function] createObj - [Return: struct { i32 }] {
-> Allocates space for structure of one i32
-> Calls Object function on this space (pointer really)
-> Returns this space (pointer really)
}
[Function] Object - [Return: void] {
-> Stores the i32 value of 5 inside of the struct pointer argument
}
The problem is that main keeps returning some random number instead of 5. One such number is 159383856. I'm guessing that this is the decimal representation of a pointer address, but I'm not sure why it is returning the pointer address.
; ModuleID = 'main'
%Object = type { i32 }
define i32 @main() {
entry:
%0 = call %Object* @createObj()
%o = alloca %Object*
store %Object* %0, %Object** %o
%1 = load %Object** %o
%2 = getelementptr inbounds %Object* %1, i32 0, i32 0
%3 = load i32* %2
ret i32 %3
}
define %Object* @createObj() {
entry:
%0 = alloca %Object
call void @-Object(%Object* %0)
%o = alloca %Object*
store %Object* %0, %Object** %o
%1 = load %Object** %o
ret %Object* %1
}
define void @-Object(%Object* %this) {
entry:
%0 = getelementptr inbounds %Object* %this, i32 0, i32 0
store i32 5, i32* %0
ret void
}
This llvm ir is generated from this syntax.
func () > main > (int) {
Object o = createObj();
return o.id;
}
// Create an object and returns it
func () > createObj > (Object) {
Object o = make Object < ();
return o;
}
// Object decl
tmpl Object {
int id; // Property
// This is run every time an object is created.
constructor < () {
this.id = 5;
}
}
It seems that createObj returns a pointer to a stack variable (the alloca'd %Object), which is no longer valid after the function returns. Reading through that pointer in main is undefined behaviour, which is why you see garbage.
If you're doing implicit object pointers like Java, then at a minimum you're going to need a call to a heap allocator such as malloc, which I don't think you have.
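As a rough C++ analogue of what the fixed program would need to do, the sketch below allocates the object on the heap before running the constructor. The names Object_ctor and run are hypothetical stand-ins for the generated constructor and for main, and malloc is only one possible allocator; a real language runtime might use a garbage collector instead.

```cpp
#include <cstdlib>

struct Object {
    int id;
};

// Constructor body: stores 5 into the object it is given.
void Object_ctor(Object *self) { self->id = 5; }

// Allocating on the heap keeps the object alive after createObj returns.
// Returning the address of a local Object here would dangle, like the alloca in the IR.
Object *createObj() {
    Object *o = static_cast<Object *>(std::malloc(sizeof(Object)));
    if (o == nullptr)
        return nullptr;  // allocation failed
    Object_ctor(o);
    return o;
}

// Stand-in for main: reads the field, then releases the object.
int run() {
    Object *o = createObj();
    if (o == nullptr)
        return -1;
    int id = o->id;
    std::free(o);
    return id;
}
```

In the IR, this corresponds to replacing the alloca in createObj with a call to a declared malloc, followed by a bitcast of the returned i8* to %Object*.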

llvm - How to implement print function in my language?

I'm following llvm's tutorial for their own simple programming language "Kaleidoscope" and there's an obvious functionality in my language which this tutorial doesn't seem to cover. I simply want to print any double to standard output pretty much as C++ would do:
std::cout << 5.0;
my language would do something like
print(5.0);
Third chapter of llvm's tutorial covers function calls. The code they use is:
Value *CallExprAST::codegen() {
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (!CalleeF)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value *> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->codegen());
if (!ArgsV.back())
return nullptr;
}
return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
}
How could I implement codegen() method for specific function call print(any fp number)?
Below is the LLVM IR that clang generates for printf("%f", a);. The signature of printf is int printf(const char*, ...);
@.str = private unnamed_addr constant [3 x i8] c"%f\00", align 1
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
%a = alloca double, align 8
%1 = load double* %a, align 8
%2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str, i32 0, i32 0), double %1)
ret i32 0
}
declare i32 @printf(i8*, ...) #1
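To implement this in codegen, you first need to check whether the function is already present in the module and add the declaration if it is not. You can do both in one call:
// getOrInsertFunction returns a FunctionCallee in LLVM 9 and later;
// older releases return a Constant* that has to be cast to Function*.
FunctionCallee CalleeF = TheModule->getOrInsertFunction("printf",
FunctionType::get(IntegerType::getInt32Ty(Context), PointerType::get(Type::getInt8Ty(Context), 0), true /* this is a vararg function type */)
);
The call above gets, or adds, a handle to the function declaration
declare i32 @printf(i8*, ...) #1
Then you can call the function with matching parameters. Note that printf expects the format string as its first argument:
std::vector<Value *> ArgsV;
// The format string comes first, as an i8* to a global constant.
ArgsV.push_back(Builder.CreateGlobalStringPtr("%f\n"));
for (unsigned i = 0, e = Args.size(); i != e; ++i)
ArgsV.push_back(Args[i]->codegen());
return Builder.CreateCall(CalleeF, ArgsV, "printfCall");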
You'd first check if Callee == "print" and then insert any instructions you want.
LLVM IR has no concept of "printing" since that's not really a language consideration -- it's a facility provided by the OS. Probably the simplest option for you would be to translate the call into a call to printf, so that e.g. print(5.0) becomes printf("%f\n", 5.0).
The tutorial you linked does show how external function calls work -- you'd have to insert a declaration for printf with the correct signature, then build a call to that.

llvm: How to get the label of Basic Blocks

I have written a pass that detects and prints the labels of the basic blocks in a function, because I want to use splitBasicBlock() later on. I wrote it like this:
virtual bool runOnModule(Module &M)
{
for(Module::iterator F = M.begin(), E = M.end(); F!= E; ++F)
{
errs()<<"Function:"<<F->getName()<<"\n";
//for(Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
for (iplist<BasicBlock>::iterator iter = F->getBasicBlockList().begin();
iter != F->getBasicBlockList().end();
iter++)
{
BasicBlock* currBB = iter;
errs() << "BasicBlock: " << currBB->getName() << "\n";
}
}
return true;
}
IR file looks like this:
; <label>:63 ; preds = %43
%64 = load i32* %j, align 4
%65 = sext i32 %64 to i64
%66 = load i8** %tempdst, align 8
%67 = getelementptr inbounds i8* %66, i64 %65
store i8 -1, i8* %67, align 1
br label %73
; <label>:68 ; preds = %43
%69 = load i32* %j, align 4
%70 = sext i32 %69 to i64
%71 = load i8** %tempdst, align 8
%72 = getelementptr inbounds i8* %71, i64 %70
store i8 0, i8* %72, align 1
br label %73
; <label>:73 ; preds = %68, %63
br label %74
However, I got nothing about the label:
Function:main
BasicBlock:
BasicBlock:
BasicBlock:
What's wrong with these "unnamed" basic blocks? What should I do?
Although a BasicBlock may have no name (as indicated by the hasName() method), you can print a unique identifier for it by calling currBB->printAsOperand(errs(), false) instead of streaming the value of currBB->getName() into errs(). For an unnamed BasicBlock this prints its numeric representation, such as %68.
Values in LLVM IR are not required to have a name; and indeed, those basic blocks don't have names, which is why you get an empty string from currBB->getName().
The reason that they have names in the LLVM IR printout is because when you print to the textual format of LLVM IR (as it appears in .ll files), you have to assign a name to them to make them referable, so the printer assigns sequential numeric names to basic blocks (and other values). Those numeric names are only created by the printer, though, and don't actually exist in the module.
When compiling the source code to bitcode with clang, pass the following flag:
-fno-discard-value-names
Each basic block will then get a unique string as its name.
I think the behavior of LLVM now is different.
I use similar lines of code and can get the label's name on LLVM-4.0
for (auto &funct : m) {
for (auto &basic_block : funct) {
StringRef bbName(basic_block.getName());
errs() << "BasicBlock: " << bbName << "\n";
}
}
As ElazarR said, currBB->printAsOperand(errs(), false) will print such ID in the error stream, but it is possible to store it in a string as well if this is more interesting to your logic.
In the LLVM CFG generation pass -dot-cfg, they always name the basic block using the BB's name (if any) or its representation as a string. This logic is present in the CFGPrinter.h header (http://llvm.org/doxygen/CFGPrinter_8h_source.html#l00063):
static std::string getSimpleNodeLabel(const BasicBlock *Node,
const Function *) {
if (!Node->getName().empty())
return Node->getName().str();
std::string Str;
raw_string_ostream OS(Str);
Node->printAsOperand(OS, false);
return OS.str();
}
You can use this logic to always return a valid name for the basic block.

LLVM extract i8* out of structure value

I'm writing a compiler using LLVM as a backend. I've written the front end (parser, etc.) and have now come to a crossroads.
I have a structure (%Primitive) which contains a single field, an i8* value, a pointer to a character array.
%Primitive = type { i8* }
In the compiler, instances of Primitive are passed around on the stack. I'm trying to write this character array to standard output using the puts function, but it isn't working quite like I was hoping.
declare i32 @puts(i8*) ; Declare the libc function 'puts'
define void @WritePrimitive(%Primitive) {
entry:
%1 = extractvalue %Primitive %0, 0 ; Extract the character array from the primitive.
%2 = call i32 @puts(i8* %1) ; Write it
ret void
}
When I try to run the code (either using an ExecutionEngine or the LLVM interpreter program lli), I get the same error; a segmentation fault.
The error lies in the fact that the address passed to puts is somehow the ASCII character code of the first character in the array. It seems the address passed, rather than being a pointer to an array of 8 bit chars, is instead an 8 bit wide pointer that equals the dereferenced string.
For example, if I call @WritePrimitive with a primitive where the i8* member points to the string "hello", puts is called with the string address being 0x68.
Any ideas?
Thanks
EDIT: You were right, I was initializing my Primitive incorrectly, my new initialization function is:
llvm::Value* PrimitiveHelper::getConstantPrimitive(const std::string& str, llvm::BasicBlock* bb)
{
ConstantInt* int0 = ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 0);
Constant* strConstant = ConstantDataArray::getString(getGlobalContext(), str, true);
GlobalVariable* global = new GlobalVariable(module,
strConstant->getType(),
true, // Constant
GlobalValue::ExternalLinkage,
strConstant,
"str");
Value* allocated = new AllocaInst(m_primitiveType, "allocated", bb);
LoadInst* onStack1 = new LoadInst(allocated, "onStack1", bb);
GetElementPtrInst* ptr = GetElementPtrInst::Create(global, std::vector<Value*>(2,int0), "", bb);
InsertValueInst* onStack2 = InsertValueInst::Create(onStack1, ptr, std::vector<unsigned>(1, 0), "", bb);
return onStack2;
}
I missed that, Thank You!
There's nothing wrong with the code you pasted above; I just tried it myself and it worked fine. I'm guessing the issue is that you did not initialize the pointer properly, or did not set it properly into the struct.
The full code I used is:
@str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"
; Your code
%Primitive = type { i8* }
declare i32 @puts(i8*) ; Declare the libc function 'puts'
define void @WritePrimitive(%Primitive) {
entry:
%1 = extractvalue %Primitive %0, 0 ; Extract the character array from the primitive.
%2 = call i32 @puts(i8* %1) ; Write it
ret void
}
; /Your code
define void @main() {
%allocated = alloca %Primitive
%onstack1 = load %Primitive* %allocated
%onstack2 = insertvalue %Primitive %onstack1, i8* getelementptr ([13 x i8]* @str, i64 0, i64 0), 0
call void @WritePrimitive(%Primitive %onstack2)
ret void
}