I have this LLVM bitcode fragment:
define void @setGlobal(i32 %a) #0 {
entry:
%a.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
%0 = load i32* %a.addr, align 4
store i32 %0, i32* @Global, align 4
%1 = load i32* %a.addr, align 4
store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4
store i32 2, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 2), align 4
ret void
}
I am using this code to find the getelementptr in "store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4":
for (Module::iterator F = p_Module.begin(), endF = p_Module.end(); F != endF; ++F) {
    for (Function::iterator BB = F->begin(), endBB = F->end(); BB != endBB; ++BB) {
        for (BasicBlock::iterator I = BB->begin(), endI = BB->end(); I != endI; ++I) {
            if (StoreInst* SI = dyn_cast<StoreInst>(I)) {
                if (Instruction *gep = dyn_cast<Instruction>(SI->getOperand(1))) {
                    if (gep->getOpcode() == Instruction::GetElementPtr) {
                        //do something
                    }
                }
            }
        }
    }
}
This code can't find the getelementptr. What am I doing wrong?
There are no getelementptr instructions in your bitcode snippet, which is why you can't find them.
The two cases that look like getelementptr instructions are actually constant expressions. The telltale sign is that they appear as an operand of another instruction (store), which is not something you can do with regular instructions.
So to search for that expression, look for a ConstantExpr whose getOpcode() is Instruction::GetElementPtr rather than a GetElementPtrInst (dyn_cast<Instruction> fails here because a constant expression is not an Instruction). Alternatively, dyn_cast<GEPOperator> matches both the instruction form and the constant-expression form.
I was reading the LLVM IR that Clang++ produces for the following code:
class Shape {
public:
// pure virtual function providing interface framework.
virtual int getArea(char* me) = 0;
void setWidth(int w) {
width = w;
}
void setHeight(int h) {
height = h;
}
protected:
int width;
int height;
};
// Derived classes
class Rectangle: public Shape {
public:
int getArea(char * me) {
return (width * height);
}
};
which produces the following LLVM IR:
%class.Rectangle = type { %class.Shape }
%class.Shape = type { i32 (...)**, i32, i32 }
What is this "i32 (...)**", and what does it do?
It looks like a pointer to a function pointer, but it seems to be used as a bitcast target for objects,
like so:
define linkonce_odr dso_local void @_ZN9RectangleC2Ev(%class.Rectangle* %0) unnamed_addr #5 comdat align 2 {
%2 = alloca %class.Rectangle*, align 8
store %class.Rectangle* %0, %class.Rectangle** %2, align 8
%3 = load %class.Rectangle*, %class.Rectangle** %2, align 8
%4 = bitcast %class.Rectangle* %3 to %class.Shape*
call void @_ZN5ShapeC2Ev(%class.Shape* %4) #3
%5 = bitcast %class.Rectangle* %3 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV9Rectangle, i32 0, inrange i32 0, i32 2) to i32 (...)**), i32 (...)*** %5, align 8
ret void
}
Let's look at a simpler example:
struct A {
virtual void f1();
int width;
};
struct B: public A {
void f1() {};
};
B a;
After compilation you can get this:
%struct.B = type { %struct.A.base, [4 x i8] }
%struct.A.base = type <{ i32 (...)**, i32 }>
@a = dso_local local_unnamed_addr global %struct.B { %struct.A.base <{ i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1B, i32 0, inrange i32 0, i32 2) to i32 (...)**), i32 0 }>, [4 x i8] zeroinitializer }, align 8
@_ZTV1B = linkonce_odr dso_local unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI1B to i8*), i8* bitcast (void (%struct.B*)* @_ZN1B2f1Ev to i8*)] }, comdat, align 8
As you can see, the i32 (...)** field is initialized with an address inside _ZTV1B, which demangles to "vtable for B".
The mysterious value is this constant expression:
getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1B, i32 0, inrange i32 0, i32 2)
It is the address of element 2 of the vtable array, and that slot holds _ZN1B2f1Ev, which demangles to B::f1().
I also tried this example:
auto f(B *a) {
a->f1();
}
Generated code is:
define dso_local void @_Z1fP1B(%struct.B* %0) local_unnamed_addr #0 {
%2 = bitcast %struct.B* %0 to void (%struct.B*)***
%3 = load void (%struct.B*)**, void (%struct.B*)*** %2, align 8, !tbaa !3
%4 = load void (%struct.B*)*, void (%struct.B*)** %3, align 8
tail call void %4(%struct.B* nonnull align 8 dereferenceable(12) %0)
ret void
}
As you can see, it loads the vtable pointer from the object, loads the function pointer stored in the first slot, and calls it.
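To make the two loads and the indirect call more concrete, here is a rough hand-written imitation of that mechanism in plain C++. It is only a sketch: the names VTable, AreaFn, rectangleVTable, makeRectangle and callGetArea are made up for illustration, the table has a single slot, and a real compiler's layout (RTTI pointer, offset-to-top, and so on) is more involved.

```cpp
#include <cassert>

struct Shape;  // forward declaration so the slot type can name it

// Type of one table slot: a function that receives the object pointer.
using AreaFn = int (*)(const Shape*);

// Hypothetical hand-written vtable with a single slot.
struct VTable {
    AreaFn getArea;
};

struct Shape {
    const VTable* vptr;  // plays the role of the i32 (...)** field
    int width;
    int height;
};

// The "override" that the Rectangle table points at.
int rectangleGetArea(const Shape* self) {
    return self->width * self->height;
}

const VTable rectangleVTable = {&rectangleGetArea};

// Mirrors the constructor: store the table address into the object.
Shape makeRectangle(int w, int h) {
    return Shape{&rectangleVTable, w, h};
}

// Mirrors the call site: load vptr, load the slot, call indirectly.
int callGetArea(const Shape* s) {
    assert(s != nullptr && s->vptr != nullptr);
    const VTable* table = s->vptr;  // first load
    AreaFn fn = table->getArea;     // second load
    return fn(s);                   // indirect call
}
```

Calling callGetArea on an object built by makeRectangle ends up in rectangleGetArea without the caller ever naming that function, which is what the store in the constructor and the two loads in the IR accomplish.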
P.S.
We currently use i32 (...)** as the type of the vptr field in the LLVM
struct type. LLVM's GlobalOpt prefers any bitcasts to be on the side
of the data being stored rather than on the pointer being stored to.
I tried to compare the overhead of std::visit (std::variant polymorphism) and virtual functions (std::unique_ptr polymorphism). (Please note that my question is not about overhead or performance, but about optimization.)
Here is my code.
https://quick-bench.com/q/pJWzmPlLdpjS5BvrtMb5hUWaPf0
#include <memory>
#include <variant>
struct Base
{
virtual void Process() = 0;
};
struct Derived : public Base
{
void Process() { ++a; }
int a = 0;
};
struct VarDerived
{
void Process() { ++a; }
int a = 0;
};
static std::unique_ptr<Base> ptr;
static std::variant<VarDerived> var;
static void PointerPolyMorphism(benchmark::State& state)
{
ptr = std::make_unique<Derived>();
for (auto _ : state)
{
for(int i = 0; i < 1000000; ++i)
ptr->Process();
}
}
BENCHMARK(PointerPolyMorphism);
static void VariantPolyMorphism(benchmark::State& state)
{
var.emplace<VarDerived>();
for (auto _ : state)
{
for(int i = 0; i < 1000000; ++i)
std::visit([](auto&& x) { x.Process();}, var);
}
}
BENCHMARK(VariantPolyMorphism);
I know it's not a good benchmark; it was only a draft during my testing.
But I was surprised by the result.
The std::visit benchmark time was high (which means slow) without any optimization.
But when I turn on optimization (higher than O2), the std::visit benchmark time is extremely low (which means extremely fast), while the std::unique_ptr one isn't.
I'm wondering why the same optimization can't be applied to the std::unique_ptr polymorphism.
I've compiled your code with Clang++ to LLVM (without your benchmarking) with -Ofast. Here's what you get for VariantPolyMorphism, unsurprisingly:
define void @_Z19VariantPolyMorphismv() local_unnamed_addr #2 {
ret void
}
On the other hand, PointerPolyMorphism does really execute the loop and all calls:
define void @_Z19PointerPolyMorphismv() local_unnamed_addr #2 personality i32 (...)* @__gxx_personality_v0 {
%1 = tail call dereferenceable(16) i8* @_Znwm(i64 16) #8, !noalias !8
tail call void @llvm.memset.p0i8.i64(i8* nonnull align 16 dereferenceable(16) %1, i8 0, i64 16, i1 false), !noalias !8
%2 = bitcast i8* %1 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV7Derived, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8, !tbaa !11, !noalias !8
%3 = getelementptr inbounds i8, i8* %1, i64 8
%4 = bitcast i8* %3 to i32*
store i32 0, i32* %4, align 8, !tbaa !13, !noalias !8
%5 = load %struct.Base*, %struct.Base** getelementptr inbounds ({ { %struct.Base* } }, { { %struct.Base* } }* @_ZL3ptr, i64 0, i32 0, i32 0), align 8, !tbaa !4
store i8* %1, i8** bitcast ({ { %struct.Base* } }* @_ZL3ptr to i8**), align 8, !tbaa !4
%6 = icmp eq %struct.Base* %5, null
br i1 %6, label %7, label %8
7: ; preds = %8, %0
br label %11
8: ; preds = %0
%9 = bitcast %struct.Base* %5 to i8*
tail call void @_ZdlPv(i8* %9) #7
br label %7
10: ; preds = %11
ret void
11: ; preds = %7, %11
%12 = phi i32 [ %17, %11 ], [ 0, %7 ]
%13 = load %struct.Base*, %struct.Base** getelementptr inbounds ({ { %struct.Base* } }, { { %struct.Base* } }* @_ZL3ptr, i64 0, i32 0, i32 0), align 8, !tbaa !4
%14 = bitcast %struct.Base* %13 to void (%struct.Base*)***
%15 = load void (%struct.Base*)**, void (%struct.Base*)*** %14, align 8, !tbaa !11
%16 = load void (%struct.Base*)*, void (%struct.Base*)** %15, align 8
tail call void %16(%struct.Base* %13)
%17 = add nuw nsw i32 %12, 1
%18 = icmp eq i32 %17, 1000000
br i1 %18, label %10, label %11
}
The reason for this is that both your variables are static. This allows the compiler to infer that no code outside the translation unit has access to your variant instance. Therefore your loop doesn't have any visible effect and can be safely removed. However, although your smart pointer is static, the memory it points to could still change (as a side effect of the call to Process, for example). The compiler therefore cannot easily prove that it is safe to remove the loop, and doesn't.
If you remove the static from both variables, you get this for VariantPolyMorphism:
define void @_Z19VariantPolyMorphismv() local_unnamed_addr #2 {
store i32 0, i32* getelementptr inbounds ({ { %"union.std::__1::__variant_detail::__union", i32 } }, { { %"union.std::__1::__variant_detail::__union", i32 } }* @var, i64 0, i32 0, i32 1), align 4, !tbaa !16
store i32 1000000, i32* getelementptr inbounds ({ { %"union.std::__1::__variant_detail::__union", i32 } }, { { %"union.std::__1::__variant_detail::__union", i32 } }* @var, i64 0, i32 0, i32 0, i32 0, i32 0, i32 0), align 4, !tbaa !18
ret void
}
Which isn't surprising once again. The variant can only contain VarDerived so nothing needs to be computed at run-time: The final state of the variant can already be determined at compile-time. The difference, though, now is that some other translation unit might want to access the value of var later on and the value must therefore be written.
Your variant can store only a single type, so it is the same as a single regular variable (it works more like an optional).
You are running the test without optimizations enabled.
The result is not protected from the optimizer, so it can discard your code.
Your code does not actually use polymorphism; some compilers are able to figure out that there is only one implementation of the Base class and drop the virtual calls.
This is better but still not trustworthy:
ver 1, ver 2 with arrays.
Yes, polymorphism can be expensive when used in tight loops.
Writing benchmarks for such small, extremely fast features is hard and full of pitfalls, so it must be approached with extreme caution, since you are reaching the limits of the benchmark tool.
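As a rough illustration of the first point, a variant only involves real dispatch once it has more than one alternative. The sketch below uses two made-up types, Up and Down, and a hypothetical helper runVisit; it is not the code from the linked benchmarks, just a minimal example of a std::visit call that has to pick a branch at run time.

```cpp
#include <cassert>
#include <variant>

// Two alternatives with the same interface but different behaviour.
struct Up   { int a = 0; void Process() { ++a; } };
struct Down { int a = 0; void Process() { --a; } };

using Counter = std::variant<Up, Down>;

// Calls Process() n times through std::visit and returns the final counter.
int runVisit(Counter& c, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        std::visit([](auto& x) { x.Process(); }, c);
    return std::visit([](const auto& x) { return x.a; }, c);
}
```

In a benchmark you would also want to choose the active alternative from something the optimizer cannot see (input data, for example) and consume the returned value, otherwise the compiler may still fold the whole loop away.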
typedef int (D::*fptr)(void);
fptr bfunc;
bfunc=&D::Bfunc;
cout<<(reinterpret_cast<unsigned long long>(bfunc)&0xffffffff00000000)<<endl;
complete code available at : https://ideone.com/wRVyTu
I am trying to use reinterpret_cast, but the compiler throws error
prog.cpp: In function 'int main()': prog.cpp:49:51: error: invalid cast from type 'fptr {aka int (D::*)()}' to type 'long long unsigned int' cout<<(reinterpret_cast<unsigned long long>(bfunc)&0xffffffff00000000)<<endl;
My questions are:
Why is reinterpret_cast not suitable in this case?
Is there another way I can see the contents of the method pointer?
Using clang++ to compile a slightly modified version of your code (removed all the cout to not get thousands of lines...), we get this for main:
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%bfunc = alloca { i64, i64 }, align 8
%dfunc = alloca { i64, i64 }, align 8
store i32 0, i32* %retval, align 4
store { i64, i64 } { i64 1, i64 16 }, { i64, i64 }* %bfunc, align 8
store { i64, i64 } { i64 9, i64 0 }, { i64, i64 }* %dfunc, align 8
ret i32 0
}
Note that bfunc and dfunc are each a pair of 64-bit integer values. If I compile for 32-bit x86, each is a pair of i32 (so 32-bit integer values).
So, if we make main look like this:
int main() {
// your code goes here
typedef int (D::*fptr)(void);
fptr bfunc;
fptr dfunc;
bfunc=&D::Bfunc;
dfunc=&D::Dfunc;
D d;
(d.*bfunc)();
return 0;
}
the generated code looks like this:
; Function Attrs: norecurse uwtable
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%bfunc = alloca { i64, i64 }, align 8
%dfunc = alloca { i64, i64 }, align 8
%d = alloca %class.D, align 8
store i32 0, i32* %retval, align 4
store { i64, i64 } { i64 1, i64 16 }, { i64, i64 }* %bfunc, align 8
store { i64, i64 } { i64 9, i64 0 }, { i64, i64 }* %dfunc, align 8
call void @_ZN1DC2Ev(%class.D* %d) #3
%0 = load { i64, i64 }, { i64, i64 }* %bfunc, align 8
%memptr.adj = extractvalue { i64, i64 } %0, 1
%1 = bitcast %class.D* %d to i8*
%2 = getelementptr inbounds i8, i8* %1, i64 %memptr.adj
%this.adjusted = bitcast i8* %2 to %class.D*
%memptr.ptr = extractvalue { i64, i64 } %0, 0
%3 = and i64 %memptr.ptr, 1
%memptr.isvirtual = icmp ne i64 %3, 0
br i1 %memptr.isvirtual, label %memptr.virtual, label %memptr.nonvirtual
memptr.virtual: ; preds = %entry
%4 = bitcast %class.D* %this.adjusted to i8**
%vtable = load i8*, i8** %4, align 8
%5 = sub i64 %memptr.ptr, 1
%6 = getelementptr i8, i8* %vtable, i64 %5
%7 = bitcast i8* %6 to i32 (%class.D*)**
%memptr.virtualfn = load i32 (%class.D*)*, i32 (%class.D*)** %7, align 8
br label %memptr.end
memptr.nonvirtual: ; preds = %entry
%memptr.nonvirtualfn = inttoptr i64 %memptr.ptr to i32 (%class.D*)*
br label %memptr.end
memptr.end: ; preds = %memptr.nonvirtual, %memptr.virtual
%8 = phi i32 (%class.D*)* [ %memptr.virtualfn, %memptr.virtual ], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
%call = call i32 %8(%class.D* %this.adjusted)
ret i32 0
}
This is not entirely trivial to follow, but in essence:
%memptr.adj = Read adjustment from bfunc[1]
%2 = %d[%memptr.adj]
cast %2 to D*
%memptr.ptr = bfunc[0]
if (%memptr.ptr & 1) goto is_virtual else goto is_non_virtual
is_virtual:
%memptr.virtual=vtable[%memptr.ptr-1]
goto common
is_non_virtual:
%memptr.non_virtual = %memptr.ptr
common:
if we came from
is_non_virtual: %8 = %memptr.non_virtual
is_virtual: %8 = %memptr.virtual
call %8
I skipped some type-casts and stuff to make it simpler.
NOTE: This is NOT meant to say "this is how it is always implemented". It's one example of what the compiler MAY do. Different compilers will do this subtly differently. But if the function may or may not be virtual, the compiler first has to figure out which. [In the above example, I'm fairly sure we can turn on optimisation and get much better code, but it would presumably just figure out exactly what's going on and remove all of the code, which for understanding how it works is pointless]
There is a very simple answer to this. Pointers-to-methods are not 'normal' pointers and cannot be cast to those or to integers, even through reinterpret_cast. Some compilers let you cast first to void* and then to long long, but this is really ill-advised.
Remember, the size of a pointer-to-method is not necessarily (and usually is not!) equal to the size of a 'normal' pointer. The way most compilers implement pointer-to-method, it is twice the size of a 'normal' pointer.
GCC will complain about the pointer-to-method to void* cast in pedantic mode, but will still generate code.
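If the goal is just to look at the contents, one portable option is to copy the object representation into a byte buffer with memcpy instead of casting. The following is a minimal sketch: the class D here is a stand-in I made up (one non-virtual and one virtual method), not the class from the linked code, and rawBytes and callThrough are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Stand-in class with one non-virtual and one virtual method.
struct D {
    int Bfunc() { return 1; }
    virtual int Dfunc() { return 2; }
    virtual ~D() = default;
};

using fptr = int (D::*)();

// Copies the object representation of a pointer-to-member into a byte array.
std::array<unsigned char, sizeof(fptr)> rawBytes(fptr p) {
    std::array<unsigned char, sizeof(fptr)> bytes{};
    std::memcpy(bytes.data(), &p, sizeof(p));
    return bytes;
}

// Calls a member function through the pointer, as (d.*bfunc)() does above.
int callThrough(D& d, fptr p) {
    assert(p != nullptr);
    return (d.*p)();
}
```

Printing each element of rawBytes(&D::Bfunc) in hex shows whatever the compiler stores in the member pointer. What those bytes mean is implementation-specific, so treat the output as something to inspect, not something to rely on.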
From an llvm pass, I need to print an llvm instruction (Type llvm::Instruction) on the screen, just like as it appears in the llvm bitcode file. Actually my compilation is crashing, and does not reach the point where bitcode file is generated. So for debugging I want to print some instructions to know what is going wrong.
Assuming I is your instruction
I.print(errs());
By simply using the print method.
For a simple Hello World program, using C++'s range-based loops, you can do something like this:
for(auto& B: F){
for(auto& I: B){
errs() << I << "\n";
}
}
This gives the output:
%3 = alloca i32, align 4
%4 = alloca i8**, align 8
store i32 %0, i32* %3, align 4
store i8** %1, i8*** %4, align 8
%5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i64 0, i64 0))
ret i32 0
I have been trying to identify array access in IR by making use of following code:
for (BasicBlock::iterator ii = BB->begin(), ii2; ii != BB->end(); ii++) {
Instruction *I=ii;
if(GetElementPtrInst *getElePntr = dyn_cast<GetElementPtrInst>(&*I))
{
Value *valAlloc = (getElePntr->getOperand(0));
if(getElePntr->getOperand(0)->getType()->isArrayTy())
{
errs()<<"\tarray found";
}
}
}
This code identifies the getelementptr instruction, but it does not identify whether its first operand is an array type or not. Please let me know what the problem with my code is.
The first operand of a GEP (getelementptr instruction) is a pointer, not an array. That pointer may point to an array, or it may not (see below). So you need to look what this pointer points to.
Here's a sample BasicBlockPass visitor:
virtual bool runOnBasicBlock(BasicBlock &BB) {
for (BasicBlock::iterator ii = BB.begin(), ii_e = BB.end(); ii != ii_e; ++ii) {
if (GetElementPtrInst *gep = dyn_cast<GetElementPtrInst>(&*ii)) {
// Dump the GEP instruction
gep->dump();
Value* firstOperand = gep->getOperand(0);
Type* type = firstOperand->getType();
// Figure out whether the first operand points to an array
if (PointerType *pointerType = dyn_cast<PointerType>(type)) {
Type* elementType = pointerType->getElementType();
errs() << "The element type is: " << *elementType << "\n";
if (elementType->isArrayTy()) {
errs() << " .. points to an array!\n";
}
}
}
}
return false;
}
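Note, however, that many "arrays" in C/C++ are actually pointers, so you may not get the array type where you expect. Also note that recent LLVM releases use opaque pointers, where a pointer type no longer records what it points to; there, GetElementPtrInst::getSourceElementType() is the place to look instead of PointerType::getElementType().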
For example, if you compile this code:
int main(int argc, char **argv) {
return (int)argv[1][8];
}
You get the IR:
define i32 @main(i32 %argc, i8** %argv) nounwind uwtable {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i8**, align 8
store i32 0, i32* %1
store i32 %argc, i32* %2, align 4
store i8** %argv, i8*** %3, align 8
%4 = load i8*** %3, align 8
%5 = getelementptr inbounds i8** %4, i64 1
%6 = load i8** %5
%7 = getelementptr inbounds i8* %6, i64 8
%8 = load i8* %7
%9 = sext i8 %8 to i32
ret i32 %9
}
Although argv is treated as an array, the compiler thinks of it as a pointer, so there is no array type in sight. The pass I pasted above won't recognize an array here, because the first operand of the GEP is a pointer to a pointer.
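The same array-versus-pointer distinction can be checked at the C++ level before any IR is generated. This is a small sketch with made-up names (table, second, tableIsArray, decayedIsPointer): a global array keeps its array type, while a parameter written with array syntax is adjusted to a plain pointer, which is why no array type shows up in the IR for it.

```cpp
#include <cassert>
#include <type_traits>

// A real array object: its type is int[5].
int table[5] = {1, 2, 3, 4, 5};

// Despite the [5], the parameter type is adjusted to int*.
int second(int values[5]) {
    static_assert(std::is_pointer_v<decltype(values)>,
                  "array parameters are adjusted to pointers");
    assert(values != nullptr);
    return values[1];
}

// The global keeps its array type until it decays at a use site.
constexpr bool tableIsArray = std::is_array_v<decltype(table)>;
constexpr bool decayedIsPointer =
    std::is_pointer_v<std::decay_t<decltype(table)>>;
```

Accesses to something like table are the ones where a GEP over an array type can be expected, whereas accesses through a parameter like values look like the argv case above.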