Using LLVM to do loop unrolling, failed at splitting a block

I am using LLVM for some basic loop transformation practice.
The target loop I want to transform is the following:
int main()
{
    int j = 0, i = 0;
    int x[100][5000] = {0};
    for (i = 0; i < 100; i = i + 1) {
        for (j = 0; j < 5000; j = j + 1) {
            x[i][j] = 2 * x[i][j];
        }
    }
    return 0;
}
The IR of this code has a nested loop structure that goes like this:
entry:
....
for.cond:
....
for.body:
....
for.cond1:
....
for.body3:
....
for.inc:
....
for.end:
I want to unroll the inner loop, which consists of the three blocks for.cond1, for.body3, and for.inc. I first split for.body3 at the branch instruction, and then want to insert the new unrolled blocks between the two halves:
%3 = load i32* %j, align 4
%idxprom = sext i32 %3 to i64
%4 = load i32* %i, align 4
%idxprom4 = sext i32 %4 to i64
%arrayidx = getelementptr inbounds [100 x [5000 x i32]]* %x, i32 0, i64 %idxprom4
%arrayidx5 = getelementptr inbounds [5000 x i32]* %arrayidx, i32 0, i64 %idxprom
%5 = load i32* %arrayidx5, align 4
%mul = mul nsw i32 2, %5
%6 = load i32* %j, align 4
%idxprom6 = sext i32 %6 to i64
%7 = load i32* %i, align 4
%idxprom7 = sext i32 %7 to i64
%arrayidx8 = getelementptr inbounds [100 x [5000 x i32]]* %x, i32 0, i64 %idxprom7
%arrayidx9 = getelementptr inbounds [5000 x i32]* %arrayidx8, i32 0, i64 %idxprom6
store i32 %mul, i32* %arrayidx9, align 4
; insert new blocks here
br label %for.inc
But when I issue the instruction in my pass, I get an error.
My instruction:
BasicBlock *bb_new = bb->splitBasicBlock(inst_br);
The error:
Assertion failed: (HasInsideLoopSuccs && "Loop block has no in-loop successors!")
Can anyone familiar with LLVM tell me what the problem is? Or is there another way to split the block and insert the unrolled ones?
HasInsideLoopSuccs is set in LoopInfoImpl.h as follows:
// Check the individual blocks.
for ( ; BI != BE; ++BI) {
  BlockT *BB = *BI;
  bool HasInsideLoopSuccs = false;
  bool HasInsideLoopPreds = false;
  SmallVector<BlockT *, 2> OutsideLoopPreds;

  typedef GraphTraits<BlockT*> BlockTraits;
  for (typename BlockTraits::ChildIteratorType SI =
         BlockTraits::child_begin(BB), SE = BlockTraits::child_end(BB);
       SI != SE; ++SI)
    if (contains(*SI)) {
      HasInsideLoopSuccs = true;
      break;
    }
Edit (11/26): This is my pass.
class IndependentUnroll : public llvm::LoopPass
{
public:
    virtual void unroll(llvm::Loop *L) {
        for (Loop::block_iterator block = L->block_begin();
             block != L->block_end(); block++) {
            BasicBlock *bb = *block;
            /* Handle loop body. */
            if (string(bb->getName()).find("for.body3") != string::npos) {
                Instruction *inst = &bb->back();
                BasicBlock *new_bb = bb->splitBasicBlock(inst);
                /* Then the code crashes! */
            }
        }
    }

    IndependentUnroll() : llvm::LoopPass(IndependentUnroll::ID) { }

    virtual bool runOnLoop(llvm::Loop *L, llvm::LPPassManager &LPM) {
        if (L->getLoopDepth() == 1) {
            unroll(L);
        }
        return true; // report that the IR was modified
    }

    static char ID;
};
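For reference, the assertion fires because splitBasicBlock creates a new block that LoopInfo has not been told about: after the split, for.body3's only successor is the new block, which LoopInfo does not yet consider part of the loop. A minimal sketch of a fix, assuming the legacy pass manager of that era (the exact addBasicBlockToLoop signature varies across LLVM versions):

virtual void unroll(llvm::Loop *L, llvm::LoopInfo &LI) {
    // Find the target block first: splitting while iterating over
    // L->block_begin()/block_end() invalidates the block iterator.
    BasicBlock *body = 0;
    for (Loop::block_iterator B = L->block_begin(); B != L->block_end(); ++B)
        if ((*B)->getName().find("for.body3") != StringRef::npos)
            body = *B;
    if (!body)
        return;
    BasicBlock *new_bb = body->splitBasicBlock(body->getTerminator());
    // Register the new block with the innermost loop that contains the
    // original block; without this, LoopInfo's verifier trips the
    // "Loop block has no in-loop successors!" assertion.
    LI.getLoopFor(body)->addBasicBlockToLoop(new_bb, LI.getBase());
}

The pass would also need getAnalysisUsage to addRequired<LoopInfo>() so that runOnLoop can call getAnalysis<LoopInfo>() and pass it down; on newer LLVM versions, addBasicBlockToLoop takes the LoopInfo object itself rather than LI.getBase().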

In the original version of the question, the inner for-loop read:
for (j = 0; j < 5000; j = i+1) {
Do you mean for the increment to be j = j + 1, to avoid the infinite loop?


Why isn't std::unique_ptr optimized while std::variant can be?

I tried to compare the overhead of std::visit (std::variant polymorphism) and virtual functions (std::unique_ptr polymorphism). (Please note my question is not about overhead or performance, but about optimization.)
Here is my code.
https://quick-bench.com/q/pJWzmPlLdpjS5BvrtMb5hUWaPf0
#include <benchmark/benchmark.h>
#include <memory>
#include <variant>

struct Base
{
    virtual void Process() = 0;
};

struct Derived : public Base
{
    void Process() { ++a; }
    int a = 0;
};

struct VarDerived
{
    void Process() { ++a; }
    int a = 0;
};

static std::unique_ptr<Base> ptr;
static std::variant<VarDerived> var;

static void PointerPolyMorphism(benchmark::State& state)
{
    ptr = std::make_unique<Derived>();
    for (auto _ : state)
    {
        for (int i = 0; i < 1000000; ++i)
            ptr->Process();
    }
}
BENCHMARK(PointerPolyMorphism);

static void VariantPolyMorphism(benchmark::State& state)
{
    var.emplace<VarDerived>();
    for (auto _ : state)
    {
        for (int i = 0; i < 1000000; ++i)
            std::visit([](auto&& x) { x.Process(); }, var);
    }
}
BENCHMARK(VariantPolyMorphism);
I know it's not a good benchmark test; it was only a draft during my testing.
But I was surprised by the result.
The std::visit benchmark was high (which means slow) without any optimization.
But when I turn on optimization (higher than O2), the std::visit benchmark is extremely low (which means extremely fast) while the std::unique_ptr one isn't.
I'm wondering why the same optimization can't be applied to the std::unique_ptr polymorphism?
I've compiled your code with Clang++ to LLVM (without your benchmarking) with -Ofast. Here's what you get for VariantPolyMorphism, unsurprisingly:
define void @_Z19VariantPolyMorphismv() local_unnamed_addr #2 {
ret void
}
On the other hand, PointerPolyMorphism does really execute the loop and all calls:
define void @_Z19PointerPolyMorphismv() local_unnamed_addr #2 personality i32 (...)* @__gxx_personality_v0 {
%1 = tail call dereferenceable(16) i8* @_Znwm(i64 16) #8, !noalias !8
tail call void @llvm.memset.p0i8.i64(i8* nonnull align 16 dereferenceable(16) %1, i8 0, i64 16, i1 false), !noalias !8
%2 = bitcast i8* %1 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV7Derived, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8, !tbaa !11, !noalias !8
%3 = getelementptr inbounds i8, i8* %1, i64 8
%4 = bitcast i8* %3 to i32*
store i32 0, i32* %4, align 8, !tbaa !13, !noalias !8
%5 = load %struct.Base*, %struct.Base** getelementptr inbounds ({ { %struct.Base* } }, { { %struct.Base* } }* @_ZL3ptr, i64 0, i32 0, i32 0), align 8, !tbaa !4
store i8* %1, i8** bitcast ({ { %struct.Base* } }* @_ZL3ptr to i8**), align 8, !tbaa !4
%6 = icmp eq %struct.Base* %5, null
br i1 %6, label %7, label %8
7: ; preds = %8, %0
br label %11
8: ; preds = %0
%9 = bitcast %struct.Base* %5 to i8*
tail call void @_ZdlPv(i8* %9) #7
br label %7
10: ; preds = %11
ret void
11: ; preds = %7, %11
%12 = phi i32 [ %17, %11 ], [ 0, %7 ]
%13 = load %struct.Base*, %struct.Base** getelementptr inbounds ({ { %struct.Base* } }, { { %struct.Base* } }* @_ZL3ptr, i64 0, i32 0, i32 0), align 8, !tbaa !4
%14 = bitcast %struct.Base* %13 to void (%struct.Base*)***
%15 = load void (%struct.Base*)**, void (%struct.Base*)*** %14, align 8, !tbaa !11
%16 = load void (%struct.Base*)*, void (%struct.Base*)** %15, align 8
tail call void %16(%struct.Base* %13)
%17 = add nuw nsw i32 %12, 1
%18 = icmp eq i32 %17, 1000000
br i1 %18, label %10, label %11
}
The reason for this is that both your variables are static. This allows the compiler to infer that no code outside the translation unit has access to your variant instance. Therefore your loop doesn't have any visible effect and can safely be removed. However, although your smart pointer is static, the memory it points to could still change (as a side effect of the call to Process, for example). The compiler therefore cannot easily prove that it is safe to remove the loop, and doesn't.
If you remove the static from both variables, you get for VariantPolyMorphism:
define void @_Z19VariantPolyMorphismv() local_unnamed_addr #2 {
store i32 0, i32* getelementptr inbounds ({ { %"union.std::__1::__variant_detail::__union", i32 } }, { { %"union.std::__1::__variant_detail::__union", i32 } }* @var, i64 0, i32 0, i32 1), align 4, !tbaa !16
store i32 1000000, i32* getelementptr inbounds ({ { %"union.std::__1::__variant_detail::__union", i32 } }, { { %"union.std::__1::__variant_detail::__union", i32 } }* @var, i64 0, i32 0, i32 0, i32 0, i32 0, i32 0), align 4, !tbaa !18
ret void
}
ret void
}
This isn't surprising either. The variant can only contain VarDerived, so nothing needs to be computed at run time: the final state of the variant can already be determined at compile time. The difference now, though, is that some other translation unit might want to access the value of var later on, so the value must be written.
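Read back into C++, those two stores say the optimized function is roughly equivalent to the following (a sketch; treating the first store as the variant's discriminant and the second as the payload is an assumption about libc++'s layout):

static void VariantPolyMorphism()  // what the optimizer effectively left behind
{
    var.emplace<VarDerived>();              // set the active alternative's index
    std::get<VarDerived>(var).a = 1000000;  // the loop's final state, precomputed
}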
Your variant can store only a single type, so it is the same as a single regular variable (it works more like an optional).
You are running the test without optimizations enabled.
The result is not shielded from the optimizer, so it can trash your code.
Your code does not actually utilize polymorphism; some compilers are able to figure out that there is only one implementation of the Base class and drop the virtual calls.
This is better but still not trustworthy:
ver 1, ver 2 with arrays.
Yes, polymorphism can be expensive when used in tight loops.
Writing benchmarks for such small, extremely fast features is hard and full of pitfalls, so it must be approached with extreme caution, since you are reaching the limitations of the benchmark tool.
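A common way to shield the result, for what it's worth, is Google Benchmark's own escape hatches. A minimal sketch of the pointer benchmark using benchmark::DoNotOptimize and benchmark::ClobberMemory (both real Google Benchmark calls; the rest follows the question's code):

static void PointerPolyMorphism(benchmark::State& state)
{
    auto ptr = std::make_unique<Derived>();
    for (auto _ : state)
    {
        for (int i = 0; i < 1000000; ++i)
            ptr->Process();
        benchmark::DoNotOptimize(ptr);  // the object now counts as observed
        benchmark::ClobberMemory();     // pending writes must actually happen
    }
}
BENCHMARK(PointerPolyMorphism);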

How to see the content of a method pointer?

typedef int (D::*fptr)(void);
fptr bfunc;
bfunc=&D::Bfunc;
cout<<(reinterpret_cast<unsigned long long>(bfunc)&0xffffffff00000000)<<endl;
complete code available at : https://ideone.com/wRVyTu
I am trying to use reinterpret_cast, but the compiler throws an error:
prog.cpp: In function 'int main()':
prog.cpp:49:51: error: invalid cast from type 'fptr {aka int (D::*)()}' to type 'long long unsigned int'
cout<<(reinterpret_cast<unsigned long long>(bfunc)&0xffffffff00000000)<<endl;
My questions are:
Why is reinterpret_cast not suitable for this occasion?
Is there another way I can see the contents of the method pointer?
Using clang++ to compile a slightly modified version of your code (removed all the cout calls so as not to get thousands of lines...), we get this for main:
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%bfunc = alloca { i64, i64 }, align 8
%dfunc = alloca { i64, i64 }, align 8
store i32 0, i32* %retval, align 4
store { i64, i64 } { i64 1, i64 16 }, { i64, i64 }* %bfunc, align 8
store { i64, i64 } { i64 9, i64 0 }, { i64, i64 }* %dfunc, align 8
ret i32 0
}
Note that bfunc and dfunc are each two 64-bit integer values. If I compile for 32-bit x86, they are each two i32 (so 32-bit integer) values.
So, if we make main look like this:
int main() {
    // your code goes here
    typedef int (D::*fptr)(void);
    fptr bfunc;
    fptr dfunc;
    bfunc = &D::Bfunc;
    dfunc = &D::Dfunc;
    D d;
    (d.*bfunc)();
    return 0;
}
the generated code looks like this:
; Function Attrs: norecurse uwtable
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%bfunc = alloca { i64, i64 }, align 8
%dfunc = alloca { i64, i64 }, align 8
%d = alloca %class.D, align 8
store i32 0, i32* %retval, align 4
store { i64, i64 } { i64 1, i64 16 }, { i64, i64 }* %bfunc, align 8
store { i64, i64 } { i64 9, i64 0 }, { i64, i64 }* %dfunc, align 8
call void @_ZN1DC2Ev(%class.D* %d) #3
%0 = load { i64, i64 }, { i64, i64 }* %bfunc, align 8
%memptr.adj = extractvalue { i64, i64 } %0, 1
%1 = bitcast %class.D* %d to i8*
%2 = getelementptr inbounds i8, i8* %1, i64 %memptr.adj
%this.adjusted = bitcast i8* %2 to %class.D*
%memptr.ptr = extractvalue { i64, i64 } %0, 0
%3 = and i64 %memptr.ptr, 1
%memptr.isvirtual = icmp ne i64 %3, 0
br i1 %memptr.isvirtual, label %memptr.virtual, label %memptr.nonvirtual
memptr.virtual: ; preds = %entry
%4 = bitcast %class.D* %this.adjusted to i8**
%vtable = load i8*, i8** %4, align 8
%5 = sub i64 %memptr.ptr, 1
%6 = getelementptr i8, i8* %vtable, i64 %5
%7 = bitcast i8* %6 to i32 (%class.D*)**
%memptr.virtualfn = load i32 (%class.D*)*, i32 (%class.D*)** %7, align 8
br label %memptr.end
memptr.nonvirtual: ; preds = %entry
%memptr.nonvirtualfn = inttoptr i64 %memptr.ptr to i32 (%class.D*)*
br label %memptr.end
memptr.end: ; preds = %memptr.nonvirtual, %memptr.virtual
%8 = phi i32 (%class.D*)* [ %memptr.virtualfn, %memptr.virtual ], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
%call = call i32 %8(%class.D* %this.adjusted)
ret i32 0
}
This is not entirely trivial to follow, but in essence:
%memptr.adj = read adjustment from bfunc[1]
%2 = %d + %memptr.adj
%this.adjusted = cast %2 to D*
%memptr.ptr = bfunc[0]
if (%memptr.ptr & 1) goto is_virtual else goto is_non_virtual
is_virtual:
  %memptr.virtualfn = vtable[%memptr.ptr - 1]
  goto common
is_non_virtual:
  %memptr.nonvirtualfn = %memptr.ptr
common:
  if we came from
    is_non_virtual: %8 = %memptr.nonvirtualfn
    is_virtual:     %8 = %memptr.virtualfn
  call %8
I skipped some type-casts and stuff to make it simpler.
NOTE: This is NOT meant to say "this is how it is always implemented." It's one example of what the compiler MAY do. Different compilers will do this subtly differently. But if the function may or may not be virtual, the compiler first has to figure out which. [In the above example, I'm fairly sure we could turn on optimisation and get much better code, but it would presumably just figure out exactly what's going on and remove all of the code, which for understanding how it works is pointless.]
There is a very simple answer to this. Pointers-to-methods are not 'normal' pointers and cannot be cast to those, even through reinterpret_cast. One can cast first to void*, and then to long long, but this is really ill-advised.
Remember, the size of a pointer-to-method is not necessarily (and usually is not!) equal to the size of a 'normal' pointer. The way most compilers implement pointers-to-methods, they are twice the size of a 'normal' pointer.
GCC is going to complain about the pointer-to-method to void* cast in pedantic mode, but will still generate code.
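If you just want to see the bytes, you can also copy the object representation out with memcpy, which sidesteps the invalid-cast error entirely. A sketch (the meaning of the two words is ABI-specific; under the Itanium ABI they are the ptr/adjustment pair visible in the IR above, and the class here is a simplified stand-in for the question's D):

#include <cstdint>
#include <cstring>
#include <iostream>

struct D {
    virtual int Bfunc() { return 1; }  // hypothetical stand-in for the question's D
};

int main() {
    typedef int (D::*fptr)();
    fptr bfunc = &D::Bfunc;

    // Copy the raw bytes of the member pointer into ordinary integers.
    std::uint64_t words[2] = {0, 0};
    static_assert(sizeof(bfunc) <= sizeof(words), "member pointer larger than expected");
    std::memcpy(words, &bfunc, sizeof(bfunc));
    std::cout << std::hex << words[0] << ' ' << words[1] << '\n';
    // e.g. "1 0" for the first virtual slot under the Itanium ABI
    return 0;
}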

Clang sizeof("literal") optimization

Experimenting with C++, I tried to understand the performance difference between sizeof and strlen for a string literal.
Here is my small benchmark code:
#include <iostream>
#include <cstring>

#define LOOP_COUNT 1000000000

unsigned long long rdtscl(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

int main()
{
    unsigned long long before = rdtscl();
    size_t ret;
    for (int i = 0; i < LOOP_COUNT; i++)
        ret = strlen("abcd");
    unsigned long long after = rdtscl();
    std::cout << "Strlen " << (after - before) << " ret=" << ret << std::endl;

    before = rdtscl();
    for (int i = 0; i < LOOP_COUNT; i++)
        ret = sizeof("abcd");
    after = rdtscl();
    std::cout << "Sizeof " << (after - before) << " ret=" << ret << std::endl;
}
Compiling with clang++, I get the following result:
clang++ -O3 -Wall -o sizeof_vs_strlen sizeof_vs_strlen.cpp
./sizeof_vs_strlen
Strlen 36 ret=4
Sizeof 62092396 ret=5
With g++:
g++ -O3 -Wall -o sizeof_vs_strlen sizeof_vs_strlen.cpp
./sizeof_vs_strlen
Strlen 30 ret=4
Sizeof 30 ret=5
I strongly suspect that g++ optimizes away the loop with sizeof and clang++ doesn't.
Is this result a known issue?
EDIT:
The assembly generated by clang++ for the loop with sizeof:
rdtsc
mov %edx,%r14d
shl $0x20,%r14
mov $0x3b9aca01,%ecx
xchg %ax,%ax
add $0xffffffed,%ecx // 0x400ad0
jne 0x400ad0 <main+192>
mov %eax,%eax
or %rax,%r14
rdtsc
And the one by g++:
rdtsc
mov %edx,%esi
mov %eax,%ecx
rdtsc
I don't understand why clang++ does the {add, jne} loop; it seems useless. Is it a bug?
For information:
g++ (GCC) 5.1.0
clang version 3.6.2 (tags/RELEASE_362/final)
EDIT 2:
It is likely a bug in clang.
I opened a bug report.
I'd call that a bug in clang.
It's actually optimising the sizeof itself, just not the loop.
To make the code much clearer, I changed the std::cout calls to printf, and then you get the following LLVM IR code for main:
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
%0 = tail call { i32, i32 } asm sideeffect "rdtsc", "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
%asmresult1.i = extractvalue { i32, i32 } %0, 1
%conv2.i = zext i32 %asmresult1.i to i64
%shl.i = shl nuw i64 %conv2.i, 32
%asmresult.i = extractvalue { i32, i32 } %0, 0
%conv.i = zext i32 %asmresult.i to i64
%or.i = or i64 %shl.i, %conv.i
%1 = tail call { i32, i32 } asm sideeffect "rdtsc", "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
%asmresult.i.25 = extractvalue { i32, i32 } %1, 0
%asmresult1.i.26 = extractvalue { i32, i32 } %1, 1
%conv.i.27 = zext i32 %asmresult.i.25 to i64
%conv2.i.28 = zext i32 %asmresult1.i.26 to i64
%shl.i.29 = shl nuw i64 %conv2.i.28, 32
%or.i.30 = or i64 %shl.i.29, %conv.i.27
%sub = sub i64 %or.i.30, %or.i
%call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([21 x i8], [21 x i8]* @.str, i64 0, i64 0), i64 %sub, i64 4)
%2 = tail call { i32, i32 } asm sideeffect "rdtsc", "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
%asmresult1.i.32 = extractvalue { i32, i32 } %2, 1
%conv2.i.34 = zext i32 %asmresult1.i.32 to i64
%shl.i.35 = shl nuw i64 %conv2.i.34, 32
br label %for.cond.5
for.cond.5: ; preds = %for.cond.5, %entry
%i4.0 = phi i32 [ 0, %entry ], [ %inc10.18, %for.cond.5 ]
%inc10.18 = add nsw i32 %i4.0, 19
%exitcond.18 = icmp eq i32 %inc10.18, 1000000001
br i1 %exitcond.18, label %for.cond.cleanup.7, label %for.cond.5
for.cond.cleanup.7: ; preds = %for.cond.5
%asmresult.i.31 = extractvalue { i32, i32 } %2, 0
%conv.i.33 = zext i32 %asmresult.i.31 to i64
%or.i.36 = or i64 %shl.i.35, %conv.i.33
%3 = tail call { i32, i32 } asm sideeffect "rdtsc", "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
%asmresult.i.37 = extractvalue { i32, i32 } %3, 0
%asmresult1.i.38 = extractvalue { i32, i32 } %3, 1
%conv.i.39 = zext i32 %asmresult.i.37 to i64
%conv2.i.40 = zext i32 %asmresult1.i.38 to i64
%shl.i.41 = shl nuw i64 %conv2.i.40, 32
%or.i.42 = or i64 %shl.i.41, %conv.i.39
%sub13 = sub i64 %or.i.42, %or.i.36
%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([21 x i8], [21 x i8]* @.str, i64 0, i64 0), i64 %sub13, i64 5)
ret i32 0
}
As you can see, the call to printf is using the constant 5 from sizeof, and for.cond.5 starts the empty loop:
a "phi" node (which selects the "new" value of i based on where we came from - before loop -> 0, inside loop -> %inc10.18)
an increment
a conditional branch that jumps back if %inc10.18 is not 100000001.
I don't know enough about clang and LLVM to explain why that empty loop isn't optimised away. But it's certainly not the sizeof that is taking time, as there is no sizeof inside the loop.
It's worth noting that sizeof is ALWAYS a compile-time constant; it NEVER "takes time" beyond loading a constant value into a register.
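To make that concrete, here is a minimal check of both facts: sizeof of a string literal is a compile-time constant that counts the terminating '\0', while strlen counts characters up to it (a small standalone example, not from the question):

#include <cstring>

static_assert(sizeof("abcd") == 5, "sizeof of a literal counts the trailing '\\0'");

int main()
{
    // strlen is nominally a run-time call, but compilers routinely fold
    // strlen("abcd") to the constant 4, as clang evidently did above.
    return std::strlen("abcd") == 4 ? 0 : 1;
}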
The difference is that sizeof() is not a function call. The value "returned" by sizeof() is known at compile time.
strlen, at the same time, is a function call (executed at run time, apparently) and is not optimized at all. It seeks '\0' in a string, and it doesn't even have a single clue whether it is a dynamically allocated string or a string literal.
So sizeof is expected to always be faster for string literals.
I am not an expert, but your results might be explained by scheduling algorithms or overflow in your time variables.

LLVM Can't find getelementptr instruction

I have this bitcode fragment:
define void @setGlobal(i32 %a) #0 {
entry:
%a.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
%0 = load i32* %a.addr, align 4
store i32 %0, i32* @Global, align 4
%1 = load i32* %a.addr, align 4
store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4
store i32 2, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 2), align 4
ret void
}
I am using this code to find the getelementptr in "store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4":
for (Module::iterator F = p_Module.begin(), endF = p_Module.end(); F != endF; ++F) {
    for (Function::iterator BB = F->begin(), endBB = F->end(); BB != endBB; ++BB) {
        for (BasicBlock::iterator I = BB->begin(), endI = BB->end(); I != endI; ++I) {
            if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
                if (Instruction *gep = dyn_cast<Instruction>(SI->getOperand(1))) {
                    if (gep->getOpcode() == Instruction::GetElementPtr) {
                        // do something
                    }
                }
            }
        }
    }
}
This code can't find the getelementptr. What am I doing wrong?
There are no getelementptr instructions in your bitcode snippet, which is why you can't find them.
The two cases that look like getelementptr instructions are actually constant expressions - the telltale sign is that they appear as part of another instruction (store), which is not something you can do with regular instructions.
So if you want to search for that expression, you need to look for the type GetElementPtrConstantExpr, not GetElementPtrInst.
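GetElementPtrConstantExpr itself lives in a private LLVM header, so in a pass the usual way to match it is through the public ConstantExpr interface. A sketch of the adjusted inner check, as a drop-in for the loop body above:

if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
    Value *ptrOp = SI->getPointerOperand();  // same as SI->getOperand(1)
    if (isa<GetElementPtrInst>(ptrOp)) {
        // A real getelementptr instruction.
    } else if (ConstantExpr *CE = dyn_cast<ConstantExpr>(ptrOp)) {
        if (CE->getOpcode() == Instruction::GetElementPtr) {
            // A constant-folded getelementptr expression, as in the
            // stores to @GlobalVec above.
        }
    }
}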

Identify array type in IR

I have been trying to identify array accesses in IR by making use of the following code:
for (BasicBlock::iterator ii = BB->begin(), ii2; ii != BB->end(); ii++) {
    Instruction *I = ii;
    if (GetElementPtrInst *getElePntr = dyn_cast<GetElementPtrInst>(&*I)) {
        Value *valAlloc = getElePntr->getOperand(0);
        if (getElePntr->getOperand(0)->getType()->isArrayTy()) {
            errs() << "\tarray found";
        }
    }
}
This code identifies the getelementptr instruction, but it does not identify whether its first operand is an array type or not. Please let me know what the problem with my code is.
The first operand of a GEP (getelementptr instruction) is a pointer, not an array. That pointer may point to an array, or it may not (see below). So you need to look at what this pointer points to.
Here's a sample BasicBlockPass visitor:
virtual bool runOnBasicBlock(BasicBlock &BB) {
    for (BasicBlock::iterator ii = BB.begin(), ii_e = BB.end(); ii != ii_e; ++ii) {
        if (GetElementPtrInst *gep = dyn_cast<GetElementPtrInst>(&*ii)) {
            // Dump the GEP instruction
            gep->dump();
            Value *firstOperand = gep->getOperand(0);
            Type *type = firstOperand->getType();
            // Figure out whether the first operand points to an array
            if (PointerType *pointerType = dyn_cast<PointerType>(type)) {
                Type *elementType = pointerType->getElementType();
                errs() << "The element type is: " << *elementType << "\n";
                if (elementType->isArrayTy()) {
                    errs() << " .. points to an array!\n";
                }
            }
        }
    }
    return false;
}
Note, however, that many "arrays" in C/C++ are actually pointers so you may not get the array type where you expect.
For example, if you compile this code:
int main(int argc, char **argv) {
    return (int)argv[1][8];
}
You get the IR:
define i32 @main(i32 %argc, i8** %argv) nounwind uwtable {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i8**, align 8
store i32 0, i32* %1
store i32 %argc, i32* %2, align 4
store i8** %argv, i8*** %3, align 8
%4 = load i8*** %3, align 8
%5 = getelementptr inbounds i8** %4, i64 1
%6 = load i8** %5
%7 = getelementptr inbounds i8* %6, i64 8
%8 = load i8* %7
%9 = sext i8 %8 to i32
ret i32 %9
}
Although argv is treated as an array, the compiler thinks of it as a pointer, so there is no array type in sight. The pass I pasted above won't recognize an array here, because the first operand of the GEP is a pointer to a pointer.
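For contrast, a global declared as a true array, indexed with a run-time value (a constant index would fold the GEP into a constant expression, as discussed in the previous question), does expose the pointer-to-array operand the pass is looking for. A small hypothetical example:

int arr[10];

int main(int argc, char **argv) {
    return arr[argc];  /* run-time index keeps the getelementptr an instruction */
}

Here the GEP's first operand is @arr of type [10 x i32]*, so pointerType->getElementType() is [10 x i32] and the pass above would report that it points to an array.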