How to determine if a function parameter is annotated? - llvm

I'm annotating function parameters as shown below with the label bar.
int foo (char* s __attribute__((annotate("bar")))) {
...
}
Next, I am running a function pass. How can I determine if a given function argument is annotated with the label bar?

You will have to inspect the calls to the llvm.var.annotation and llvm.dbg.declare intrinsics.
More specifically, here is the LLVM IR generated for your code above:
@.str = private unnamed_addr constant [4 x i8] c"bar\00", section "llvm.metadata"
@.str.1 = private unnamed_addr constant [75 x i8] c"/tmp/compiler-explorer-compiler117030-12962-1rhu4lb.ojfaiz4cxr/example.cpp\00", section "llvm.metadata"
; Function Attrs: nounwind uwtable
define i32 @foo(char*)(i8*) #0 !dbg !6 {
%2 = alloca i8*, align 8
store i8* %0, i8** %2, align 8
call void @llvm.dbg.declare(metadata i8** %2, metadata !12, metadata !13), !dbg !14
%3 = bitcast i8** %2 to i8*
call void @llvm.var.annotation(i8* %3, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([75 x i8], [75 x i8]* @.str.1, i32 0, i32 0), i32 1)
ret i32 0, !dbg !15
}
!6 = distinct !DISubprogram(name: "foo", linkageName: "foo(char*)", scope: !1, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
!7 = !DISubroutineType(types: !8)
!8 = !{!9, !10}
!9 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
!10 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64, align: 64)
!11 = !DIBasicType(name: "char", size: 8, align: 8, encoding: DW_ATE_signed_char)
!12 = !DILocalVariable(name: "s", arg: 1, scope: !6, file: !1, line: 1, type: !10)
The llvm.dbg.declare call tells you that %2 is the stack slot holding the first parameter of the function (named s): its metadata !12 is a DILocalVariable with arg: 1.
%3 is a bitcast of %2, so it is essentially an alias.
And the llvm.var.annotation call tells you that %3 (that is, %2) is annotated with the constant string @.str, whose value is "bar".
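The same reasoning can be sketched as a toy model in plain C++ (standard library only, no LLVM headers). `ToyInst`, its opcode strings and the operand layouts are hypothetical stand-ins chosen to mirror the IR above. In an actual pass you would read the operands of the intrinsic calls, look through the bitcast with `Value::stripPointerCasts()`, and check the `arg:` field of the `DILocalVariable` passed to `llvm.dbg.declare`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction, NOT the LLVM API: the value it defines,
// an opcode string and its operands (value names or plain strings).
struct ToyInst {
    std::string result;                // e.g. "%2"; empty if nothing is defined
    std::string opcode;                // "alloca", "store", "bitcast", "dbg.declare", ...
    std::vector<std::string> operands;
};

// Follow bitcasts back to the original pointer (toy Value::stripPointerCasts()).
std::string stripCasts(const std::vector<ToyInst> &body, std::string v) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const ToyInst &i : body) {
            if (i.result == v && i.opcode == "bitcast") {
                v = i.operands.at(0);
                changed = true;
                break;
            }
        }
    }
    return v;
}

// Returns parameter name -> annotation string. Toy operand layouts (assumed):
//   "dbg.declare":    {slot, variableName, argNo}  with argNo "0" for non-parameters
//   "var.annotation": {pointer, annotationString}
std::map<std::string, std::string> annotatedParams(const std::vector<ToyInst> &body) {
    // dbg.declare says which stack slot holds which parameter.
    std::map<std::string, std::string> slotToParam;
    for (const ToyInst &i : body)
        if (i.opcode == "dbg.declare" && i.operands.at(2) != "0")
            slotToParam[i.operands.at(0)] = i.operands.at(1);

    std::map<std::string, std::string> result;
    for (const ToyInst &i : body) {
        if (i.opcode != "var.annotation")
            continue;
        // The annotation is on %3, a bitcast of %2, so look through the cast.
        auto it = slotToParam.find(stripCasts(body, i.operands.at(0)));
        if (it != slotToParam.end())
            result[it->second] = i.operands.at(1);
    }
    return result;
}
```

Modelling `foo` as `{"%2", "alloca", {}}`, `{"", "store", {"%0", "%2"}}`, `{"", "dbg.declare", {"%2", "s", "1"}}`, `{"%3", "bitcast", {"%2"}}` and `{"", "var.annotation", {"%3", "bar"}}`, the sketch maps `s` to `"bar"`.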

Related

How to rearrange LLVM GEP instructions?

I have LLVM IR like the following:
for.body: ; preds = %for.cond
%add = add nsw i32 %i.0, 3
%idxprom = sext i32 %add to i64
%arrayidx = getelementptr inbounds i32, i32* %arr, i64 %idxprom
%0 = load i32, i32* %arrayidx, align 4
%add1 = add nsw i32 %sum1.0, %0
%add2 = add nsw i32 %i.0, 2
%idxprom3 = sext i32 %add2 to i64
%arrayidx4 = getelementptr inbounds i32, i32* %arr, i64 %idxprom3
%1 = load i32, i32* %arrayidx4, align 4
%add5 = add nsw i32 %sum2.0, %1
%add6 = add nsw i32 %i.0, 1
%idxprom7 = sext i32 %add6 to i64
%arrayidx8 = getelementptr inbounds i32, i32* %arr, i64 %idxprom7
%2 = load i32, i32* %arrayidx8, align 4
%add9 = add nsw i32 %sum3.0, %2
%idxprom10 = sext i32 %i.0 to i64
%arrayidx11 = getelementptr inbounds i32, i32* %arr, i64 %idxprom10
%3 = load i32, i32* %arrayidx11, align 4
%add12 = add nsw i32 %sum4.0, %3
br label %for.inc
I want to rearrange the GEP instructions above. For this example, they should be arranged like this:
%arrayidx11 = getelementptr inbounds i32, i32* %arr, i64 %idxprom10
%arrayidx8 = getelementptr inbounds i32, i32* %arr, i64 %idxprom7
%arrayidx4 = getelementptr inbounds i32, i32* %arr, i64 %idxprom3
%arrayidx = getelementptr inbounds i32, i32* %arr, i64 %idxprom
I know that the uses of the array accesses also have to be moved after this rearrangement. So I am trying to get the use chain for each GEP instruction using the code below:
// Get all the use chain instructions
for (Value::use_iterator i = inst1->use_begin(),e = inst1->use_end(); i!=e;++i) {
dyn_cast<Instruction>(*i)->dump();
}
But with this code I only get the defining instruction itself; I was expecting to get all of the instructions below for %arrayidx4:
%arrayidx4 = getelementptr inbounds i32, i32* %arr, i64 %idxprom3
%1 = load i32, i32* %arrayidx4, align 4
Please help me out here. Thanks in advance.
I don't really like this question, but I should be doing paperwork for my taxes today...
Your first task is to find the GEPs and sort them into the order you want. When doing this, you need a separate list. LLVM's BasicBlock class does provide a list, but as a general rule, never modify that list while you're iterating over it. That's permitted but too error-prone.
So at the start:
std::vector<GetElementPtrInst *> geps;
// Iterating a BasicBlock directly visits its instructions in order.
for(auto & i : *block)
    if(GetElementPtrInst * g = dyn_cast<GetElementPtrInst>(&i))
        geps.push_back(g);
You can use any container class; your project's coding standard will probably suggest either a std:: container or an LLVM class such as SmallVector.
Next, sort geps into the order you prefer. I leave that part out.
After that, move each GEP to the latest permissible point in the block. Which point is that? Well, if the block was valid, then each GEP is already after the values it uses and before the instructions that use it, so moving it to a possibly later point while keeping it before its users will do.
for(auto g : geps) {
    // Find the earliest user of g inside g's own basic block.
    Instruction * firstUser = nullptr;
    for(auto u : g->users()) {
        Instruction * i = dyn_cast<Instruction>(u);
        if(i &&
           i->getParent() == g->getParent() &&
           (!firstUser ||
            i->comesBefore(firstUser)))
            firstUser = i;
    }
    if(firstUser)
        g->moveBefore(firstUser);
}
For each user, check that it is an instruction within the same basic block, and if it is so, check whether it's earlier in the block than the other users seen so far. Finally, move the GEP.
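A toy model of this first approach may help show what it does. It is plain C++ without LLVM, and `Inst`, `Block`, `moveBeforeFirstUser` and the value names are hypothetical: a `std::list` stands in for the instruction list, `splice` plays the role of `moveBefore()`, and scanning from the top of the block finds the same earliest user that the `comesBefore()` comparison selects.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM instruction (not the LLVM API): the name
// of the value it defines and the names of the values it uses.
struct Inst {
    std::string name;
    std::vector<std::string> operands;
};
using Block = std::list<Inst>;

bool usesValue(const Inst &i, const std::string &v) {
    return std::find(i.operands.begin(), i.operands.end(), v) != i.operands.end();
}

// Move each GEP (processed in the caller's sorted order) so that it sits
// directly before its earliest user in the block, like g->moveBefore(firstUser).
void moveBeforeFirstUser(Block &bb, const std::vector<std::string> &geps) {
    for (const std::string &g : geps) {
        auto gepIt = std::find_if(bb.begin(), bb.end(),
                                  [&](const Inst &i) { return i.name == g; });
        if (gepIt == bb.end())
            continue;
        // Scanning from the top, the first user found is the earliest one.
        auto firstUser = std::find_if(bb.begin(), bb.end(),
                                      [&](const Inst &i) { return usesValue(i, g); });
        if (firstUser != bb.end())
            bb.splice(firstUser, bb, gepIt); // relinks one node; iterators stay valid
    }
}

// Instruction names in block order, for inspection.
std::vector<std::string> order(const Block &bb) {
    std::vector<std::string> out;
    for (const Inst &i : bb)
        out.push_back(i.name);
    return out;
}
```

One thing the model makes visible: with this approach each GEP's final position is decided by where its first user sits, so the sort order only matters when several GEPs share the same first user.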
You may prefer a different approach. Several are possible. For example, you could reorder the GEPs after sorting them (using moveAfter() to move each GEP after the previous one) and then use a combination of users() and moveAfter() to make sure all users are after the instructions they use.
for(auto u : foo->users()) {
    Instruction * i = dyn_cast<Instruction>(u);
    // Move same-block users that now precede foo to just after it.
    if(i &&
       i->getParent() == foo->getParent() &&
       i->comesBefore(foo))
        i->moveAfter(foo);
}
Note again that this code never modifies the basic block's list while iterating over it. If you have any mysterious errors, check that first.
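The second approach can be modelled the same way. This toy sketch is again plain C++, and `IRInst`, `IRBlock` and `chainGeps` are hypothetical names. It chains the GEPs with the equivalent of `moveAfter()` and then, going beyond the short snippet above, continues through a worklist: moving a load below its GEP can leave the `add` that consumes it above the load, so the users of every moved instruction are checked too. It assumes the first GEP in the requested order is already the lowest one in the block (as `%arrayidx11` is in the question), so the other GEPs only move downward and never end up above their own operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical toy instruction (not the LLVM API).
struct IRInst {
    std::string name;                  // "" for instructions without a result
    std::vector<std::string> operands;
};
using IRBlock = std::list<IRInst>;

IRBlock::iterator defOf(IRBlock &bb, const std::string &v) {
    return std::find_if(bb.begin(), bb.end(),
                        [&](const IRInst &i) { return i.name == v; });
}

// Chain the GEPs in `order`, then sink any same-block user that precedes a
// moved value, repeating for the users themselves until nothing changes.
void chainGeps(IRBlock &bb, const std::vector<std::string> &order) {
    std::vector<std::string> work;
    for (std::size_t k = 1; k < order.size(); ++k) {
        auto prev = defOf(bb, order[k - 1]);
        auto cur = defOf(bb, order[k]);
        assert(prev != bb.end() && cur != bb.end());
        bb.splice(std::next(prev), bb, cur); // cur->moveAfter(prev)
        work.push_back(order[k]);
    }
    while (!work.empty()) {
        std::string v = work.back();
        work.pop_back();
        auto def = defOf(bb, v);
        assert(def != bb.end());
        // Collect the users that now sit above the definition...
        std::vector<IRBlock::iterator> early;
        for (auto it = bb.begin(); it != def; ++it)
            if (std::count(it->operands.begin(), it->operands.end(), v))
                early.push_back(it);
        // ...and move them just below it, keeping their relative order.
        auto insertPos = std::next(def);
        for (auto it : early) {
            bb.splice(insertPos, bb, it);
            if (!it->name.empty())
                work.push_back(it->name); // its own users may now be above it
        }
    }
}
```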

Identify annotated variable in an LLVM pass

How can I identify an annotated variable in an LLVM pass?
#include <stdio.h>
int main (){
int x __attribute__((annotate("my_var")))= 0;
int a,b;
x = x + 1;
a = 5;
b = 6;
x = x + a;
return x;
}
For example, I want to identify the instructions that use the annotated variable (x in this case) and print them out (x = x+1; and x = x+a).
How can I achieve this?
This is the .ll file generated using LLVM
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64"
@.str = private unnamed_addr constant [7 x i8] c"my_var\00", section "llvm.metadata"
@.str.1 = private unnamed_addr constant [7 x i8] c"test.c\00", section "llvm.metadata"
; Function Attrs: noinline nounwind optnone
define i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 0, i32* %1, align 4
%5 = bitcast i32* %2 to i8*
call void @llvm.var.annotation(i8* %5, i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.s$
store i32 0, i32* %2, align 4
%6 = load i32, i32* %2, align 4
%7 = add nsw i32 %6, 1
store i32 %7, i32* %2, align 4
store i32 5, i32* %3, align 4
store i32 6, i32* %4, align 4
%8 = load i32, i32* %2, align 4
%9 = load i32, i32* %3, align 4
%10 = add nsw i32 %8, %9
store i32 %10, i32* %2, align 4
%11 = load i32, i32* %2, align 4
ret i32 %11
}
; Function Attrs: nounwind
declare void @llvm.var.annotation(i8*, i8*, i8*, i32) #1
attributes #0 = { noinline nounwind optnone "correctly-rounded-divide-sqrt-fp-math"="false" $
attributes #1 = { nounwind }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
I recently encountered a similar problem, and searching Google did not turn up a solution.
But in the end I found the "ollvm" project's Utils.cpp, which solved my problem.
In your case,
%5 = bitcast i32* %2 to i8*
call void @llvm.var.annotation(i8* %5, i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.s$
As we can see, there is a call to @llvm.var.annotation. In our pass,
we can loop through the instructions of a function and search for call instructions,
then get the called function's name:
Function *fn = callInst->getCalledFunction(); // nullptr for indirect calls
StringRef fn_name = fn ? fn->getName() : StringRef();
and compare it with "llvm.var.annotation".
If they match, then we have found the location of "int x" in your case.
The function "llvm.var.annotation" is documented in LLVM's docs:
http://llvm.org/docs/LangRef.html#llvm-var-annotation-intrinsic
If you have read its prototype, you know that its second argument is a pointer, and in your case that pointer points to "my_var\00". If you think you can simply cast it to a GlobalVariable, you will not get what you want. The actual second argument passed to "llvm.var.annotation" in your case is
i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.s$
It's an expression, not a GlobalVariable! Knowing this, we can finally get the annotation of our target variable with:
// dyn_cast (not cast) so that a non-ConstantExpr operand yields nullptr
ConstantExpr *ce =
    dyn_cast<ConstantExpr>(callInst->getOperand(1));
if (ce) {
    if (ce->getOpcode() == Instruction::GetElementPtr) {
        if (GlobalVariable *annoteStr =
                dyn_cast<GlobalVariable>(ce->getOperand(0))) {
            if (ConstantDataSequential *data =
                    dyn_cast<ConstantDataSequential>(
                        annoteStr->getInitializer())) {
                if (data->isString()) {
                    errs() << "Found data " << data->getAsString();
                }
            }
        }
    }
}
Hope you have already solved the problem.
Have a nice day.
You have to loop over the instructions and identify calls to llvm.var.annotation.
The first argument is a pointer to the annotated variable (i8*).
To get the actual annotated variable, you then need to find what this pointer points to.
In your case, this is the source operand of the bitcast instruction.
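Combining both answers gives three steps: find the `llvm.var.annotation` call, look through the bitcast to the `alloca`, then collect the loads and stores of that `alloca`. The following toy model sketches them in plain C++. `Op` and its operand layouts are hypothetical, and the annotation string is stored directly instead of behind a `getelementptr` constant expression. The prefix check is there because newer LLVM releases may add type suffixes to the intrinsic name; in a real pass, comparing the intrinsic ID may be another option.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction model, not the LLVM API.
struct Op {
    std::string result;                // "%5", or "" for store/call/ret
    std::string opcode;                // "alloca", "bitcast", "load", "store", "call", ...
    std::vector<std::string> operands; // store: {value, pointer}; load: {pointer}
    std::string callee;                // only used for "call"
};

// Accept the plain name used in the question's IR and suffixed variants.
bool isVarAnnotation(const Op &i) {
    const std::string base = "llvm.var.annotation";
    return i.opcode == "call" &&
           (i.callee == base || i.callee.rfind(base + ".", 0) == 0);
}

// Follow bitcasts back to the underlying pointer (the alloca).
std::string throughBitcasts(const std::vector<Op> &fn, std::string v) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Op &i : fn)
            if (i.result == v && i.opcode == "bitcast") {
                v = i.operands.at(0);
                changed = true;
                break;
            }
    }
    return v;
}

// Stack slots whose annotation string equals `label`.
// Toy call operands: {annotatedPointer, annotationString}.
std::vector<std::string> annotatedSlots(const std::vector<Op> &fn, const std::string &label) {
    std::vector<std::string> slots;
    for (const Op &i : fn)
        if (isVarAnnotation(i) && i.operands.at(1) == label)
            slots.push_back(throughBitcasts(fn, i.operands.at(0)));
    return slots;
}

// Positions of the loads and stores that read or write `slot`.
std::vector<std::size_t> accessesOf(const std::vector<Op> &fn, const std::string &slot) {
    std::vector<std::size_t> hits;
    for (std::size_t k = 0; k < fn.size(); ++k) {
        const Op &i = fn[k];
        if ((i.opcode == "load" && i.operands.at(0) == slot) ||
            (i.opcode == "store" && i.operands.at(1) == slot))
            hits.push_back(k);
    }
    return hits;
}
```

For the `main` above, the accesses found for `%2` are the initializing store, the load and store for `x = x + 1`, the load and store for `x = x + a`, and the final load for `return x`. Mapping them back to source lines would need debug locations (compile with `-g`).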

clang can't optimize away global variables used only in main()?

If I plug this c++ program into clang (version 3.7)
///*
#include "stdio.h"
#include "stdint.h"
//extern int printf(const unsigned char*, ...);
extern "C" void __cxa_pure_virtual() { }
struct A
{
virtual void foo() = 0;
};
struct B : A
{
uint32_t x;
B(int x) : x(x) { }
virtual void foo()
{
printf("This is a test %d\n", x);
}
};
//*/
uint64_t thing = 0;
float other = 10.0f;
B b(12345);
int main()
{
thing++;
A* a = &b;
other *= 3.14159f;
a->foo();
}
When I compile it with clang -emit-llvm main.cpp -fno-rtti -O3 -S, I get the following IR:
; ModuleID = 'main.cpp'
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i686-pc-linux-gnu"
%struct.B = type { %struct.A, i32 }
%struct.A = type { i32 (...)** }
$_ZN1B3fooEv = comdat any
$_ZTV1B = comdat any
@thing = global i64 0, align 8
@other = global float 1.000000e+01, align 4
@b = global %struct.B { %struct.A { i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*], [3 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**) }, i32 12345 }, align 4
@_ZTV1B = linkonce_odr unnamed_addr constant [3 x i8*] [i8* null, i8* null, i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)], comdat, align 4
@.str = private unnamed_addr constant [19 x i8] c"This is a test %d\0A\00", align 1
@llvm.global_ctors = appending global [0 x { i32, void ()*, i8* }] zeroinitializer
; Function Attrs: nounwind readnone
define void @__cxa_pure_virtual() #0 {
entry:
ret void
}
define i32 @main() #1 {
entry:
%0 = load i64, i64* @thing, align 8, !tbaa !1
%inc = add i64 %0, 1
store i64 %inc, i64* @thing, align 8, !tbaa !1
%1 = load float, float* @other, align 4, !tbaa !5
%mul = fmul float %1, 0x400921FA00000000
store float %mul, float* @other, align 4, !tbaa !5
%vtable = load void (%struct.A*)**, void (%struct.A*)*** bitcast (%struct.B* @b to void (%struct.A*)***), align 4, !tbaa !7
%2 = load void (%struct.A*)*, void (%struct.A*)** %vtable, align 4
tail call void %2(%struct.A* getelementptr inbounds (%struct.B, %struct.B* @b, i32 0, i32 0))
ret i32 0
}
; Function Attrs: nounwind
define linkonce_odr void @_ZN1B3fooEv(%struct.B* nocapture readonly %this) unnamed_addr #2 comdat align 2 {
entry:
%x = getelementptr inbounds %struct.B, %struct.B* %this, i32 0, i32 1
%0 = load i32, i32* %x, align 4, !tbaa !9
%call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([19 x i8], [19 x i8]* @.str, i32 0, i32 0), i32 %0)
ret void
}
; Function Attrs: nounwind
declare i32 @printf(i8* nocapture readonly, ...) #2
attributes #0 = { nounwind readnone "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="pentium4" "target-features"="+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="pentium4" "target-features"="+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="pentium4" "target-features"="+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 3.7.1 "}
!1 = !{!2, !2, i64 0}
!2 = !{!"long long", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
!5 = !{!6, !6, i64 0}
!6 = !{!"float", !3, i64 0}
!7 = !{!8, !8, i64 0}
!8 = !{!"vtable pointer", !4, i64 0}
!9 = !{!10, !11, i64 4}
!10 = !{!"_ZTS1B", !11, i64 4}
!11 = !{!"int", !3, i64 0}
If you look at the main function, I have two variables that are useless. Sure, I increment one and do some multiplication on the other, but I never use their values.
But if you look at the output IR, it is still doing the useless math.
Is it just me, or is this a bug?
Those variables are in global scope and have external linkage. The compiler simply cannot tell whether other translation units declare and reference them.
I'd be surprised if any modern C++ compiler is sophisticated enough to figure out that execution flow could not escape this translation unit, in a defined manner, and thus it would be safe to optimize away unused global variables in this translation unit.
No, I don't believe that this is a bug, as your variables are globals.
Clang cannot remove this math as it can't know that any externally called function (like the printf function, in a different translation unit) doesn't declare extern float other; and somehow uses it.
Try writing:
int main()
{
uint64_t thing = 0;
float other = 10.0f;
B b(12345);
thing++;
A* a = &b;
other *= 3.14159f;
a->foo();
}

llvm get annotations

I have updated my previous question in a new form.
Hello everyone,
I have the following LLVM IR :
@.str = private unnamed_addr constant [3 x i8] c"DS\00", section "llvm.metadata"
@llvm.global.annotations = appending global [1 x { i8*, i8*, i8*, i32 }] [{ i8*, i8*, i8*, i32 } { i8* bitcast (i32* @f to i8*), i8* getelementptr inbounds ([3 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([9 x i8]* @.str1, i32 0, i32 0), i32 18 }], section "llvm.metadata"
I need to get @f (or maybe somehow the definition @f = global i32 0, align 4), and I also need to get "DS" from @.str. In my target code I have:
__attribute__((annotate("DS"))) int f=0;
I have problems parsing @llvm.global.annotations, and I assume I will have them with @.str as well. What I tried:
1.
for (Module::global_iterator I = F.global_begin(), E = F.global_end(); I != E; ++I) {
if (I->getName() == "llvm.global.annotations") {
Value *V = cast<Value>(I->getOperand(0));
errs()<<"\n "<<*(V)<<"\n";
errs()<<"\n "<<*(V->getType())<<"\n";
RESULT :
[1 x { i8*, i8*, i8*, i32 }] [{ i8*, i8*, i8*, i32 } { i8* bitcast (i32* @f to i8*), i8* getelementptr inbounds ([3 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([9 x i8]* @.str1, i32 0, i32 0), i32 18 }]
[1 x { i8*, i8*, i8*, i32 }]
2.
errs()<<"\n "<<(V->getValueID())<<"\n";
if(V->getValueID() == Value::ConstantArrayVal) {
ConstantArray *ca = (ConstantArray *)V;
errs()<<"\n "<<(ca[0])<<"\n"; }
RESULT :
[1 x { i8*, i8*, i8*, i32 }] [{ i8*, i8*, i8*, i32 } { i8* bitcast (i32* @f to i8*), i8* getelementptr inbounds ([3 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([9 x i8]* @.str1, i32 0, i32 0), i32 18 }]
Any help is welcomed ! Thank you !
Quite a late answer, but Google led me here and I thought that providing a full LLVM pass that discovers free-text annotations would be helpful.
This LLVM pass instruments only functions marked with __attribute((annotate("someFreeTextAnnotation"))).
The code follows:
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Constants.h"
#include <set>
using namespace llvm;
const char *AnnotationString = "someFreeTextAnnotation";
namespace {
struct Hello : public FunctionPass {
static char ID;
Hello() : FunctionPass(ID) {}
std::set<Function*> annotFuncs;
virtual bool doInitialization(Module &M)override{
getAnnotatedFunctions(&M);
return false;
}
bool shouldInstrumentFunc(Function &F){
return annotFuncs.find(&F)!=annotFuncs.end();
}
void getAnnotatedFunctions(Module *M){
for (Module::global_iterator I = M->global_begin(),
E = M->global_end();
I != E;
++I) {
if (I->getName() == "llvm.global.annotations") {
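ConstantArray *CA = dyn_cast<ConstantArray>(I->getOperand(0));
if (!CA)
continue;
for(auto OI = CA->op_begin(); OI != CA->op_end(); ++OI){
ConstantStruct *CS = dyn_cast<ConstantStruct>(OI->get());
if (!CS)
continue;
// Field 0 is the annotated value, field 1 the annotation string. stripPointerCasts()
// looks through the bitcast / zero-index getelementptr wrappers (a no-op if there are none).
Function *FUNC = dyn_cast<Function>(CS->getOperand(0)->stripPointerCasts());
GlobalVariable *AnnotationGL = dyn_cast<GlobalVariable>(CS->getOperand(1)->stripPointerCasts());
// Skip annotated global variables (FUNC == nullptr) and unexpected layouts.
if (!FUNC || !AnnotationGL)
continue;
ConstantDataArray *Data = dyn_cast<ConstantDataArray>(AnnotationGL->getInitializer());
if (!Data || !Data->isCString())
continue;
StringRef annotation = Data->getAsCString();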
if(annotation.compare(AnnotationString)==0){
annotFuncs.insert(FUNC);
//errs() << "Found annotated function " << FUNC->getName()<<"\n";
}
}
}
}
}
bool runOnFunction(Function &F) override {
if(shouldInstrumentFunc(F)==false)
return false;
errs() << "Instrumenting " << F.getName() << "\n";
return false;
}
}; // end of struct Hello
} // end of anonymous namespace
char Hello::ID = 0;
static RegisterPass<Hello> X("hello", "Discover annotation attribute",
false /* Only looks at CFG */,
false /* Analysis Pass */);
llvm.global.annotations is a module-level global defined outside any function, so use runOnModule() instead of runOnFunction() for this. Alternatively, take the module from the function you are visiting (F.getParent()). Inside, with M being that Module, do something like:
for (Module::global_iterator I = M.global_begin(), E = M.global_end(); I != E; ++I) {
if (I->getName() == "llvm.global.annotations")
{
errs()<<"\nllvm.global.annotations\n";
//1. find out which global variable is annotated by "parsing" the initializer
//2. walk through the module until you find a load of @f
//3. you can attach metadata to that load instruction and easily read it back from a normal function pass
}
}
I solved it.
I cast the entire annotation initializer to Value*. Then, to avoid ugly things like getAsString(), I check whether V->getValueID() == Value::ConstantArrayVal so that I can cast it to ConstantArray. Because it contains only one element, I cast array->getOperand(0) to ConstantStruct. From the ConstantStruct you can get all four operands. All that remains is to get the names of @f and @.str from the fields. This is done with ConstantStruct->getOperand(0)->getOperand(0).
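In other words, in the IR shown here each element of `@llvm.global.annotations` is a `{ i8*, i8*, i8*, i32 }` struct whose fields are the annotated global, the annotation string, the source file name and the line number. The toy sketch below (plain C++, hypothetical `GlobalAnnotation` type and `globalsAnnotatedWith` helper) assumes those fields have already been resolved to names and strings, and only shows the filtering a pass would do once it has them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy mirror of one { i8*, i8*, i8*, i32 } element of @llvm.global.annotations,
// with the pointer fields already resolved (an assumption of this sketch: in IR
// they are bitcast / getelementptr constant expressions).
struct GlobalAnnotation {
    std::string annotated;  // field 0: the annotated global, e.g. "@f"
    std::string annotation; // field 1: the annotation string, e.g. "DS"
    std::string file;       // field 2: source file name
    int line;               // field 3: source line
};

// Names of all globals carrying the given annotation, in table order.
std::vector<std::string> globalsAnnotatedWith(const std::vector<GlobalAnnotation> &table,
                                              const std::string &label) {
    std::vector<std::string> out;
    for (const GlobalAnnotation &e : table)
        if (e.annotation == label)
            out.push_back(e.annotated);
    return out;
}
```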

Find values in a basic block which are computed in previous basic blocks

In a basic block, I want to find all the values used in instructions that are not computed in the same basic block.
Example:
for.body5:
%i.015 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
%add1 = add nsw i32 %2, %i.015
%arrayidx = getelementptr inbounds [100 x i32]* %b, i32 0, i32 %i.015
store i32 %add1, i32* %arrayidx, align 4, !tbaa !0
%arrayidx2 = getelementptr inbounds [100 x i32]* %a, i32 0, i32 %i.015
store i32 %add1, i32* %arrayidx2, align 4, !tbaa !0
%inc = add nsw i32 %i.015, 1
%cmp = icmp slt i32 %inc, %3
br i1 %cmp, label %for.body, label %for.cond3.preheader
In the above example I should get:
%2
%b
%a
%3
These are declared and/or assigned in other basic blocks.
Please suggest a method.
Thanks in advance.
Hi, I haven't tested this out, but I would do something like this:
std::vector<Value*> values; // may contain duplicates
// Iterate over all of the instructions in the block
for (BasicBlock::iterator it = block->begin(); it != block->end(); ++it) {
    // Iterate over the operands used by an instruction. 'op_begin' is defined in llvm::User.
    for (User::op_iterator operand_it = it->op_begin(); operand_it != it->op_end(); ++operand_it) {
        Value *v = operand_it->get();
        // If this operand is a function argument, it was not defined in the block.
        if (isa<Argument>(v)) {
            values.push_back(v);
        }
        // Otherwise, it could be a constant, a global, a basic block label, ...
        else if (!isa<Instruction>(v)) {
            continue;
        }
        // Check whether the defining instruction belongs to a different block.
        else if (cast<Instruction>(v)->getParent() != block) {
            values.push_back(v);
        }
    }
}
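The same check can be tried out on a toy model of the block from the question, written in plain C++ without LLVM. `BBInst` and `usedFromOutside` are hypothetical names. A leading `%` marks a local value, which is how the toy tells values apart from constants such as `0`. Globals (written with `@`) are excluded, just as the `isa<Argument>`/`isa<Instruction>` tests exclude them. Unlike the snippet above, the toy also drops duplicates.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: defined value ("" if none) and value operands.
// Block labels (phi incoming blocks, branch targets) are left out; in real IR
// they are BasicBlock operands, which the isa<> tests above skip anyway.
struct BBInst {
    std::string name;
    std::vector<std::string> operands;
};

// Local values used in `bb` but defined elsewhere, in first-use order, no duplicates.
std::vector<std::string> usedFromOutside(const std::vector<BBInst> &bb) {
    std::set<std::string> definedHere;
    for (const BBInst &i : bb)
        if (!i.name.empty())
            definedHere.insert(i.name); // includes values defined later, e.g. %inc for the phi

    std::vector<std::string> result;
    std::set<std::string> seen;
    for (const BBInst &i : bb)
        for (const std::string &op : i.operands)
            if (!op.empty() && op[0] == '%' && !definedHere.count(op) &&
                seen.insert(op).second)
                result.push_back(op);
    return result;
}
```

On the question's `for.body5` block it returns `%2`, `%b`, `%a` and `%3`, the list the question expects.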