I'm learning LLVM these days via observing how clang deal with complex situations. I wrote (top level, not in a function):
int qaq = 666;
int tat = 233;
auto hh = qaq + tat;
And I use the command:
clang-4.0 003.cpp -emit-llvm -S -std=c++11
And clang generates codes like this:
#qaq = global i32 666, align 4
#tat = global i32 233, align 4
#hh = global i32 0, align 4
#llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 65535, void ()* #_GLOBAL__sub_I_003.cpp, i8* null }]
; Function Attrs: noinline uwtable
define internal void #__cxx_global_var_init() #0 section ".text.startup" {
%1 = load i32, i32* #qaq, align 4
%2 = load i32, i32* #tat, align 4
%3 = add nsw i32 %1, %2
store i32 %3, i32* #hh, align 4
ret void
}
; Function Attrs: noinline uwtable
define internal void #_GLOBAL__sub_I_003.cpp() #0 section ".text.startup" {
call void #__cxx_global_var_init()
ret void
}
I'm confused with _GLOBAL__sub_I_003.cpp: why does clang generate a function that actually only invoke another function (and not doing anything else)? Even both of them have no parameters?
Disclaimer: This is my interpretation of the logic, I'm not part of the LLVM team.
In order to understand the reasoning behind this, you have to understand a fundamental concept in software engineering: Complexity creates bugs, and makes testing harder.
But first, let's make your example a little more interesting:
int qaq = 666;
int tat = 233;
auto hh = qaq + tat;
auto ii = qaq - tat;
Which leads to:
; Function Attrs: noinline uwtable
define internal void #__cxx_global_var_init() #0 section ".text.startup" !dbg !16 {
%1 = load i32, i32* #qaq, align 4, !dbg !19
%2 = load i32, i32* #tat, align 4, !dbg !20
%3 = add nsw i32 %1, %2, !dbg !21
store i32 %3, i32* #hh, align 4, !dbg !21
ret void, !dbg !20
}
; Function Attrs: noinline uwtable
define internal void #__cxx_global_var_init.1() #0 section ".text.startup" !dbg !22 {
%1 = load i32, i32* #qaq, align 4, !dbg !23
%2 = load i32, i32* #tat, align 4, !dbg !24
%3 = sub nsw i32 %1, %2, !dbg !25
store i32 %3, i32* #ii, align 4, !dbg !25
ret void, !dbg !24
}
; Function Attrs: noinline uwtable
define internal void #_GLOBAL__sub_I_example.cpp() #0 section ".text.startup" !dbg !26 {
call void #__cxx_global_var_init(), !dbg !28
call void #__cxx_global_var_init.1(), !dbg !29
ret void
}
So we see that CLANG emits a single function for each non-trivial initialization, and calls each of them one after the other in _GLOBAL__sub_I_example.cpp(). That makes sense and is sensible, as things are neatly organized this way, and could become a garbled mess in larger/more complicated files otherwise.
Notice how that's the exact same logic that is being applied in your example.
Doing otherwise would imply an algorithm of the type: "if there is a single non-trivial global initialization, then put the code directly in the translation unit's global constructor".
Note the following:
The current logic handles that case correctly already.
In optimized code, the end result would be the exact same.
So what would that logic get us, really?
More branches to test.
More opportunities to accidentaly insert a bug.
More code to maintain in the long run.
Removal of a single function call in the global initialization of some translation units in non-optimized builds.
Keeping things the way they are is just the right decision.
Related
I have LLVM IR like below :
for.body: ; preds = %for.cond
%add = add nsw i32 %i.0, 3
%idxprom = sext i32 %add to i64
%arrayidx = getelementptr inbounds i32, i32* %arr, i64 %idxprom
%0 = load i32, i32* %arrayidx, align 4
%add1 = add nsw i32 %sum1.0, %0
%add2 = add nsw i32 %i.0, 2
%idxprom3 = sext i32 %add2 to i64
%arrayidx4 = getelementptr inbounds i32, i32* %arr, i64 %idxprom3
%1 = load i32, i32* %arrayidx4, align 4
%add5 = add nsw i32 %sum2.0, %1
%add6 = add nsw i32 %i.0, 1
%idxprom7 = sext i32 %add6 to i64
%arrayidx8 = getelementptr inbounds i32, i32* %arr, i64 %idxprom7
%2 = load i32, i32* %arrayidx8, align 4
%add9 = add nsw i32 %sum3.0, %2
%idxprom10 = sext i32 %i.0 to i64
%arrayidx11 = getelementptr inbounds i32, i32* %arr, i64 %idxprom10
%3 = load i32, i32* %arrayidx11, align 4
%add12 = add nsw i32 %sum4.0, %3
br label %for.inc
I want to re-arrang GEP instructions above. It should be arranged like below for this example :
%arrayidx11 = getelementptr inbounds i32, i32* %arr, i64 %idxprom10
%arrayidx8 = getelementptr inbounds i32, i32* %arr, i64 %idxprom7
%arrayidx4 = getelementptr inbounds i32, i32* %arr, i64 %idxprom3
%arrayidx = getelementptr inbounds i32, i32* %arr, i64 %idxprom
I know that even the uses of array access has to be moved after this arrangement. So I am trying to get use-chain for each GEP instruction using below code :
// Get all the use chain instructions
for (Value::use_iterator i = inst1->use_begin(),e = inst1->use_end(); i!=e;++i) {
dyn_cast<Instruction>(*i)->dump();
}
But I am getting only the declaration instruction with this code, I was expecting to get all the below instructions for %arrayidx4 :
%arrayidx4 = getelementptr inbounds i32, i32* %arr, i64 %idxprom3
%1 = load i32, i32* %arrayidx4, align 4
Please help me out here. Thanks in advance.
I don't really like this question, but I should be doing paperwork for my taxes today...
Your first task is to find the GEPs and sort them into the order you want. When doing this, you need a separate list. LLVM's BasicBlock class does provide a list, but as a general rule, never modify that list while you're iterating over it. That's permitted but too error-prone.
So at the start:
std::vector<GetElementPtr *> geps;
for(auto & i : block->getInstList())
if(GetElementPtrInst * g = dyn_cast<GetElementPTrInst>(&i))
geps.push_back(g);
You can use any container class, your project's code standard will probably suggest using either std::whatever or an LLVM class.
Next, sort geps into the order you prefer. I leave that part out.
After that, move each GEP to the latest permissible point in the block. Which point is that? Well, if the block was valid, then each GEP is already after the values it uses and before the instructions that use it, so moving it to a possibly later point while keeping it before its users will do.
for(auto g : geps) {
Instruction * firstUser = nullptr;
for(auto u : g->users()) {
Instruction * i = dyn_cast<Instruction>(u);
if(i &&
i->getParent() == g->getParent() &&
(!firstUser ||
i->comesBefore(firstUser)))
firstUser = i;
}
}
if(firstUser)
g->moveBefore(firstUser);
}
For each user, check that it is an instruction within the same basic block, and if it is so, check whether it's earlier in the block than the other users seen so far. Finally, move the GEP.
You may prefer a different approach. Several are possible. For example, you could reorder the GEPs after sorting them (using moveAfter() to move each GEP after the previous one) and then use a combination of users() and moveAfter() to make sure all users are after the instructions they use.
for(auto u : foo->users))) {
Instruction * i = dyn_cast<Instruction>(u);
if(i &&
i->getParent() == foo->getParent() &&
i->comesBefore(foo))
i->moveAfter(foo);
}
Note again that this code never modifies the basic block's list while iterating over it. If you have any mysterious errors, check that first.
I've been trying to link IR generated with llvm's C++ api with a another IR file generated by Clang++. The input file to Clang is a function fn I'm trying to call from the first IR file. But llvm-link doesn't replace fn's declaration with its definition.
main_ir.ll
source_filename = "top"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
#0 = private unnamed_addr constant [5 x i8] c"%d \0A\00", align 1
declare i32 #printf(...)
declare i32 #fn(i32, ...)
define internal i32 #main() {
entrypoint:
%f_call = call i32 (i32, ...) #fn(i32 2)
%printfCall = call i32 (...) #printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* #0,
i32 0, i32 0), i32 %f_call)
br label %ProgramExit
ProgramExit: ; preds = %entrypoint
ret i32 0
}
fn_ir.ll (generated with Clang)
source_filename = "libDessin.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 #_Z2fni(i32) #0 {
%2 = alloca i32, align 4
store i32 %0, i32* %2, align 4
%3 = load i32, i32* %2, align 4
%4 = mul nsw i32 %3, 2
ret i32 %4
}
attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-
math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-
width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-
math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-
math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-
cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false"
"use-soft-float"="false" }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 9.0.1-12 "}
And all llvm-link does is reproduce the contents of fn_ir.ll with the source_filename changed to llvm-link. I'd be real happy to know the bit I'm missing.
The answer is in the name mangling.
Your 'manually' generated IR has a function named fn, while clang++ emits the name _Z2fni.
You need to make the names match. Either emit the _Z2fni in the main_ir.ll, or (arguable better in this case) change the definition of fn in the fn_ir, e.g.:
extern "C" void fn(int x) {
return x * 2;
}
extern "C" tells the compiler to use C mangling convention, this is less fragile since it will work even if you change type or number of arguments of fn. However, it won't work if you want to pass C++ types into the fn, then you need to emit the right function name for the main_ir.ll.
UPD:
There two more 'discrepancies':
The fn has different arguments in the two modules: i32 vs i32, ...
The other issue is that main declared as internal. I guess it is just stripped since it is internal and it is not being called by anyone.
So just removing the internal flag should do the job for you.
How can I identify an annotated variable in an LLVM pass?
#include <stdio.h>
int main (){
int x __attribute__((annotate("my_var")))= 0;
int a,b;
x = x + 1;
a = 5;
b = 6;
x = x + a;
return x;
}
For example, I want to identify the instructions which have the annotated variable (x in this case) and print them out (x = x+1; and x = x+a)
How can I achieve this?
This is the .ll file generated using LLVM
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64"
#.str = private unnamed_addr constant [7 x i8] c"my_var\00", section "llvm.metadata"
#.str.1 = private unnamed_addr constant [7 x i8] c"test.c\00", section "llvm.metadata"
; Function Attrs: noinline nounwind optnone
define i32 #main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 0, i32* %1, align 4
%5 = bitcast i32* %2 to i8*
call void #llvm.var.annotation(i8* %5, i8* getelementptr inbounds ([7 x i8], [7 x i8]* #.s$
store i32 0, i32* %2, align 4
%6 = load i32, i32* %2, align 4
%7 = add nsw i32 %6, 1
store i32 %7, i32* %2, align 4
store i32 5, i32* %3, align 4
store i32 6, i32* %4, align 4
%8 = load i32, i32* %2, align 4
%9 = load i32, i32* %3, align 4
%10 = add nsw i32 %8, %9
store i32 %10, i32* %2, align 4
%11 = load i32, i32* %2, align 4
ret i32 %11
}
; Function Attrs: nounwind
declare void #llvm.var.annotation(i8*, i8*, i8*, i32) #1
attributes #0 = { noinline nounwind optnone "correctly-rounded-divide-sqrt-fp-math"="false" $
attributes #1 = { nounwind }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
I recently encountered similiary problem, as I searched Google still not found a solution.
But in the end , I found "ollvm" project's Utils.cpp ,it solved my problem.
In your case,
%5 = bitcast i32* %2 to i8*
call void #llvm.var.annotation(i8* %5, i8* getelementptr inbounds ([7 x i8], [7 x i8]* #.s$
as we can see there is a call to #llvm.var.annotation , in our pass ,
we can loop through instructions over a function , and search for "call" instruction.
Then get the called function's name:
Function *fn = callInst->getCalledFunction();
StringRef fn_name = fn->getName();
and compare the called function's name with "llvm.var.annotation" .
If they match ,then we found the location of "int x " in your case .
The function "llvm.var.annotation" is documented in llvm's doc :
http://llvm.org/docs/LangRef.html#llvm-var-annotation-intrinsic
If you have learn the function "llvm.var.annotation"'s prototype,
then you know that it's second argument is a pointer ,the pointer
points to "my_var\00" in your case . If you thought you can simply
convert it to a GlobalVariable ,then you will failed to get what
you wanted . The actual second argument passed to "llvm.var.annotation"
is
i8* getelementptr inbounds ([7 x i8], [7 x i8]* #.s$
in your case.
It's a expression but a GlobalVariable !!! By knowing this , we can
finally get the annotation of our target variable by :
ConstantExpr *ce =
cast<ConstantExpr>(callInst->getOperand(1));
if (ce) {
if (ce->getOpcode() == Instruction::GetElementPtr) {
if (GlobalVariable *annoteStr =
dyn_cast<GlobalVariable>(ce->getOperand(0))) {
if (ConstantDataSequential *data =
dyn_cast<ConstantDataSequential>(
annoteStr->getInitializer())) {
if (data->isString()) {
errs() << "Found data " << data->getAsString();
}
}
}
}
Hope you already solved the problem .
Have a nice day .
You have to loop on instructions and identify calls to llvm.var.annotation
First argument is a pointer to the annotated variable (i8*).
To get the actual annotated variable, you then need to find what this pointer points to.
In your case, this is the source operand of the bitcast instruction.
When looking at the LLVM IR that the julia compiler generates (using code_llvm) I noticed something strange in the function signature when using arrays as arguments. Let me give an example:
function test(a,b,c)
return nothing
end
(This is a useless example, but the results are the same with other functions, the resulting IR of this example is just less cluttered)
Using code_llvm(test, (Int,Int,Int)), I get the following output:
; Function Attrs: sspreq
define void #julia_test14855(i64, i64, i64) #2 {
top:
ret void, !dbg !366
}
Using code_llvm(test, (Array{Int},Array{Int},Array{Int})), I get an (at least for me) unexpected result:
; Function Attrs: sspreq
define %jl_value_t* #julia_test14856(%jl_value_t*, %jl_value_t**, i32) #2 {
top:
%3 = icmp eq i32 %2, 3, !dbg !369
br i1 %3, label %ifcont, label %else, !dbg !369
else: ; preds = %top
call void #jl_error(i8* getelementptr inbounds ([26 x i8]* #_j_str0, i64 0, i64 0)), !dbg !369
unreachable, !dbg !369
ifcont: ; preds = %top
%4 = load %jl_value_t** inttoptr (i64 36005472 to %jl_value_t**), align 32, !dbg !370
ret %jl_value_t* %4, !dbg !370
}
Why is the signature of the llvm function not just listing the 3 variables as i64* or something like that? And why doesn't the function return void anymore?
Why is the signature of the llvm function not just listing the 3 variables as i64*
This signature is the generic Julia calling convention (because, as #ivarne mentioned, the types are incomplete).
#julia_test14856(%jl_value_t*, %jl_value_t**, i32) arguments are:
pointer to the function closure
pointers to boxed arguments (jl_value_t is basic box type)
number of arguments
The signature #ivarne shows is the specialized calling convention. Arguments are still passed boxed, but argument type and count are known already (and the function closure is unnecessary because it is already specialized).
About the output of your example function, this section checks the number of arguments (if not 3 -> goto label else:):
top:
%3 = icmp eq i32 %2, 3, !dbg !369
br i1 %3, label %ifcont, label %else, !dbg !369
This section returns the error:
else: ; preds = %top
call void #jl_error(i8* getelementptr inbounds ([26 x i8]* #_j_str0, i64 0, i64 0)), !dbg !369
unreachable, !dbg !369
Finally, the default case goes to this line which pulls the value for nothing stored in address 36005472 (in #ivarne version, this is guaranteed, so can return void directly).
%4 = load %jl_value_t** inttoptr (i64 36005472 to %jl_value_t**), align 32, !dbg !370
I would assume that it is because Array{Int, N} is a partially initialized type, and that it does not match the patterns the code generation looks for.
Try also
julia> code_llvm(test, (Array{Int,1},Array{Int,1},Array{Int,1}))
define void #julia_test15626(%jl_value_t*, %jl_value_t*, %jl_value_t*) {
top:
ret void, !dbg !974
}
This might be considered a bug in the code generation, but I do not know.
I have an external (C) function that I am calling in my LLVM IR. The IR gets JITed and everything works fine, but the generated code is performance sensitive, and I want to remove duplicate calls to my external function if possible. The function has no side effects. Is there a FunctionPass that eliminates redundant calls to the function? Is there something I have to do to mark the function as having no side effects?
Thanks!
According to http://llvm.org/docs/LangRef.html#function-attributes you can specify the attributes readonly or readnone for a function:
declare i32 #fn(i32 %i);
declare i32 #readonly_fn(i32 %i) readonly;
declare i32 #readnone_fn(i32 %i) readnone;
readonly means that the function doesn't write memory,
readnone means that it doesn't even read memory (for example sin() could be readnone)
If a function doesn't write memory, it should return the result only based on the parameters, and therefor be a pure function (if the global state doesn't change). In case of a readnone function, even the global state could change.
The llvm optimizer can optimize calls to readonly and readnone functions with the EarlyCSE pass (common subexpression elimination), as shown in the following example:
using the following test functions
define i32 #test_no_readonly()
{
%1 = call i32 #fn(i32 0)
%2 = call i32 #fn(i32 0)
%add = add i32 %1, %2
ret i32 %add
}
define i32 #test_readonly()
{
%1 = call i32 #readonly_fn(i32 0)
%2 = call i32 #readonly_fn(i32 0)
%add = add i32 %1, %2
ret i32 %add
}
define i32 #test_readnone()
{
%1 = call i32 #readnone_fn(i32 0)
%2 = call i32 #readnone_fn(i32 0)
%add = add i32 %1, %2
ret i32 %add
}
and running opt -early-cse -S readonly_fn.ll > readonly_fn_opt.ll optimizes away the second call for the readonly and readnone functions, resulting in
define i32 #test_no_readonly() {
%1 = call i32 #fn(i32 0)
%2 = call i32 #fn(i32 0)
%add = add i32 %1, %2
ret i32 %add
}
define i32 #test_readonly() {
%1 = call i32 #readonly_fn(i32 0)
%add = add i32 %1, %1
ret i32 %add
}
define i32 #test_readnone() {
%1 = call i32 #readnone_fn(i32 0)
%add = add i32 %1, %1
ret i32 %add
}
The readonly_fn and readnone_fn functions are only called once, thus eleminating redundand calls.
The -functionattrs pass can also add these attributes to defined functions