I'm splitting every basic block into pieces with a minimum number of instructions (usually 3-5):
llvm::SplitBlock(BasicBlock, &*BasicBlockiter, Pass);
and then trying to get an object file from the IR:
llc -filetype=obj 2.ll
I got the following errors:
Instruction does not dominate all uses!
%1 = alloca i32
%mul = load i32* %1
Instruction does not dominate all uses!
%1 = alloca i32
%99 = load i32* %1
and
While deleting: i32 %
Use still stuck around after Def is destroyed: %var = alloca i32
Assertion failed: use_empty() && "Uses remain when a value is destroyed!"
and
error: expected instruction opcode
invoke.cont2: ; preds = %main_block, %invoke.cont
IR:
invoke.cont2: ; preds = %main_block, %invoke.cont
%call4 = invoke i32 @_ZStorSt13_Ios_OpenmodeS_(i32 8, i32 16)
to label %invoke.cont3 unwind label %lpad1
store i32 %call4, i32* %var4
I think that after splitting, instructions are located in different basic blocks.
If I split the blocks into pieces of 10-15 instructions, everything is OK.
How can I predict/check and avoid these errors?
In your first version, you had an instruction after a terminator instruction, which is invalid: a basic block must end with exactly one terminator, and an instruction placed after it could never be executed.
In your second version (not mentioned here; please use Stack Overflow instead of private emails...), you are using %call (in the store instruction) before defining it (%call = ...), so clearly your definition does not dominate (precede) every use...
But as I said, the store should not come after the invoke, because invoke is a terminator instruction.
The solution is to put your store in the next basic block (you can create a new one) :
invoke.cont:
%call = invoke i8* @_ZNKSs5c_strEv(%"class.std::basic_string"* @loadedFile)
to label %invoke.cont2_before unwind label %lpad1
invoke.cont2_before: ; preds = %invoke.cont
store i8* %call, i8** %reduced_var
br label %invoke.cont2
invoke.cont2: ; preds = %main_block, %invoke.cont2_before
%call4 = invoke i32 @_ZStorSt13_Ios_OpenmodeS_(i32 8, i32 16)
to label %invoke.cont3_before unwind label %lpad1
etc...
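To make the "predict/check" part concrete, here is a toy C++ model of the one rule that broke here: a block must end with exactly one terminator, and invoke counts as one. The names ToyBlock, isTerminator, terminatorsWellPlaced and splitAt are hypothetical, not LLVM's API. splitAt mimics what SplitBlock does structurally, which shows that splitting cannot repair a block that already has an instruction after its invoke. In real code, the equivalent check is to call llvm::verifyFunction(F, &llvm::errs()) from llvm/IR/Verifier.h after each transformation (it returns true when the function is broken); note that this toy does not model dominance, which the real verifier also checks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model (not LLVM's API): a block is a label plus a list of opcodes.
struct ToyBlock {
    std::string label;
    std::vector<std::string> opcodes;
};

// A subset of LLVM IR terminator opcodes; note that invoke is one of them.
bool isTerminator(const std::string &op) {
    return op == "br" || op == "ret" || op == "invoke" || op == "switch" ||
           op == "resume" || op == "unreachable";
}

// The rule the IR verifier enforces: a block is non-empty, ends with a
// terminator, and contains no other terminator.
bool terminatorsWellPlaced(const ToyBlock &bb) {
    if (bb.opcodes.empty() || !isTerminator(bb.opcodes.back()))
        return false;
    for (std::size_t i = 0; i + 1 < bb.opcodes.size(); ++i)
        if (isTerminator(bb.opcodes[i]))
            return false;
    return true;
}

// Structural effect of SplitBlock(BB, I): instructions from index `at` onward
// move to a new block, and the old block is closed with an unconditional br.
// Requires at < size, i.e. the split point is an existing instruction.
std::pair<ToyBlock, ToyBlock> splitAt(const ToyBlock &bb, std::size_t at) {
    assert(at < bb.opcodes.size());
    ToyBlock head{bb.label,
                  std::vector<std::string>(bb.opcodes.begin(), bb.opcodes.begin() + at)};
    head.opcodes.push_back("br");
    ToyBlock tail{bb.label + ".split",
                  std::vector<std::string>(bb.opcodes.begin() + at, bb.opcodes.end())};
    return {head, tail};
}
```

Splitting a well-formed block such as {load, add, store, ret} at index 2 yields two well-formed blocks, while splitting {invoke, store, br} still leaves the invoke in the middle of a block. That is exactly why the answer moves the store into a new invoke.cont2_before block instead.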
I'm learning LLVM these days by observing how clang deals with complex situations. I wrote (at top level, not in a function):
int qaq = 666;
int tat = 233;
auto hh = qaq + tat;
And I use the command:
clang-4.0 003.cpp -emit-llvm -S -std=c++11
And clang generates code like this:
@qaq = global i32 666, align 4
@tat = global i32 233, align 4
@hh = global i32 0, align 4
@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 65535, void ()* @_GLOBAL__sub_I_003.cpp, i8* null }]
; Function Attrs: noinline uwtable
define internal void @__cxx_global_var_init() #0 section ".text.startup" {
%1 = load i32, i32* @qaq, align 4
%2 = load i32, i32* @tat, align 4
%3 = add nsw i32 %1, %2
store i32 %3, i32* @hh, align 4
ret void
}
; Function Attrs: noinline uwtable
define internal void @_GLOBAL__sub_I_003.cpp() #0 section ".text.startup" {
call void @__cxx_global_var_init()
ret void
}
I'm confused by _GLOBAL__sub_I_003.cpp: why does clang generate a function that only calls another function (and does nothing else), especially when neither of them takes any parameters?
Disclaimer: This is my interpretation of the logic, I'm not part of the LLVM team.
In order to understand the reasoning behind this, you have to understand a fundamental concept in software engineering: Complexity creates bugs, and makes testing harder.
But first, let's make your example a little more interesting:
int qaq = 666;
int tat = 233;
auto hh = qaq + tat;
auto ii = qaq - tat;
Which leads to:
; Function Attrs: noinline uwtable
define internal void @__cxx_global_var_init() #0 section ".text.startup" !dbg !16 {
%1 = load i32, i32* @qaq, align 4, !dbg !19
%2 = load i32, i32* @tat, align 4, !dbg !20
%3 = add nsw i32 %1, %2, !dbg !21
store i32 %3, i32* @hh, align 4, !dbg !21
ret void, !dbg !20
}
; Function Attrs: noinline uwtable
define internal void @__cxx_global_var_init.1() #0 section ".text.startup" !dbg !22 {
%1 = load i32, i32* @qaq, align 4, !dbg !23
%2 = load i32, i32* @tat, align 4, !dbg !24
%3 = sub nsw i32 %1, %2, !dbg !25
store i32 %3, i32* @ii, align 4, !dbg !25
ret void, !dbg !24
}
; Function Attrs: noinline uwtable
define internal void @_GLOBAL__sub_I_example.cpp() #0 section ".text.startup" !dbg !26 {
call void @__cxx_global_var_init(), !dbg !28
call void @__cxx_global_var_init.1(), !dbg !29
ret void
}
So we see that Clang emits a separate function for each non-trivial initialization and calls them one after the other in _GLOBAL__sub_I_example.cpp(). That is sensible: things stay neatly organized this way, whereas they could become a garbled mess in larger or more complicated files otherwise.
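For readers more comfortable with C++ than with IR, here is a hand-written C++ model of that structure: one initializer function per dynamically initialized global, plus one per-translation-unit function that calls them in declaration order. The function names are hypothetical stand-ins (identifiers with a leading double underscore are reserved in C++), and in a real program the runtime, not user code, calls the per-TU function through @llvm.global_ctors before main().

```cpp
#include <cassert>

// Hand-written model of the IR above; names are hypothetical stand-ins.
int qaq = 666;
int tat = 233;
int hh = 0;  // like @hh: zero-initialized in the object file
int ii = 0;  // like @ii

// One initializer function per dynamically initialized global.
static void cxx_global_var_init()   { hh = qaq + tat; }
static void cxx_global_var_init_1() { ii = qaq - tat; }

// The per-translation-unit constructor: it only calls the initializers,
// in declaration order. In the real IR, this is the function registered
// in @llvm.global_ctors (priority 65535).
static void global_sub_I_example() {
    cxx_global_var_init();
    cxx_global_var_init_1();
}
```

With a single global there is simply one call in global_sub_I_example, which is the "function that only calls another function" from the question.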
Notice how that's the exact same logic that is being applied in your example.
Doing otherwise would imply an algorithm of the type: "if there is a single non-trivial global initialization, then put the code directly in the translation unit's global constructor".
Note the following:
- The current logic handles that case correctly already.
- In optimized code, the end result would be exactly the same.
So what would that logic get us, really?
- More branches to test.
- More opportunities to accidentally insert a bug.
- More code to maintain in the long run.
- Removal of a single function call in the global initialization of some translation units in non-optimized builds.
Keeping things the way they are is just the right decision.
There is a branch in the IR that I want to delete completely (condition + branch + true_basic_block + false_basic_block). It looks like this:
%4 = icmp sge i32 %2, %3
br i1 %4, label %5, label %7
; <label>:5 ; preds = %0
%6 = load i32* %x, align 4
store i32 %6, i32* %z, align 4
br label %9
; <label>:7 ; preds = %0
%8 = load i32* %y, align 4
store i32 %8, i32* %z, align 4
br label %9
; <label>:9 ; preds = %7, %5
%10 = call dereferenceable(140) %"class.std::basic_ostream"* @_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(%"class.std::basic_ostream"* dereferenceable(140) @_ZSt4cout, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0))
%11 = load i32* %z, align 4
%12 = call dereferenceable(140) %"class.std::basic_ostream"* @_ZNSolsEi(%"class.std::basic_ostream"* %10, i32 %11)
%13 = call dereferenceable(140) %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"* %12, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)* @_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_)
ret i32 0
Now, to delete it, is there a removeBranch function, or do I need to delete the instructions one by one? I have been trying the latter, but I have seen every error from "Basic block in main does not have an terminator" to "use remains when def is destroyed", and many more. I have used eraseFromParent, ReplaceInstWithValue, ReplaceInstWithInst, removeFromParent, etc.
Can anyone be kind enough to point me in the correct direction?
This is my function_pass :
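bool runOnFunction(Function &F) override {
  // Collect candidates first; erasing instructions while iterating over
  // the function would invalidate the iterators.
  std::vector<BranchInst *> branch_vector;
  for (auto &B : F)
    for (auto &I : B)
      if (auto *brn = dyn_cast<BranchInst>(&I))
        // The condition is not always an instruction (it can be a constant
        // or an argument), so test its kind instead of dereferencing an
        // unchecked dyn_cast<Instruction> result.
        if (brn->isConditional() && isa<ICmpInst>(brn->getCondition()))
          branch_vector.push_back(brn);
  /*For now just delete the branches in the vector.*/
  // Caution: removing one diamond can delete blocks that contain later
  // entries of branch_vector, so this only works for independent branches.
  for (BranchInst *b : branch_vector)
    removeConditionalBranch(b);
  // Only report a change if something was actually modified.
  return !branch_vector.empty();
}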
This is the output:
I don't know of any RemoveBranch utility function, but something like this should work. The idea is to delete the branch instruction, then delete anything that becomes dead as a result, and then merge the initial block with the join block.
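// for DeleteDeadBlock, MergeBlockIntoPredecessor
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
// for RecursivelyDeleteTriviallyDeadInstructions
#include "llvm/Transforms/Utils/Local.h"
// for IRBuilder
#include "llvm/IR/IRBuilder.h"
#include <cassert>

using namespace llvm;

void removeConditionalBranch(BranchInst *Branch) {
  assert(Branch &&
         Branch->isConditional() &&
         Branch->getNumSuccessors() == 2);
  BasicBlock *Parent = Branch->getParent();
  BasicBlock *ThenBlock = Branch->getSuccessor(0);
  BasicBlock *ElseBlock = Branch->getSuccessor(1);
  BasicBlock *ThenSuccessor = ThenBlock->getUniqueSuccessor();
  BasicBlock *ElseSuccessor = ElseBlock->getUniqueSuccessor();
  assert(ThenSuccessor && ElseSuccessor && ThenSuccessor == ElseSuccessor);
  // Read the condition BEFORE erasing: eraseFromParent() frees Branch.
  Value *Cond = Branch->getCondition();
  Branch->eraseFromParent();
  // Deletes the icmp (and its operands) if nothing else uses them.
  RecursivelyDeleteTriviallyDeadInstructions(Cond);
  // Both arms have no predecessors now, so they can be removed.
  DeleteDeadBlock(ThenBlock);
  DeleteDeadBlock(ElseBlock);
  // Give Parent a terminator again, then fold the join block into it.
  IRBuilder<> Builder(Parent);
  Builder.CreateBr(ThenSuccessor);
  bool Merged = MergeBlockIntoPredecessor(ThenSuccessor);
  assert(Merged && "join block should have Parent as its only predecessor");
  (void)Merged; // avoid an unused-variable warning when asserts are disabled
}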
This code only handles the simple case you've shown, with the then and else blocks both jumping unconditionally to a common join block (it will fail with an assertion error for anything more complicated). More complicated control flow will be a bit trickier to handle, but you should still be able to use this code as a starting point.
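The three steps in the code above (erase the branch, delete the now-dead arms, merge the join block into the parent) and the shape they require can be sketched on a toy control-flow graph. This is standalone C++ with hypothetical names (ToyCFG, collapseDiamond), not LLVM code; its purpose is to show which shapes the simple approach rejects, i.e. why the asserts are there.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG (not LLVM's API): each block label maps to its successor labels.
using ToyCFG = std::map<std::string, std::vector<std::string>>;

// Collapses Parent -> {Then, Else} -> Join into a single block.
// Returns false and leaves cfg untouched if the shape is not a simple diamond.
bool collapseDiamond(ToyCFG &cfg, const std::string &parent) {
    const std::vector<std::string> succs = cfg.at(parent);
    if (succs.size() != 2 || succs[0] == succs[1]) return false;
    const std::string thenB = succs[0], elseB = succs[1];
    const std::vector<std::string> &thenSuccs = cfg.at(thenB);
    const std::vector<std::string> &elseSuccs = cfg.at(elseB);
    // Both arms must jump unconditionally to the same join block.
    if (thenSuccs.size() != 1 || elseSuccs.size() != 1 || thenSuccs[0] != elseSuccs[0])
        return false;
    const std::string join = thenSuccs[0];
    if (join == parent || join == thenB || join == elseB) return false;
    // Join must have no other predecessors; otherwise it could not be merged
    // into Parent (MergeBlockIntoPredecessor would return false).
    for (const auto &entry : cfg) {
        if (entry.first == thenB || entry.first == elseB) continue;
        for (const std::string &succ : entry.second)
            if (succ == join) return false;
    }
    // Steps 1 and 2: drop the conditional edge and the unreachable arms.
    cfg.erase(thenB);
    cfg.erase(elseB);
    // Step 3: merge Join into Parent; Parent inherits Join's successors.
    cfg[parent] = cfg.at(join);
    cfg.erase(join);
    return true;
}
```

For the IR in the question (entry branching to %5 and %7, both jumping to %9), the whole diamond collapses into a single block; a branch whose arms lead to different blocks is rejected, matching the failing assertion described above.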
When looking at the LLVM IR that the Julia compiler generates (using code_llvm), I noticed something strange in the function signature when using arrays as arguments. Let me give an example:
function test(a,b,c)
return nothing
end
(This is a useless example, but the results are the same with other functions, the resulting IR of this example is just less cluttered)
Using code_llvm(test, (Int,Int,Int)), I get the following output:
; Function Attrs: sspreq
define void @julia_test14855(i64, i64, i64) #2 {
top:
ret void, !dbg !366
}
Using code_llvm(test, (Array{Int},Array{Int},Array{Int})), I get an (at least for me) unexpected result:
; Function Attrs: sspreq
define %jl_value_t* @julia_test14856(%jl_value_t*, %jl_value_t**, i32) #2 {
top:
%3 = icmp eq i32 %2, 3, !dbg !369
br i1 %3, label %ifcont, label %else, !dbg !369
else: ; preds = %top
call void @jl_error(i8* getelementptr inbounds ([26 x i8]* @_j_str0, i64 0, i64 0)), !dbg !369
unreachable, !dbg !369
ifcont: ; preds = %top
%4 = load %jl_value_t** inttoptr (i64 36005472 to %jl_value_t**), align 32, !dbg !370
ret %jl_value_t* %4, !dbg !370
}
Why is the signature of the llvm function not just listing the 3 variables as i64* or something like that? And why doesn't the function return void anymore?
Why is the signature of the llvm function not just listing the 3 variables as i64*
This signature is the generic Julia calling convention (because, as @ivarne mentioned, the types are incomplete).
The arguments of @julia_test14856(%jl_value_t*, %jl_value_t**, i32) are:
1. a pointer to the function closure
2. pointers to the boxed arguments (jl_value_t is the basic box type)
3. the number of arguments
The signature @ivarne shows is the specialized calling convention. Arguments are still passed boxed, but the argument types and count are already known (and the function closure is unnecessary because the function is already specialized).
About the output of your example function, this section checks the number of arguments (if not 3 -> goto label else:):
top:
%3 = icmp eq i32 %2, 3, !dbg !369
br i1 %3, label %ifcont, label %else, !dbg !369
This section returns the error:
else: ; preds = %top
call void @jl_error(i8* getelementptr inbounds ([26 x i8]* @_j_str0, i64 0, i64 0)), !dbg !369
unreachable, !dbg !369
Finally, the default case goes to this line, which loads the boxed value of nothing from the hard-coded address 36005472 (in @ivarne's version this is guaranteed, so the function can return void directly).
%4 = load %jl_value_t** inttoptr (i64 36005472 to %jl_value_t**), align 32, !dbg !370
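A rough C++ analogue of the two calling conventions may help. Everything here is a hypothetical stand-in: jl_value_t is an empty struct rather than Julia's real box type, nothing_singleton replaces the value loaded from the hard-coded address, and the error message text is assumed. The generic entry point takes (closure, boxed argument array, argument count) and has to check the count at run time; the specialized one knows its arity and can return void.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for Julia's boxed value type (not the real julia.h type).
struct jl_value_t {};

// Stand-in for the `nothing` singleton that the IR loads from a fixed address.
static jl_value_t nothing_singleton;

// Generic convention: (closure, boxed args, arg count) -> boxed result.
// Mirrors the shape of the generic IR: check nargs, then return `nothing`.
jl_value_t *test_generic(jl_value_t * /*closure*/, jl_value_t ** /*args*/,
                         std::int32_t nargs) {
    if (nargs != 3)  // %3 = icmp eq i32 %2, 3
        throw std::runtime_error("wrong number of arguments");  // like jl_error + unreachable
    return &nothing_singleton;  // like the load from the fixed address
}

// Specialized convention: arity and types are known at compile time,
// so there is no count check and no boxed return value.
void test_specialized(jl_value_t *, jl_value_t *, jl_value_t *) {}
```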
I would assume that it is because Array{Int} leaves the dimension N unspecified, so Array{Int, N} is not a concrete type, and it does not match the patterns the code generation looks for.
Try also
julia> code_llvm(test, (Array{Int,1},Array{Int,1},Array{Int,1}))
define void @julia_test15626(%jl_value_t*, %jl_value_t*, %jl_value_t*) {
top:
ret void, !dbg !974
}
This might be considered a bug in the code generation, but I do not know.
My goal is to do something simple in LLVM: using the C library function getchar, I want to define an LLVM function that reads an integer from the command line. Here is my algorithm in pseudocode:
getInt:
get a character, set the value to VAL
check if VAL is '-'
if yes then set SGN to -1 and set VAL to the next character else set SGN to 1
set NV to the next char minus 48 // 48 is the ASCII code of '0', the first character that represents a digit
while (NV >= 0)
set VAL = VAL*10
set VAL = VAL + NV
set NV to the next char minus 48
return SGN*VAL
The LLVM code I came up with is, in my head, the most straightforward translation of the above into LLVM IR. However, I get the error
"PHI nodes not grouped at the top of the basic block." If I move some things around to fix this error, I get errors about dominance. Below is the LLVM IR code that gives me the PHI nodes error. I believe I am misunderstanding something basic about LLVM IR, so any help you can give is super appreciated.
define i32 @getIntLoop() {
_L1:
%0 = call i32 @getchar()
%1 = phi i32 [ %0, %_L1 ], [ %3, %_L2 ], [ %8, %_L4 ]
%2 = icmp eq i32 %1, 45
br i1 %2, label %_L2, label %_L5
_L2: ; preds = %_L1
%3 = call i32 @getchar()
br label %_L3
_L3: ; preds = %_L4, %_L2
%4 = call i32 @getchar()
%5 = icmp slt i32 %4, 40
br i1 %5, label %_L5, label %_L4
_L4: ; preds = %_L3
%6 = sub i32 %4, 48
%7 = mul i32 %1, 10
%8 = add i32 %6, %7
br label %_L3
_L5: ; preds = %_L3, %_L1
br i1 %2, label %_L6, label %_L7
_L6: ; preds = %_L5
%9 = mul i32 -1, %1
ret i32 %9
_L7: ; preds = %_L5
ret i32 %1
}
You're getting a very clear error, though. According to the LLVM IR language reference:
There must be no non-phi instructions between the start of a basic
block and the PHI instructions: i.e. PHI instructions must be first in
a basic block.
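You have a phi in _L1, after the call instruction, which violates this.
Why does it have %_L1 as one of its sources? There are no jumps to %_L1 anywhere, and as the entry block of the function it cannot have any predecessors at all (a phi must list exactly the predecessors of its block). I think you should first understand how phi works, possibly by compiling small pieces of C code into LLVM IR with Clang and seeing what gets generated.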
Put simply, a phi is needed to keep the code in SSA form while still being able to assign one of several values to the same register. Make sure you read about SSA; it explains phi nodes as well. Another good resource is the LLVM Kaleidoscope tutorial, which you should go through; in particular, chapter 5 covers phis. As suggested above, running small pieces of C through Clang is a great way to understand how things work. This is in no way "hacky": it's the scientific method! You read the theory, think hard about it, form hypotheses about how things work, and then verify those hypotheses by running Clang and seeing what it generates for real-life control flow.
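In that spirit, here is a sketch of the question's algorithm in C++ that you could feed to Clang. Assumptions beyond the pseudocode are marked in the comments: the character source is passed in as a parameter (so it can be getchar or a string), VAL accumulates digit values rather than raw character codes, and any character outside '0'..'9' (or EOF) ends the number. Compiling it with something like clang++ -O1 -S -emit-llvm should show c and val turning into phi nodes grouped at the top of the loop header; at -O0 clang keeps local variables in allocas, so they will not appear as phis there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>

// Reads an optionally negative decimal integer, one character at a time.
// Assumption: `next` plays the role of getchar() and returns EOF at the end.
int getInt(const std::function<int()> &next) {
    int c = next();
    int sgn = 1;
    if (c == '-') {  // 45 in the question's IR
        sgn = -1;
        c = next();
    }
    int val = 0;  // assumption: VAL holds digit values, starting at 0
    while (c >= '0' && c <= '9') {  // assumption: any non-digit ends the number
        val = val * 10 + (c - '0');
        c = next();
    }
    return sgn * val;
}

// Reads from standard input via getchar(), as in the question.
int getIntFromStdin() { return getInt([] { return std::getchar(); }); }

// Feeds characters from a string, then EOF (handy for testing).
int getIntFromString(const std::string &s) {
    std::size_t pos = 0;
    return getInt([&] {
        return pos < s.size() ? static_cast<unsigned char>(s[pos++]) : EOF;
    });
}
```

Each loop iteration redefines c and val, which in SSA form is exactly what needs a phi at the start of the loop block, with one incoming value from the block before the loop and one from the loop body.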
My pass in LLVM generates an IR like this:
%5 = icmp eq i32 %4, 0
%7 = or i1 %5, %5
...
Since the or instruction is actually not needed (dead code), I replaced all uses of %7 with %5. Now the or instruction should be deleted. Can I call LLVM's Dead Code Elimination pass from my pass, or is there another way to remove that or instruction?
A solution that is more aligned with LLVM's design philosophy is, instead of doing the substitution in your pass, to let InstCombine do the job. Then you will not need to worry about running DCE. (On newer LLVM releases, where opt uses the new pass manager, the invocation below is spelled opt -S -passes=instcombine.)
For example:
> cat foo.ll
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
%or = or i32 %a, %a
ret i32 %or
}
> opt -S -instcombine < foo.ll
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
ret i32 %a
}
Why don't you just schedule DCE to run after your pass in the pass manager? Let it do its analysis and decide what to throw away.
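Within the pass itself, the direct answer to "is there any method to remove that or instruction" is that after OrInst->replaceAllUsesWith(Cmp) the or has no remaining users, so OrInst->eraseFromParent() (or RecursivelyDeleteTriviallyDeadInstructions from llvm/Transforms/Utils/Local.h) removes it. The use-list bookkeeping behind that, including the "Uses remain when a value is destroyed!" assertion quoted at the top of this page, can be modeled in a few lines of standalone C++. ToyValue and the helper functions are hypothetical, not LLVM classes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy SSA value with explicit use lists (a simplified model, not LLVM's classes).
struct ToyValue {
    std::string name;
    std::vector<ToyValue *> operands;  // values this one uses
    std::vector<ToyValue *> users;     // one entry per use of this value
};

void addOperand(ToyValue &user, ToyValue &op) {
    user.operands.push_back(&op);
    op.users.push_back(&user);
}

// Like Value::replaceAllUsesWith: every use of `from` becomes a use of `to`.
void replaceAllUsesWith(ToyValue &from, ToyValue &to) {
    for (ToyValue *u : from.users) {
        std::replace(u->operands.begin(), u->operands.end(), &from, &to);
        to.users.push_back(u);
    }
    from.users.clear();
}

// Here "trivially dead" just means unused; LLVM additionally requires
// that the instruction has no side effects.
bool isTriviallyDead(const ToyValue &v) { return v.users.empty(); }

// Like eraseFromParent: drops this value's own uses of its operands.
// LLVM asserts use_empty() at this point, and so does this model.
void erase(ToyValue &v) {
    assert(v.users.empty() && "Uses remain when a value is destroyed!");
    for (ToyValue *op : v.operands)
        op->users.erase(std::find(op->users.begin(), op->users.end(), &v));
    v.operands.clear();
}
```

Modeling %7 = or i1 %5, %5 with one later user: after replaceAllUsesWith(orInst, cmp) the or is unused and can be erased, leaving %5 with exactly one user. Calling erase before the replacement would trip the assertion, which is the same failure reported earlier on this page.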