Compiler error in ndk and clang++ for ARM? - c++

Please consider the following code:
float test(int len, int* tab)
{
    for(int i = 0; i<len; i++)
        tab[i] = i;
}
The return statement is obviously missing. In this case, both clang and the NDK compiler targeting ARM generate an infinite loop. The disassembly shows that the compiler emits an unconditional branch instead of a conditional branch.
mov r0, #0
.LBB0_1:
str r0, [r1, r0, lsl #2]
add r0, r0, #1
b .LBB0_1
The example with an error can be found here: https://godbolt.org/z/YDSFw-
Please note that the C++ specification states that a missing return is undefined behaviour, but as I understand it this refers only to the returned value. It should not affect the preceding instructions.
Am I missing something here? Any thoughts?

No, you can't reason that way with undefined behaviour.
The compiler is free to use undefined behaviour and assumptions around it for optimizations. The compiler is free to assume your code will not contain undefined behaviour.
In this case, the compiler can assume that the code with undefined behaviour won't be reached. Since flowing off the end of the function is undefined behaviour, the compiler concludes that the end of the function will never be reached. The only way to reach the end is for the loop condition to become false, so the compiler treats the condition as always true and removes the check.
If you remove the -Oz and add -emit-llvm to the compiler explorer command, you'll see what LLVM IR clang produces originally, when not doing optimizations:
https://godbolt.org/z/-dbeNj
define dso_local float @_Z4testiPi(i32 %0, i32* %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32*, align 4
%5 = alloca i32, align 4
store i32 %0, i32* %3, align 4
store i32* %1, i32** %4, align 4
store i32 0, i32* %5, align 4
br label %6
6: ; preds = %15, %2
%7 = load i32, i32* %5, align 4
%8 = load i32, i32* %3, align 4
%9 = icmp slt i32 %7, %8
br i1 %9, label %10, label %18
10: ; preds = %6
%11 = load i32, i32* %5, align 4
%12 = load i32*, i32** %4, align 4
%13 = load i32, i32* %5, align 4
%14 = getelementptr inbounds i32, i32* %12, i32 %13
store i32 %11, i32* %14, align 4
br label %15
15: ; preds = %10
%16 = load i32, i32* %5, align 4
%17 = add nsw i32 %16, 1
store i32 %17, i32* %5, align 4
br label %6
18: ; preds = %6
call void @llvm.trap()
unreachable
}
The end of the loop, label 18, contains unreachable. This can be used for further optimizations, getting rid of the branch and comparison at the start of the loop.
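One way to avoid the problem is to make sure every path through the function ends in a return (or to declare the function void). The following is only a sketch: the name fill_indices and the decision to return the element count are assumptions made for illustration, since the question does not say what the function was meant to return.

```cpp
#include <cassert>

// Hypothetical fixed variant of test(): every path ends in a return,
// so control never flows off the end of a non-void function.
float fill_indices(int len, int* tab)
{
    // Precondition for this sketch: a non-empty range needs a valid pointer.
    assert(len <= 0 || tab != nullptr);

    for (int i = 0; i < len; i++)
        tab[i] = i;

    // Assumption: the caller wants the number of elements written.
    return static_cast<float>(len > 0 ? len : 0);
}
```

With the return in place the loop keeps its exit condition, because the end of the function is now reachable without undefined behaviour. Compiling with -Werror=return-type turns the original mistake into a hard error.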
Edit:
There's an excellent blog post from John Regehr about how to reason around undefined behaviour in C and C++. It's a bit long but well worth a read.

Related

Does loop operation with variable assignment violate SSA principle?

I just started to learn LLVM IR and SSA, got a question about the SSA principle.
I found the following code block on the Internet, which seems to violate SSA principle because variables are assigned value for several times. Is my comprehension right?
; <label>:4: ; preds = %7, %0
%5 = load i32, i32* %3, align 4
%6 = icmp slt i32 %5, 10
br i1 %6, label %7, label %12
; <label>:7: ; preds = %4
%8 = load i32, i32* %3, align 4
%9 = add nsw i32 %8, 1
store i32 %9, i32* %3, align 4
%10 = load i32, i32* %2, align 4
%11 = mul nsw i32 %10, 2
store i32 %11, i32* %2, align 4
br label %4
LLVM uses "partial SSA" form. LLVM's unlimited supply of virtual registers is in SSA form, but memory and global variables are not. Your %5 can take on different values because it is a load from memory.
Even in fully SSA form an SSA value in a loop ordinarily takes on different values through the loop iterations. It would look like %5 = phi i32 [%start_val, %loopheader_bb], [%iteration_val, %backedge_bb]. You should get phi nodes if you run opt -sroa over your code.
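For orientation, a loop along the following lines would plausibly produce IR shaped like the block in the question when compiled without optimizations. This is a guess at the source, not the asker's actual code; the names i, x and doubled_ten_times are made up.

```cpp
// Hypothetical source. i and x are ordinary mutable variables, so without
// optimizations each one lives in a stack slot (an alloca). The loop body
// loads the slot, computes a fresh SSA value and stores it back, which
// means memory changes while no %N name is ever assigned twice.
int doubled_ten_times(int x)
{
    int i = 0;
    while (i < 10) {
        i = i + 1;  // load, add nsw 1, store
        x = x * 2;  // load, mul nsw 2, store
    }
    return x;
}
```

After promoting the stack slots to registers, i and x would each be represented by a phi node at the loop header instead of a load and store pair.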

What's the instruction for '&&' in LLVM IR?

I want to write an LLVM pass to reduce && in LLVM IR, but I can't find the specific instructions for it in IR. For example,
#include <iostream>
int main(){
bool a = true;
bool b = false;
bool c = a && b;
return 0;
}
and I get the IR,
define dso_local i32 @main() #4 {
%1 = alloca i32, align 4
%2 = alloca i8, align 1
%3 = alloca i8, align 1
%4 = alloca i8, align 1
store i32 0, i32* %1, align 4
store i8 1, i8* %2, align 1
store i8 0, i8* %3, align 1
%5 = load i8, i8* %2, align 1
%6 = trunc i8 %5 to i1
br i1 %6, label %7, label %10
7: ; preds = %0
%8 = load i8, i8* %3, align 1
%9 = trunc i8 %8 to i1
br label %10
10: ; preds = %7, %0
%11 = phi i1 [ false, %0 ], [ %9, %7 ]
%12 = zext i1 %11 to i8
store i8 %12, i8* %4, align 1
ret i32 0
}
but I tried this one,
#include <iostream>
int main(){
int a = 10;
int b = 10;
int c;
c = a && b;
return 0;
}
and I get this
define dso_local i32 @main() #4 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 0, i32* %1, align 4
store i32 10, i32* %2, align 4
store i32 10, i32* %3, align 4
%5 = load i32, i32* %2, align 4
%6 = icmp ne i32 %5, 0
br i1 %6, label %7, label %10
7: ; preds = %0
%8 = load i32, i32* %3, align 4
%9 = icmp ne i32 %8, 0
br label %10
10: ; preds = %7, %0
%11 = phi i1 [ false, %0 ], [ %9, %7 ]
%12 = zext i1 %11 to i32
store i32 %12, i32* %4, align 4
ret i32 0
}
I use LLVM 10 on Ubuntu. I'd appreciate any answers or suggestions.
There is no LLVM instruction that specifically corresponds to the && operator. It can and will be translated in different ways depending on the expression and the optimization settings.
When you have optimizations enabled, the operands are free of side effects (and not expensive to evaluate), and the whole expression can't be optimized away, clang will usually convert both operands to i1 and apply the and instruction to them.
When optimizations are disabled or the operands have side effects, it'll usually be translated using branch instructions. That's the case in the two examples you posted.
Note that expr1 && expr2 is semantically equivalent to expr1 ? expr2 : false and you'll generally get the same LLVM code for both.
If you're okay with treating expr1 ? expr2 : false and other equivalent code (for example using if statements) the same as &&, you can try to detect the branching pattern created by them. If you need your pass to also be applicable after optimizations, you'll also have to detect at least the pattern of converting to i1 and anding.
If you only want your transformation to apply to && and nothing else, you simply can't do it at the LLVM level. You'd need an AST transformation at the Clang level.
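The equivalence between expr1 && expr2 and expr1 ? expr2 : false, including the short-circuit behaviour that forces the branching form, can be checked with a small sketch. The helper names touch, with_and, with_ternary and count_calls are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper with a visible side effect: it counts its calls.
static int calls = 0;
bool touch(bool v) { ++calls; return v; }

// Both forms skip the second call when the first operand is false.
bool with_and(bool a, bool b)     { return touch(a) && touch(b); }
bool with_ternary(bool a, bool b) { return touch(a) ? touch(b) : false; }

// Returns how many times touch() ran while evaluating f(a, b).
int count_calls(bool (*f)(bool, bool), bool a, bool b)
{
    calls = 0;
    f(a, b);
    return calls;
}

// Checks that both forms agree on result and on number of evaluations.
void check_equivalence()
{
    for (bool a : {false, true})
        for (bool b : {false, true}) {
            assert(with_and(a, b) == with_ternary(a, b));
            assert(count_calls(with_and, a, b) == count_calls(with_ternary, a, b));
        }
}
```

Because the two forms are indistinguishable in behaviour, a pass working on the IR has no reliable way to tell which one the programmer wrote.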

getting block names for LLVM IR parser

I'm writing an LLVM parser to analyse whether a program adheres to a certain programming paradigm. To do that, I need to analyse each block of the IR and check certain instructions. When I create the .ll file, I don't see label names but an address:
; <label>:4 ; preds = %0
%5 = load i32* %c, align 4
%6 = add nsw i32 %5, 10
store i32 %6, i32* %c, align 4
br label %10
; <label>:7 ; preds = %0
%8 = load i32* %c, align 4
%9 = add nsw i32 %8, 15
store i32 %9, i32* %c, align 4
br label %10
; <label>:10 ; preds = %7, %4
%11 = load i32* %1
ret i32 %11
What I need is to get these "labels" into a list. I have also seen that some .ll files have the following format:
if.then: ; preds = %entry
%5 = load i32* %c, align 4
%6 = add nsw i32 %5, 10
store i32 %6, i32* %c, align 4
br label %10
if.else: ; preds = %entry
%8 = load i32* %c, align 4
%9 = add nsw i32 %8, 15
store i32 %9, i32* %c, align 4
br label %10
if.end: ; preds = %if.else,
%11 = load i32* %1
ret i32 %11
With the second format, I can use getName() to get the name of the block, i.e. 'if.then', 'if.else', etc.
With the first format that is impossible, because the block doesn't have a name. However, I tested printAsOperand(errs(), true), which prints the addresses like '%4, %7, %10'. My question is: how can I add these addresses (or operands) to a list of strings, or obtain these values and assign them to a variable?
Here's the way to do it:
Pass a raw_string_ostream to printAsOperand() to capture the required address in a variable.
The following is the function I used for this purpose:
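#include <string>
#include "llvm/IR/BasicBlock.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
// Returns the block's operand form (for example "%4") as a string.
std::string get_block_reference(BasicBlock *BB){
    std::string block_address;
    raw_string_ostream string_stream(block_address);
    // PrintType = false omits the leading "label" type.
    BB->printAsOperand(string_stream, false);
    return string_stream.str();
}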
Instruction and basic block names are a debugging feature that simplifies the development of IR-level passes, but no guarantees are made about them. For example, they could simply be stripped off, or they could be misleading. You should not rely on them for anything meaningful (and in general they may not have any connection to the original source code). Normally the names are not generated in Release builds of LLVM. You need to build everything in Debug (or Release+Assertions) mode.

About Variables Used Within BasicBlock

I want to ask a question about LLVM IR language. For a basicblock, variables used are always loaded prior to usage, and stored after usage. Two example basic blocks are as follows:
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i8**, align 8
%i = alloca i32, align 4
%fact = alloca i32, align 4
%n = alloca i32, align 4
store i32 0, i32* %1
store i32 %argc, i32* %2, align 4
store i8** %argv, i8*** %3, align 8
%4 = load i8*** %3, align 8
%5 = getelementptr inbounds i8** %4, i64 1
%6 = load i8** %5, align 8
%7 = call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %6)
store i32 %7, i32* %n, align 4
store i32 1, i32* %fact, align 4
store i32 1, i32* %i, align 4
br label %8
%9 = load i32* %i, align 4
%10 = load i32* %n, align 4
%11 = icmp sle i32 %9, %10
br i1 %11, label %12, label %19
Call the first basic block A and the second basic block B; control flows from A to B.
To use %7, the program stores %7 through the %n pointer in A, and then loads from the %n pointer into %10 in B to access it, like this:
store i32 %7, i32* %n, align 4
%10 = load i32* %n, align 4
%11 = icmp sle i32 %9, %10
I wonder if I could just drop the store and load instructions and use the value %7 directly, as follows:
%11 = icmp sle i32 %9, %7
Is this OK? Could anyone talk about the reason behind it?
My description may be unclear. I can explain it in more detail if you have questions.
Thanks
It is possible to refer to virtual registers from other basic blocks.
Since you provided an incomplete example, I can just speculate if %7 can be directly used in the comparison:
If you optimize the code with LLVM's opt tool, the register will probably not be stored and reloaded and the comparison will directly use %7 (or a phi function dependent on the value).
You can try the mem2reg pass (-S makes opt write textual IR instead of bitcode):
opt -S -mem2reg <your file>.ll -o <target file>.ll

Why does LLVM require a new temp register identifier

I added some new tests to the compiler, which generated the following IR, but LLVM rejects it with an "expected %4" error.
; Entry Point
define i32 @main(i32 %argc, i8** %argv) {
entry:
%argc_addr = alloca i32
%argv_addr = alloca i8**
%retval = alloca i32
%0 = alloca i32
store i32 %argc, i32* %argc_addr
store i8** %argv, i8*** %argv_addr
%1 = load i32* %argc_addr
%2 = load i8*** %argv_addr
call void @__llvmsharp_init(i32 %1, i8** %2)
call i32 @__LS19ConsoleApplication37Program_mt_4Main()
store i32 0, i32* %0, align 4
%3 = load i32* %0, align 4
// error expected %4
store i32 %3, i32* %retval
br label %return
return:
%retval1 = load i32* %retval
ret i32 %retval1
}
Also, is it wise to use unnamed temporaries?
%3, %4 etc. are neither temporaries nor registers - two concepts that do not exist in LLVM IR - instead, they are names of instructions. I suggest reading more about static single assignment (SSA) form to understand how it works.
In the textual representation of LLVM IR, non-void instructions that don't have any name are allocated numeric names such as %3, %4, and whether allocated implicitly or explicitly in the code, those numbers must be sequential. The instruction call i32 @__LS19ConsoleApplication37Program_mt_4Main() is a non-void one so it is implicitly allocated a number - %3 - and so the next unnamed instruction, load i32* %0, align 4, should be given %4, not %3.
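The numbering rule can be sketched as a small simulation. This is a simplified model, not LLVM's implementation: the Inst struct and expected_numbers are hypothetical, and the sketch assumes that all function arguments and basic blocks are named, as they are in the question, so that only instructions consume numbers.

```cpp
#include <vector>

// Simplified model of one instruction in a function body.
struct Inst {
    bool produces_value;  // false for void instructions (store, call void)
    bool has_name;        // true for explicitly named values such as %retval
};

// For each instruction, returns the number it must carry if it is an
// unnamed, value-producing instruction, or -1 if it takes no number.
// Assumption: arguments and blocks are all named, so counting starts at 0.
std::vector<int> expected_numbers(const std::vector<Inst>& body)
{
    std::vector<int> out;
    int next = 0;
    for (const Inst& inst : body) {
        if (inst.produces_value && !inst.has_name)
            out.push_back(next++);  // consumes the next sequential number
        else
            out.push_back(-1);      // void or named: no number consumed
    }
    return out;
}
```

Feeding in the entry block from the question, the call void consumes no number, the call i32 takes 3, and the following load takes 4, which matches the error message.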
If you write LLVM IR by hand and have trouble with instruction naming, you might consider using my LLVM IR editor plugin for Eclipse; it will mark such errors for you and will offer to replace any wrong numbers with the correct ones.
(In the screenshot that accompanied this answer, the unnamed add i32 %1, 1 was implicitly allocated %2.)