Expected top-level entity - llvm

How did you manage to get past the "expected top-level entity" error when running lli in the LLVM framework?

This error usually means that you copy-pasted part of some IR code that doesn't count as a top-level entity. In other words, it's not a function, a type, a global variable, etc. For comparison, the same kind of error can happen in C:
x = 8;
is not valid as the contents of a C file, because an assignment statement isn't a valid top-level entity. To make it valid, you put it in a function:
int x; /* global, so it is visible inside foo */
void foo(void) {
x = 8;
}
The same error happens in LLVM IR.

My issue: the .ll file was saved as "UTF-8 with BOM" instead of "UTF-8 without BOM".
Fix: in Notepad++, select "UTF-8 without BOM" from the Encoding menu, then save.
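If you would rather check for the BOM programmatically than in an editor: a UTF-8 BOM is just the three bytes EF BB BF at the very start of the file. Below is a minimal C sketch, assuming the whole .ll file fits in memory; the function names are hypothetical helpers, not part of LLVM or Notepad++.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A UTF-8 byte order mark is the byte sequence EF BB BF at offset 0. */
int has_utf8_bom(const unsigned char *buf, size_t len) {
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/* Removes a leading BOM in place and returns the new length. */
size_t strip_utf8_bom(unsigned char *buf, size_t len) {
    if (!has_utf8_bom(buf, len))
        return len;
    memmove(buf, buf + 3, len - 3);
    return len - 3;
}

/* Rewrites the file at path without its BOM (if it has one).
   Returns 0 on success, -1 on an I/O or allocation failure. */
int strip_bom_from_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return -1; }
    rewind(f);
    unsigned char *buf = malloc((size_t)size + 1); /* +1: never malloc(0) */
    if (!buf) { fclose(f); return -1; }
    size_t len = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (len != (size_t)size) { free(buf); return -1; }
    if (!has_utf8_bom(buf, len)) { free(buf); return 0; } /* nothing to do */
    len = strip_utf8_bom(buf, len);
    f = fopen(path, "wb");
    if (!f) { free(buf); return -1; }
    size_t written = fwrite(buf, 1, len, f);
    int rc = (fclose(f) == 0 && written == len) ? 0 : -1;
    free(buf);
    return rc;
}
```

Calling strip_bom_from_file("hello.ll") before running lli should have the same effect as re-saving the file as "UTF-8 without BOM".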
Quick setup (for LLVM 3.4.0 .ll files on Windows):
Text editor: Notepad++ from https://notepad-plus-plus.org/
LLVM binaries from https://github.com/CRogers/LLVM-Windows-Binaries
hello.ll saved as "UTF-8 without BOM" (this code is in LLVM 3.4.0 format):
@msg = internal constant [13 x i8] c"Hello World!\00"
declare i32 @puts(i8*)
define i32 @main() {
call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @msg, i32 0, i32 0))
ret i32 0
}
In command prompt:
lli hello.ll
Quick setup (for LLVM 3.8.0 .ll files on Windows):
Text editor: Notepad++ from https://notepad-plus-plus.org/
Clang binaries from: http://llvm.org/releases/download.html#3.8.0
hello.ll saved as "UTF-8 without BOM" (this code is in LLVM 3.8.0 format):
@msg = internal constant [13 x i8] c"Hello World!\00"
declare i32 @puts(i8*)
define i32 @main() {
call i32 @puts(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @msg, i32 0, i32 0))
ret i32 0
}
In command prompt:
clang hello.ll -o hello.exe
hello.exe
If you get errors about char16_t, u16string, etc., clang needs the flag -fms-compatibility-version=19.


Julia llvm function signature when using arrays

When looking at the LLVM IR that the Julia compiler generates (using code_llvm), I noticed something strange in the function signature when using arrays as arguments. Let me give an example:
function test(a,b,c)
return nothing
end
(This is a useless example, but the results are the same with other functions; the resulting IR of this example is just less cluttered.)
Using code_llvm(test, (Int,Int,Int)), I get the following output:
; Function Attrs: sspreq
define void @julia_test14855(i64, i64, i64) #2 {
top:
ret void, !dbg !366
}
Using code_llvm(test, (Array{Int},Array{Int},Array{Int})), I get an (at least for me) unexpected result:
; Function Attrs: sspreq
define %jl_value_t* @julia_test14856(%jl_value_t*, %jl_value_t**, i32) #2 {
top:
%3 = icmp eq i32 %2, 3, !dbg !369
br i1 %3, label %ifcont, label %else, !dbg !369
else: ; preds = %top
call void @jl_error(i8* getelementptr inbounds ([26 x i8]* @_j_str0, i64 0, i64 0)), !dbg !369
unreachable, !dbg !369
ifcont: ; preds = %top
%4 = load %jl_value_t** inttoptr (i64 36005472 to %jl_value_t**), align 32, !dbg !370
ret %jl_value_t* %4, !dbg !370
}
Why is the signature of the llvm function not just listing the 3 variables as i64* or something like that? And why doesn't the function return void anymore?
Why is the signature of the llvm function not just listing the 3 variables as i64*
This signature is the generic Julia calling convention (because, as @ivarne mentioned, the types are incomplete).
The arguments of @julia_test14856(%jl_value_t*, %jl_value_t**, i32) are:
a pointer to the function closure
a pointer to an array of boxed arguments (jl_value_t is the basic box type)
the number of arguments
The signature @ivarne shows is the specialized calling convention. Arguments are still passed boxed, but the argument types and count are already known (and the function closure is unnecessary because the function is already specialized).
In the output of your example function, this section checks the number of arguments (if it is not 3, it branches to the else: label):
top:
%3 = icmp eq i32 %2, 3, !dbg !369
br i1 %3, label %ifcont, label %else, !dbg !369
This section raises the error:
else: ; preds = %top
call void @jl_error(i8* getelementptr inbounds ([26 x i8]* @_j_str0, i64 0, i64 0)), !dbg !369
unreachable, !dbg !369
Finally, the default case reaches this line, which loads the value of nothing stored at address 36005472 (in @ivarne's version this is guaranteed, so the function can return void directly):
%4 = load %jl_value_t** inttoptr (i64 36005472 to %jl_value_t**), align 32, !dbg !370
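To make the generic convention concrete, here is a rough C rendering of what the IR of julia_test14856 does. This is only a sketch: jl_value_t here is a hypothetical one-field stand-in for Julia's real box type, jl_nothing_sketch is a made-up stand-in for the nothing value at that hard-coded address, and the wrong-arity branch returns NULL instead of throwing the way jl_error does.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for Julia's box type; the real jl_value_t is
   defined by the Julia runtime and looks different. */
typedef struct jl_value_t { int tag; } jl_value_t;

/* Hypothetical singleton playing the role of `nothing`. */
static jl_value_t nothing_box = { 0 };
static jl_value_t *const jl_nothing_sketch = &nothing_box;

/* Mirrors the generic signature (%jl_value_t*, %jl_value_t**, i32):
   F     - the function closure (unused here)
   args  - pointer to an array of boxed argument pointers
   nargs - number of arguments actually passed */
jl_value_t *test_generic(jl_value_t *F, jl_value_t **args, uint32_t nargs) {
    (void)F;
    (void)args;                 /* test(a, b, c) ignores its arguments */
    if (nargs != 3) {           /* %top: icmp eq i32 %2, 3 */
        fprintf(stderr, "wrong number of arguments\n");
        return NULL;            /* %else: real code calls jl_error and never returns */
    }
    return jl_nothing_sketch;   /* %ifcont: load `nothing` and return it */
}
```

The specialized version @ivarne shows skips all of this: arity and types are fixed, so there is no check and nothing boxed to return.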
I would assume that it is because Array{Int, N} is a partially parameterized (non-concrete) type, and that it does not match the patterns the code generation looks for.
Also try:
julia> code_llvm(test, (Array{Int,1},Array{Int,1},Array{Int,1}))
define void @julia_test15626(%jl_value_t*, %jl_value_t*, %jl_value_t*) {
top:
ret void, !dbg !974
}
This might be considered a bug in the code generation, but I do not know.

How should LLVM modules be used?

I'm using LLVM to convert a user-defined language into bytecode, and I'm not sure I understand how a module should be used.
At first, I thought it was something like C/C++ object files (to avoid recompiling the bytecode of every file when a single file is edited). However, I found this line in the llvmpy documentation, which seems to say that this is not the case:
Inter-module reference is not possible. That is module A cannot call a function in module B, directly.
Can someone explain why modules are separate from contexts if we can't have multiple modules in a single context?
It is possible, but like the .o files you mention, they must first be linked together into a single binary.
Given a pair of bitcode files:
$ llvm-dis a.bc -o -
; ModuleID = 'a.bc'
@0 = global [13 x i8] c"Hello world!\0A"
declare i32 @printf(i8*)
define void @f() {
%1 = call i32 @printf(i8* getelementptr inbounds ([13 x i8]* @0, i64 0, i64 0))
ret void
}
$ llvm-dis b.bc -o -
; ModuleID = 'b.bc'
declare void @f()
define i32 @main() {
call void @f()
ret i32 0
}
This won't work:
$ lli b.bc
LLVM ERROR: Program used external function 'f' which could not be resolved!
But if you link them together, it will:
$ llvm-ld a.bc b.bc -disable-opt -o c
$ llvm-dis c.bc -o -
; ModuleID = 'c.bc'
@0 = global [13 x i8] c"Hello world!\0A"
declare i32 @printf(i8*)
define void @f() {
%1 = call i32 @printf(i8* getelementptr inbounds ([13 x i8]* @0, i64 0, i64 0))
ret void
}
define i32 @main() {
call void @f()
ret i32 0
}
$ lli c.bc
Hello world!
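The rule the linker applies here can be modelled with a toy symbol table: a declaration is satisfied if some linked module defines the name (or the host process provides it, as with printf), and lli fails when one is left over. The C sketch below is a hypothetical model for illustration only, not how llvm-ld or llvm-link is implemented.

```c
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical toy symbol table for a module: the names it defines
   and the names it only declares. */
typedef struct {
    const char *defined[MAX_SYMS];
    int ndefined;
    const char *declared[MAX_SYMS];
    int ndeclared;
} toy_module;

static int has_name(const char *const *names, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Merges a and b into out. Definitions are concatenated; a declaration
   survives only if neither module defines it. Returns -1 if a name is
   defined twice or the table overflows, 0 otherwise. */
int toy_link(const toy_module *a, const toy_module *b, toy_module *out) {
    const toy_module *in[2] = { a, b };
    out->ndefined = 0;
    out->ndeclared = 0;
    for (int m = 0; m < 2; m++)
        for (int i = 0; i < in[m]->ndefined; i++) {
            const char *name = in[m]->defined[i];
            if (has_name(out->defined, out->ndefined, name) || out->ndefined == MAX_SYMS)
                return -1;
            out->defined[out->ndefined++] = name;
        }
    for (int m = 0; m < 2; m++)
        for (int i = 0; i < in[m]->ndeclared; i++) {
            const char *name = in[m]->declared[i];
            if (has_name(out->defined, out->ndefined, name) ||
                has_name(out->declared, out->ndeclared, name))
                continue; /* resolved by a definition, or already listed */
            if (out->ndeclared == MAX_SYMS)
                return -1;
            out->declared[out->ndeclared++] = name;
        }
    return 0;
}

/* Counts declarations that the host (e.g. the C library) cannot supply. */
int toy_unresolved(const toy_module *m, const char *const *host, int nhost) {
    int count = 0;
    for (int i = 0; i < m->ndeclared; i++)
        if (!has_name(host, nhost, m->declared[i]))
            count++;
    return count;
}
```

With a = {defines f, declares printf} and b = {defines main, declares f}, b alone leaves f unresolved, while the linked result keeps only the printf declaration, matching the c.bc listing above.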

What exactly is the LLVM C++ API

I found it hard to understand the LLVM C++ API.
Is there any relationship between LLVM C++ API and LLVM IR? Also, how could one use the LLVM C++ API?
To (greatly) simplify, LLVM is a C++ library for writing compilers. Its C++ API is the external interface users of the library employ to implement their compiler.
There's a degree of symmetry between LLVM IR and part of the LLVM C++ API - the part used to build IR. A very good resource for getting a feel for this symmetry is http://llvm.org/demo/. For example, you can compile this C code:
int factorial(int X) {
if (X == 0) return 1;
return X*factorial(X-1);
}
Into LLVM IR:
define i32 @factorial(i32 %X) nounwind uwtable readnone {
%1 = icmp eq i32 %X, 0
br i1 %1, label %tailrecurse._crit_edge, label %tailrecurse
tailrecurse: ; preds = %tailrecurse, %0
%X.tr2 = phi i32 [ %2, %tailrecurse ], [ %X, %0 ]
%accumulator.tr1 = phi i32 [ %3, %tailrecurse ], [ 1, %0 ]
%2 = add nsw i32 %X.tr2, -1
%3 = mul nsw i32 %X.tr2, %accumulator.tr1
%4 = icmp eq i32 %2, 0
br i1 %4, label %tailrecurse._crit_edge, label %tailrecurse
tailrecurse._crit_edge: ; preds = %tailrecurse, %0
%accumulator.tr.lcssa = phi i32 [ 1, %0 ], [ %3, %tailrecurse ]
ret i32 %accumulator.tr.lcssa
}
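It may help to read the optimized IR back as C. The two phi nodes show that the recursion was turned into a loop with an accumulator. The function below is a hand translation of that IR (not the output of any tool), with comments mapping each step to the IR value it mirrors; like the original, it is only meaningful for X >= 0.

```c
/* The original recursive version, for comparison. */
int factorial(int X) {
    if (X == 0) return 1;
    return X * factorial(X - 1);
}

/* Hand translation of the optimized IR: tail recursion became a loop. */
int factorial_loop(int X) {
    int acc = 1;              /* %accumulator.tr1 starts at 1 */
    if (X == 0)               /* %1 = icmp eq i32 %X, 0 */
        return 1;             /* tailrecurse._crit_edge via %0 */
    for (;;) {                /* tailrecurse: */
        int next = X - 1;     /* %2 = add nsw i32 %X.tr2, -1 */
        acc = X * acc;        /* %3 = mul nsw i32 %X.tr2, %accumulator.tr1 */
        if (next == 0)        /* %4 = icmp eq i32 %2, 0 */
            return acc;       /* %accumulator.tr.lcssa = %3 */
        X = next;             /* phi on the back edge: %X.tr2 = %2 */
    }
}
```

For example, factorial_loop(5) returns 120, the same as factorial(5).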
As well as into C++ API calls (the output is long, so I won't paste it here, but you can try it yourself). Doing this, you'll see, for example, the icmp instruction from the IR above created as:
ICmpInst* int1_5 = new ICmpInst(*label_4, ICmpInst::ICMP_EQ, int32_X, const_int32_1, "");
ICmpInst is a class that's part of the C++ API used to create icmp instructions. A good reference for the C++ API is the LLVM Programmer's Manual.
You can use the CPP backend (llc -march=cpp) to find out the mapping from any given IR to the C++ API.
UPDATE: the cpp backend is no longer available.

Remove an instruction through Dead Code Elimination pass of llvm

My pass in LLVM generates an IR like this:
%5 = icmp eq i32 %4, 0
%7 = or i1 %5, %5
...
Since the or instruction is actually not needed (dead code), I replaced all occurrences of %7 with %5. Now the or instruction should be deleted. Can I call LLVM's Dead Code Elimination pass from my pass, or is there some other way to remove that or instruction?
A solution that is more aligned with LLVM's design philosophy is, instead of doing the substitution in your pass, let InstCombine do the job. Then you will not need to worry about running DCE.
For example:
> cat foo.ll
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
%or = or i32 %a, %a
ret i32 %or
}
> opt -S -instcombine < foo.ll
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
ret i32 %a
}
Why don't you just schedule DCE to run after your pass in the pass manager? Let it do its analysis and decide what it wants to throw away.
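Conceptually, DCE after such a replacement means: repeatedly delete instructions that have no side effects and no remaining uses. The toy C model below replays the question's scenario; the toy_inst type and its functions are hypothetical and are not LLVM's API (in a real pass, LLVM's Instruction::eraseFromParent() removes an instruction you already know is dead). The use of %7 by a ret in the usage example is also made up, standing in for the elided code after %7.

```c
#include <stdbool.h>

/* Hypothetical toy IR: operands refer to other instructions by index;
   -1 means "no operand" or a value defined outside this list. */
typedef struct {
    const char *text;       /* for display only */
    int op[2];
    bool has_side_effects;  /* e.g. ret, store, call: never removed */
    bool erased;
} toy_inst;

/* Toy counterpart of "replace all uses of `from` with `to`". */
void toy_replace_all_uses(toy_inst *insts, int n, int from, int to) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 2; k++)
            if (insts[i].op[k] == from)
                insts[i].op[k] = to;
}

/* True if any live instruction still uses instruction idx. */
static bool is_used(const toy_inst *insts, int n, int idx) {
    for (int i = 0; i < n; i++)
        if (!insts[i].erased && (insts[i].op[0] == idx || insts[i].op[1] == idx))
            return true;
    return false;
}

/* Erases side-effect-free instructions with no uses, repeating until
   nothing changes, because erasing one can make its operands dead.
   Returns the number of instructions erased. */
int toy_dce(toy_inst *insts, int n) {
    int erased = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = n - 1; i >= 0; i--) {
            if (insts[i].erased || insts[i].has_side_effects)
                continue;
            if (!is_used(insts, n, i)) {
                insts[i].erased = true;
                erased++;
                changed = true;
            }
        }
    }
    return erased;
}
```

Usage, mirroring the question: with %5 at index 0, %7 = or i1 %5, %5 at index 1 and a ret of %7 at index 2, calling toy_replace_all_uses(insts, 3, 1, 0) and then toy_dce(insts, 3) erases only the or instruction.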

LLVM not recognizing unnamed_addr

To test LLVM's functionality, I wrote the following simple program.
#include <stdio.h>
int main()
{
printf( "Hello World!\n" );
return 0;
}
And then compiled it to LLVM IR by typing clang -S -emit-llvm main.c -o main.ll. The generated code in main.ll was the following.
; ModuleID = 'main.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-pc-linux-gnu"
@.str = private unnamed_addr constant [14 x i8] c"Hello World!\0A\00"
define i32 @main() nounwind {
%1 = alloca i32, align 4
store i32 0, i32* %1
%2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0))
ret i32 0
}
declare i32 @printf(i8*, ...)
Then when I tried to compile the IR code (in main.ll) to native executable binary, by typing llc main.ll -o main.s && gcc main.s -o main, I got the following error.
llc: main.ll:5:17: error: expected 'global' or 'constant'
@.str = private unnamed_addr constant [14 x i8] c"Hello World!\0A\00"
However, if I remove unnamed_addr from main.ll, it does compile. So my question is: what is wrong with unnamed_addr? Why doesn't it compile with it? Is this maybe because I'm using incompatible versions of clang and LLVM?
The unnamed_addr attribute was introduced in LLVM 2.9.
Could it be that your clang is from 2.9 or newer, while your llc is from 2.8 or older?