How does `declare` in LLVM IR work - llvm

For example
@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"
declare i32 @puts(i8* nocapture) nounwind
define i32 @main() {
%cast210 = getelementptr [13 x i8], [13 x i8]* @.str, i64 0, i64 0
call i32 @puts(i8* %cast210)
ret i32 0
}
I don't understand where the function puts comes from. It seems to be a C function in stdio.h, but what does this have to do with LLVM? Where is its implementation?

Speaking in C/C++ terms, each translation unit can reference external symbols. For the compiler it doesn't matter where a symbol is actually defined, as long as you have a declaration for it; `declare` in LLVM IR is exactly such a declaration.
After compiling your .c files you get something that needs to be linked together (object files .o, or LLVM IR .ll/.bc). At the linking stage every symbol reference is resolved to a definition (in different ways, depending on the tools).
In your example, the puts function usually lives in the system libc, which gets linked automatically by default. You won't find LLVM IR code for this function unless you somehow compiled the whole libc into LLVM IR.
For more background, read a general tutorial on compilation and linking.
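To make the answer concrete, here is a minimal C sketch (the function name `say_hello` is made up for illustration). It declares `puts` with a bare prototype instead of including stdio.h; compiling it with `clang -S -emit-llvm` should produce a `declare` line for `puts` and no body, just as in the question, and the definition is supplied by libc when the object file is linked.

```c
/* A prototype with no body: this is all the compiler needs. In LLVM IR
   it becomes a `declare` for @puts; the definition comes from libc. */
int puts(const char *s);

/* Hypothetical example function, the C counterpart of @main above. */
int say_hello(void)
{
    /* The call is resolved against the system C library at link time. */
    return puts("hello world");
}
```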

Related

global variable not found in llvm JIT symbol table

I am trying to get an llvm::Module with a global variable to compile with the KaleidoscopeJIT compiler; however, I get an error in the symbol lookup from the JIT compiler. (KaleidoscopeJIT.h source code from https://github.com/llvm-mirror/llvm/blob/master/examples/Kaleidoscope/include/KaleidoscopeJIT.h )
Upon inspection of the symbol table in the LegacyRTDyldObjectLinkingLayerBase, I indeed see that the global variable has not been added to the symbol table. Is this because the global variable is uninitialized? If so, how should I specify an initializer for a struct using the LLVM C++ API?
I have an IR code generated that looks like this
; ModuleID = 'my jit module'
source_filename = "my jit module"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
%g = type { double, double }
@r = external global %g
define double @b() {
entry_b:
%p = alloca %g, align 8
%0 = getelementptr %g, %g* %p, i32 0, i32 1
store double 1.170000e+02, double* %0, align 8
%1 = load %g, %g* %p, align 8
store %g %1, %g* @r, align 8
%2 = load double, double* %0, align 8
ret double %2
}
However, when the JIT compiler tries to compile the function "b", I get an error saying
Failure value returned from cantFail wrapped call
Symbols not found: [ _r ]
The error occurs when trying to compile the IR code line
store %g %1, %g* @r, align 8
as the JIT is not able to find the symbol corresponding to the global variable "r" in the symbol table of the JIT.
The problem is that a global variable without an initializer is only a declaration (it is printed as `@r = external global %g`), so nothing in the module defines r, and the JIT looks for a definition elsewhere that it cannot find.
A quick workaround that makes the module define the variable, so that it gets added to the symbol table, is to initialize it with an "undefined value".
The following code does such an initialization with the C++ API:
// code defining the struct type
std::vector<llvm::Type *> Members(2, llvm::Type::getDoubleTy(TheContext));
llvm::StructType *TypeG = llvm::StructType::create(TheContext,Members,"g",false);
// defining the global variable
TheModule->getOrInsertGlobal("r",TypeG);
llvm::GlobalVariable *gVar = TheModule->getNamedGlobal("r");
// initialize the variable with an undef value to ensure it is added to the symbol table
gVar->setInitializer(llvm::UndefValue::get(TypeG));
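This solves the problem. If you would rather start from zero than from an undefined value, llvm::Constant::getNullValue(TypeG) gives a zeroinitializer instead.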
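The same distinction exists in C, which may make the fix easier to see. In the sketch below (the field names `first` and `second` are invented, since the IR struct fields are unnamed), the initialized `r` is a definition, the counterpart of giving the GlobalVariable an initializer; a lone `extern struct g r;` would only be a declaration, like `@r = external global %g`, and the symbol lookup would then need a definition from somewhere else.

```c
/* C counterpart of `%g = type { double, double }`; field names are
   invented for this sketch. */
struct g {
    double first;
    double second;
};

/* A definition with an initializer, like calling setInitializer().
   A lone `extern struct g r;` would only be a declaration, the
   analogue of `@r = external global %g`. */
struct g r = { 0.0, 0.0 };

/* Mirrors @b: build a local %g, set field 1, copy it into @r,
   then return field 1. */
double b(void)
{
    struct g p = { 0.0, 0.0 }; /* zeroed here; the IR leaves field 0 undefined */
    p.second = 117.0;
    r = p;
    return p.second;
}
```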

LLVM IR can't access global variable

I am trying to compile a program with LLVM and I produce this code
@c = common global i32
@d = common global i32
declare i32 @writeln(i32)
define i32 @a() {
entry:
store i32 2, i32* @c, align 4
ret i32 2
}
define i32 @main() {
entry:
%calltmp = call i32 @a()
ret i32 0
}
and I get this error when trying to compile it to an object file:
Undefined symbols for architecture x86_64:
"_c", referenced from:
_a in a.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Does anyone know what I am doing wrong?
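To quote from the LLVM IR Language Reference:
Global variable definitions must be initialized.
All global variable declarations define a pointer to a region of memory and all memory objects in LLVM are accessed through pointers.
This is relaxed if you are defining a pointer to an external value, for obvious reasons:
@G = external global i32
In your module, @c and @d have no initializer, so LLVM treats them as declarations of symbols defined somewhere else, which is why the linker reports _c as undefined. Give them an initializer to turn them into definitions; common linkage requires a zero initializer, e.g. @c = common global i32 0, align 4.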
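For comparison, here is roughly the question's module written in C, as a sketch. `int c;` at file scope is a tentative definition: it has no initializer, yet it still reserves zero-initialized storage in this translation unit. GCC before 10 and Clang before 11 emitted such variables with common linkage by default, and in IR that linkage has to carry an explicit zero initializer.

```c
/* Tentative definitions: no initializer, yet they define zero-initialized
   storage in this file, unlike `extern int c;`, which only declares it. */
int c;
int d;

/* Mirrors @a in the question: store 2 into @c and return 2. */
int a(void)
{
    c = 2;
    return 2;
}
```

Compiling this with `clang -S -emit-llvm` should show @c with an explicit `i32 0` initializer (with `common` linkage only under `-fcommon`).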

LLVM passes 0 as argument to external function call

Ok, maybe somebody can help me.
I am writing a small LLVM IR test program:
; ModuleID = 'main'
target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-cygwin"
define i32 @my_main() {
entry:
%0 = alloca i64
store i64 42, i64* %0
%1 = load i64* %0
call void @put_integer(i32 15)
ret i32 0
}
declare void @put_integer(i32)
Actually it can be stripped down to this:
; ModuleID = 'main'
target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-cygwin"
define i32 @my_main() {
entry:
call void @put_integer(i32 15)
ret i32 0
}
declare void @put_integer(i32)
Here put_integer is an external function that I compile with gcc or clang (which one doesn't matter for the problem).
The external function is this:
#include <stdio.h>
void put_integer(int Value)
{
printf("%d", Value);
}
and I compile it like this:
clang -c -Wall -g source/put_integer.c -o object/put_integer.o
I also have a small c-main program which calls my IR program:
#include <stdio.h>
extern int my_main(void);
int main(int argc, char *args[])
{
printf("Calling Mainprogram\n\n");
int n_return_value = my_main();
printf("\n\nMainprogram Returned: %u\n", n_return_value);
return n_return_value;
}
which is compiled with the same arguments as above. I put both external object files into a library and then I assemble my LLVM IR program and link it with the two external C-functions in the following way:
llc -filetype=obj test.bc -o test.o
gcc -L ./../RuntimeSystem/ test.o -lmy_runtime -o test.exe
This works fine and the program starts and runs.
The problem is that the actual printf() call prints 0 instead of the 15 that I give as parameter to the IR call. I went into the created program with gdb and checked the stack frame inside my put_integer() function and sure enough it says that 0 is passed as parameter.
So right now there is the problem that somehow the parameter that I pass to the LLVM IR call is not handed to the external C function, instead 0 is handed.
Can anybody please tell me what I'm doing wrong?
Thank you
Edit:
based on a comment below I include the IRBuilder code here that creates the relevant part of my IR code above in the first block.
Constant *left = ConstantInt::get( getGlobalContext(), APInt( 32, 15 ) );
FunctionType *printf_type =
TypeBuilder<void( int ), false>::get( getGlobalContext() );
Function *func = cast<Function>( MODULE.getOrInsertFunction(
"put_integer", printf_type ) );
BUILDER.CreateCall(func,left );
It is my understanding that a proper call to a function in IR must include the function type as in this example which is from the LLVM reference manual:
call i32 (i8*, ...)* @printf(i8* %msg, i32 12, i8 42)
Yet my code (which I got from an answer here on SO, by the way) does not generate that form; then again, I guess the IRBuilder class knows best what code to generate, so I don't know whether this is a problem or not.
Well, in my case the answer was that I used a wrong target triple in my LLVM module. There is no list of valid triples; I obtained mine in some obscure way by compiling and disassembling a C program, but it turned out to be wrong.
In case anybody works with cygwin, the correct target triple is:
x86_64-unknown-windows-cygnus
It appears that with the wrong target triple, the call to the external function used the wrong kind of parameter passing, so the parameter never arrived.
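One plausible reading of "the wrong kind of parameter passing" (an assumption, not something the answer confirms): Cygwin on x86-64 uses the Microsoft x64 calling convention, which passes the first integer argument in RCX, whereas the System V AMD64 convention used on Linux and most other Unix-like targets passes it in RDI. If llc lowered the call for one convention while put_integer was compiled for the other, the callee would read a register the caller never set, which could easily hold 0. The hypothetical helper below only reports, from predefined macros, which convention the C compiler itself is targeting; it is a quick sanity check to compare against the triple you hand to LLVM.

```c
/* Hypothetical helper: reports which x86-64 calling convention the C
   compiler is targeting, judging only by predefined macros. */
const char *x86_64_calling_convention(void)
{
#if defined(__x86_64__) || defined(_M_X64)
#  if defined(_WIN64) || defined(__CYGWIN__)
    /* Microsoft x64: first integer argument in RCX (ECX for an int). */
    return "Microsoft x64";
#  else
    /* System V AMD64: first integer argument in RDI (EDI for an int). */
    return "System V AMD64";
#  endif
#else
    return "not x86-64";
#endif
}
```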

Clang - Compiling a C header to LLVM IR/bitcode

Say I have the following trivial C header file:
// foo1.h
typedef int foo;
typedef struct {
foo a;
char const* b;
} bar;
bar baz(foo*, bar*, ...);
My goal is to take this file, and produce an LLVM module that looks something like this:
%struct.bar = type { i32, i8* }
declare { i32, i8* } @baz(i32*, %struct.bar*, ...)
In other words, convert a C .h file with declarations into the equivalent LLVM IR, including type resolution, macro expansion, and so on.
Passing this through Clang to generate LLVM IR produces an empty module (as none of the definitions are actually used):
$ clang -cc1 -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
My first instinct was to turn to Google, and I came across two related questions: one from a mailing list, and one from StackOverflow. Both suggested using the -femit-all-decls flag, so I tried that:
$ clang -cc1 -femit-all-decls -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
Same result.
I've also tried disabling optimizations (both with -O0 and -disable-llvm-optzns), but that made no difference for the output. Using the following variation did produce the desired IR:
// foo2.h
typedef int foo;
typedef struct {
foo a;
char const* b;
} bar;
bar baz(foo*, bar*, ...);
void doThings() {
foo a = 0;
bar myBar;
baz(&a, &myBar);
}
Then running:
$ clang -cc1 -S -emit-llvm foo2.h -o -
; ModuleID = 'foo2.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
%struct.bar = type { i32, i8* }
; Function Attrs: nounwind
define void @doThings() #0 {
entry:
%a = alloca i32, align 4
%myBar = alloca %struct.bar, align 8
%coerce = alloca %struct.bar, align 8
store i32 0, i32* %a, align 4
%call = call { i32, i8* } (i32*, %struct.bar*, ...)* @baz(i32* %a, %struct.bar* %myBar)
%0 = bitcast %struct.bar* %coerce to { i32, i8* }*
%1 = getelementptr { i32, i8* }* %0, i32 0, i32 0
%2 = extractvalue { i32, i8* } %call, 0
store i32 %2, i32* %1, align 1
%3 = getelementptr { i32, i8* }* %0, i32 0, i32 1
%4 = extractvalue { i32, i8* } %call, 1
store i8* %4, i8** %3, align 1
ret void
}
declare { i32, i8* } @baz(i32*, %struct.bar*, ...) #1
attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
Besides the placeholder doThings, this is exactly what I want the output to look like! The problem is that this requires 1.) using a modified version of the header, and 2.) knowing the types of things in advance. Which leads me to...
Why?
Basically, I'm building an implementation for a language using LLVM to generate code. The implementation should support C interop by specifying C header files and associated libs only (no manual declarations), which will then be used by the compiler before link-time to ensure that function invocations match their signatures. Hence, I've narrowed the problem down to 2 possible solutions:
Turn the header files into LLVM IR/bitcode, which can then get the type signature of each function
Use libclang to parse the headers, then query the types from the resulting AST (my 'last resort' in case there is no sufficient answer for this question)
TL;DR
I need to take a C header file (such as the above foo1.h) and, without changing it, generate the aforementioned expected LLVM IR using Clang, OR find another way to get function signatures from C header files (preferably using libclang or building a C parser).
Perhaps the less elegant solution, but staying with the idea of a doThings function that forces the compiler to emit IR because the definitions are used:
The two problems you identify with this approach are that it requires modifying the header, and that it requires a deeper understanding of the types involved in order to generate "uses" to put in the function. Both of these can be overcome relatively simply:
Instead of compiling the header directly, #include it (or more likely, a preprocessed version of it, or multiple headers) from a .c file that contains all the "uses" code. Straightforward enough:
// foo.c
#include "foo.h"
void doThings(void) {
...
}
You don't need detailed type information to generate specific usages of the names, matching up struct instantiations to parameters and all that complexity as you have in the "uses" code above. You don't actually need to gather the function signatures yourself.
All you need is the list of the names themselves and to keep track of whether they're for a function or for an object type. You can then redefine your "uses" function to look like this:
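void * doThings(void) {
    typedef void * (*vfun)(void);
    typedef union v { void * o; vfun f; } v;
    /* never called; it only forces clang to emit the names below */
    return (v[]) {
        (v){ .o = &(bar){0} },   /* object type: instantiate it */
        (v){ .f = (vfun)baz },   /* function: take its address, don't call it */
    };
}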
This greatly simplifies the necessary "uses" of a name to either casting it to a uniform function type (and taking its pointer rather than calling it), or wrapping it in &( and ){0} (instantiating it regardless of what it is). This means you don't need to store actual type information at all, only the kind of context from which you extracted the name in the header.
(obviously give the dummy function and the placeholder types extended unique names so they don't clash with the code you actually want to keep)
This simplifies the parsing step tremendously since you only have to recognise the context of a struct/union or function declaration, without actually needing to do very much with the surrounding information.
A simple but hackish starting point (which I would probably use because I have low standards :D ) might be:
grep through the headers for #include directives that take an angle-bracketed argument (i.e. an installed header you don't want to also generate declarations for).
use this list to create a dummy include folder with all of the necessary include files present but empty
preprocess it in the hope that'll simplify the syntax (clang -E -I local-dummy-includes/ -D"__attribute__(...)=" foo.h > temp/foo_pp.h or something similar)
grep through for struct or union followed by a name, } followed by a name, or name (, and use this ridiculously simplified non-parse to build the list of uses in the dummy function, and emit the code for the .c file.
It won't catch every possibility; but with a bit of tweaking and extension, it probably will actually deal with a large subset of realistic header code. You could replace this with a dedicated simplified parser (one built to only look at the patterns of the contexts you need) at a later stage.
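The last step (emitting the .c file from the list of names) might look like the C sketch below. Everything here is hypothetical: `struct header_name`, `enum name_kind` and `emit_uses_function` are invented names, and the list of names is assumed to come from whatever grep or simplified parser you use. It writes the doThings template from this answer into a caller-supplied buffer, using the `&(T){0}` form for types and the `(vfun)name` form for functions.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Kind of name found in the header (hypothetical). */
enum name_kind { NAME_TYPE, NAME_FUNCTION };

struct header_name {
    const char *name;
    enum name_kind kind;
};

/* Appends formatted text at buf[*len]; returns -1 if it does not fit. */
static int append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;
    if (*len >= cap)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *len)
        return -1; /* output was truncated */
    *len += (size_t)n;
    return 0;
}

/* Writes the "uses" function for all names; returns its length or -1. */
int emit_uses_function(char *buf, size_t cap,
                       const struct header_name *names, size_t count)
{
    size_t len = 0;
    size_t i;
    if (append(buf, cap, &len,
               "void * doThings(void) {\n"
               "    typedef void * (*vfun)(void);\n"
               "    typedef union v { void * o; vfun f; } v;\n"
               "    return (v[]) {\n") != 0)
        return -1;
    for (i = 0; i < count; i++) {
        const char *fmt = names[i].kind == NAME_FUNCTION
                              ? "        (v){ .f = (vfun)%s },\n"
                              : "        (v){ .o = &(%s){0} },\n";
        if (append(buf, cap, &len, fmt, names[i].name) != 0)
            return -1;
    }
    if (append(buf, cap, &len, "    };\n}\n") != 0)
        return -1;
    return (int)len;
}
```

For the names bar (a type) and baz (a function), the generated text contains the lines `(v){ .o = &(bar){0} },` and `(v){ .f = (vfun)baz },`, the same uses as the hand-written function.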

LLVM stdin/stdout/stderr

How does one declare stdin, stdout, and stderr (preferably the C versions) in LLVM? I am trying to use some stdio functions in a toy language I am creating. One such function is fgets:
char * fgets ( char * str, int num, FILE * stream );
In order to use that I needed stdin. So I wrote some LLVM API code to generate the definition of FILE that I found, and declared stdin as an external global. The code generated this:
%file = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %marker*, %file*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] }
%marker = type { %marker*, %file*, i32 }
@stdin = external global %file*
However, when I ran the resulting module, it gave me this error:
Undefined symbols for architecture x86_64:
"_stdin", referenced from:
_main in cc9A5m3z.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
Apparently, what I wrote didn't work. So my question is what do I have to write in the LLVM API to declare stdin, stout, and stderr for functions like fgets in something like a toy language compiler?
If anyone is interested, I found an answer to my question. After some intense searching I found a way to get the stdin stream without having to make a C extension: fdopen and making FILE an opaque struct.
FILE* fdopen (int fildes, const char *mode)
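When fdopen is passed 0 as the file descriptor (fildes), it returns a new stream that reads from standard input (a separate FILE object from the C library's own stdin, with its own buffer). Using the LLVM API, I generated the following LLVM assembly: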
%FILE = type opaque
declare %FILE* @fdopen(i32, i8*)
@r = constant [2 x i8] c"r\00"
Then I was able to retrieve stdin with this call statement:
%stdin = call %FILE* @fdopen(i32 0, i8* getelementptr inbounds ([2 x i8]* @r, i32 0, i32 0))
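In C terms the IR above corresponds to the following sketch (the wrapper name `open_standard_input` is made up). fdopen is POSIX rather than ISO C, hence the feature-test macro for strict compilation modes. Keep in mind that the returned stream is a new FILE on descriptor 0 with its own buffer, not the library's stdin object, so reading the same input through both can lose data buffered in the other.

```c
#define _POSIX_C_SOURCE 200809L /* expose fdopen() and fileno() */
#include <stdio.h>

/* Hypothetical wrapper mirroring `call %FILE* @fdopen(i32 0, i8* "r")`:
   opens a new read stream on file descriptor 0 (standard input).
   Returns NULL on failure, e.g. if descriptor 0 is not open. */
FILE *open_standard_input(void)
{
    return fdopen(0, "r");
}
```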
If you use functions like putchar, printf, gets, strtol, puts and fflush, you won't need stdin and stdout. I wrote a toy compiler and those were enough for I/O with strings and integers. Calling fflush with a null pointer flushes all open output streams, including stdout.
%struct._IO_FILE = type opaque
declare i32 @fflush(%struct._IO_FILE*)
...
call i32 @fflush(%struct._IO_FILE* null)
...
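A C equivalent of this approach, as a sketch (the function name `print_and_flush` is invented): it references only printf and fflush, so the generated IR needs no declaration of stdout at all. Passing a null pointer to fflush is standard C and flushes every output stream.

```c
#include <stdio.h>

/* Hypothetical helper: prints an integer and flushes all output streams.
   It calls printf and fflush only, never naming the stdout object. */
int print_and_flush(int value)
{
    if (printf("%d", value) < 0)
        return -1;
    /* fflush(NULL) flushes every output stream; it returns EOF on error. */
    return fflush(NULL) == 0 ? 0 : -1;
}
```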
It is platform specific. Sometimes stdin is a macro that expands to a different symbol name.
On Android, for example, stdin is defined as #define stdin (&__sF[0]).
For Microsoft Visual C++, stdin is defined as #define stdin (&__iob_func()[0]).
So you really need to look at your platform's stdio.h header to figure out which symbol to declare.
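Instead of reading the header by hand, you can ask the preprocessor. This C sketch (the macro and function names are invented) uses two-level stringification, so `stdin` is macro-expanded first and the result is turned into a string literal.

```c
#include <stdio.h>

/* Two-level stringification: the outer macro expands `stdin` first,
   the inner one turns the expansion into a string literal. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Returns the text that `stdin` expands to in this platform's <stdio.h>. */
const char *stdin_expansion(void)
{
    return STRINGIFY(stdin);
}
```

On glibc this typically yields "stdin", so @stdin = external global %FILE* links as written. macOS's <stdio.h> maps stdin to __stdinp, so there the IR would need to declare @__stdinp instead; the undefined _stdin reported by ld in the question looks like it came from macOS.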