Is it possible to compile LLIR to binary without clang? - llvm

I'm writing a compiler that embeds the LLVM API. By copying code from the llc tool, I can output assembly language or object files that I can turn into binaries using clang or an assembler.
But I want my compiler to be self contained. Is it possible to turn LLIR into binaries using LLVM? This seems like the sort of thing that should be in the LLVM toolkit.

Yes, it is possible and this is also done by llc with -filetype=obj argument.
You can consult the compileModule function to learn how to use the programmatic API.
Note that this will only generate an object file for a given translation unit. You will also need a linker to convert it into a proper executable or library. The LLVM linker, lld, can also be embedded into client applications as a library, so in the end you will be able to create a self-hosting compiler.

Related

Load and link obj files at Runtime in C/C++ (JIT)

I'm searching for a C or C++ library which can load and link obj files (doesn't matter if ELF or obj) dynamicly at runtime. I spend some time searching for such library, but my results weren't successful.
What I tried:
LLVM:
Currently my best solution! I used Clang to generate .obj files in the bytecode format of LLVM and used its JIT functions to dynamic load and execute the function. But, the LLVM is huge and my PC at home hasn't the power to compile the complete LLVM just for the JIT. Also I encountered some problems with relocation overflows or not implemented relocation types.
libjit:
I read, that it can load .elf files and link them too. But sadly, I couldn't compile it for windows, so I couldn't try.
Nanojit and NativeJit:
It seems like they don't support JITting an object file.
So... What can I do? Do I have to stick around with the LLVM? Are there any alternatives?
I suppose that an analogy that can be taken as a 1st approach is that the .bc is similar to an .o (or .obj) file in that it is just the translation of C++ code to an intermediate language, and tht it can contain references to functions not defined in it, to be searched in libraries.
And that the JIT-ted code is similar to a DLL, in the sense that it will be linked dynamically to the executable where it will run in.
You need not to compile LLVM -- you can download the binaries for LLVM and assorted utilities (like clang) from LLVM Download Page

How to use llvm libraries

I am working in a project that consist of some C++ teams. Each team delivers libraries and our team is integrating these libraries into a front end application.
The application is cross platform, so it means that other the teams have to provide the same (static) libraries compiled for different platforms/CPU architecture/configuration. Eg. we have Visual Studio 2015/2013, 32bit/64bit, linux, Debug/Release etc.
It would be nice to reduce the number of these static library "manifests", so I was looking into the Clang/LLVM. The idea would be compile the static libraries into LLVM bitcode and use the llvm-ar tool to create an llvm static library. When we have to make the binaries for a specific platform we would use the llc (LLVM platform compiler) to create the native code static library and do the linking with the platform linker.
Questions:
is there a better way to do what I want to achieve?
the llc does not seem to support the compiling of a static library, only individual translation units (.bc -> .o). Of course I can extract each individual bitcode file, assemble it to native object file and use the platform librarian tool (lib/ar) to make the static library, but I wonder if there is a more streamlined solution.
the gold linker seems to make something I need, but seems to be restricted to ELF format. I have to support Windows/Linux and maybe IOS
LLVM IR generated from target-specific and platform-specific language (C/C++) won't be target neutral. Think about type sizes, alignments, ABI requirements, etc. Not the mention pure source code features like preprocessor. So, no, the approach you thought about won't work at all.
See LLVM bitcode cross-platform for some more information.

Just-in-time compilation using libclang and LLVM C

I have a software that is able to generate C code that I would like to use in a just-in-time compilation context. From what I understand, LLVM/Clang is the way to go and, for maintainability of the project, I'd like to use the C API of llvm and Clang (libclang).
I started out creating a libclang context using clang_createIndex and a translation unit using createTranslationUnitFromSourceFile (would have been nice to be able to avoid going via the file system and pass the source code as a string instead). But I pretty much get stuck there. How can I go from the libclang translation unit to an LLVM "execution engine" which is what appears to be needed for JIT? Or is this not even possible using the C API?
The best method to learn how to use a body of code is to investigate the examples you're given.
There exist tutorials on how to leverage the clang/llvm tools to compile C++ code and emit LLVM-IR, to compile LLVM-IR to LLVM-Bitcode, and to execute that LLVM-bitcode. All that is necessary to learn to incorporate this functionality in our application is to investigate the execution path of these tools, to find the sequence of methods that accomplish what we want.
Here is an example using the example tools of compiling a cpp file to llvm-bitcode, and executing it.
clang++ -c -O3 -emit-llvm main.cpp -o main.bc
lli main.bc
This is a great start, we can just look at the source behind the tools, and investigate the execution path outlined by the arguments. Since these tools are merely interfaces exposing the underlying functionality available in the llvm/clang libraries that we can add to our project, following the execution path shallowly will give us a sequence of library available methods that we can call within our application to accomplish the same results.
Once the sequence of library methods is trivially established, you can delve into breaking individual library methods into their underlying functionality, and tease out the exact behavior we desire through a relatively small set of modifications here and there, rather than trying to reimplement something from the ground up.

How is clang able to steer C/C++ code optimization?

I was told that clang is a driver that works like gcc to do preprocessing, compilation and linkage work. During the compilation and linkage, as far as I know, it's actually llvm that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
It seems that compiling source code doesn't even need a static library such as libLLVMCore.a, instead for debian clang packages depends on another package called libllvm-3.4(clang version is 3.4), which contains libLLVM-3.4.so(.1), does clang use this shared library for optimization?
I've checked clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still not aware of the mechanism.
I hope someone might give me some hints.
(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is the following:
i.e. translating from C++ source code (or C, or Objective C, etc..) to LLVM IR, a textual lower-level representation of what should that code do. In order to do this Clang employs a number of sub-modules whose descriptions you could find in any decent compiler construction book: lexer, parser + a semantic analyzer (Sema), etc..
LLVM is a set of libraries whose primary task is the following: suppose we have the LLVM IR representation of the following C++ function
int double_this_number(int num) {
int result = 0;
result = num;
result = result * 2;
return result;
}
the core of the LLVM passes should optimize LLVM IR code:
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
The term "back-end" is often confusing since there are many papers that would define the LLVM libraries a "middle end" in a compiler chain and define the "back end" as the final module which does the code generation (LLVM IR to executable code or something else which no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever the language you're targeting (C++, C, Objective C, Python, etc..) if you have a front-end which translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recalling that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc..), Clang also leverages many LLVM libraries during its front-ending process. You can't really tear every LLVM module away from Clang since the latter is built on the former set.
As for the reason why Clang is said to be a "compilation driver": Clang manages interpreting the command line parameters (descriptions and many declarations are TableGen'd and they might require a bit more than a simple grep to swim through the sources), decides which Jobs and phases are to be executed, set up the CodeGenOptions according to the desired/possible optimization and transformation levels and invokes the appropriate modules (clangCodeGen in BackendUtil.cpp is the one that populates a module pass manager with the optimizations to apply) and tools (e.g. the Windows ld linker). It steers the compilation process from the very beginning to the end.
Finally I would suggest reading Clang and LLVM documentation, they're pretty explicative and most of your questions should look for an answer there in the first place.
It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language, LLVM. What Clang does is compile C++ code to LLVM, without optimizations. Clang can then invoke the LLVM compiler to compile that LLVM code to optimized assembly.

binary generation from LLVM

How does one generate executable binaries from the c++ side of LLVM?
I'm currently writing a toy compiler, and I'm not quite sure how to do the final step of creating an executable from the IR.
The only solution I currently see is to write out the bitcode and then call llc using system or the like. Is there a way to do this from the c++ interface instead?
This seems like it would be a common question, but I can't find anything on it.
LLVM does not ship the linker necessary to perform this task. It can only write out as assembler and then invoke the system linker to deal with it. You can see the source code of llvm-ld to see how it's done.