c++ - llvm and runtime jit - c++

Context
Linux 64 bits / osx 64 bits. C++ (gcc 5.1, llvm 3.6.1)
Up to now, I always used gcc for my projects.
The problem for the next thing I am creating is the licence. Hence, I decided to give clang/llvm a go.
My needs : runtime self modifying code (and a very relaxed licence for compiler plugins for static analysis and other things.).
I played a lot with libgccjit and it works fine.
As for llvm, I read the Kaleidoscope project and some doc but it is unclear.
Question
I saw that llvm has some jit possibilities but I am not sure if it enables to self modify the code (more precisely, extend the code) at runtime as libgccjit does for c++ language.
I just need a starter here, llvm is huge and new to me, so anyone expert enough is very welcome to guide me a bit.

Related

"Include What you use"

I read about tool called "include-what-you-use" which can help clean superfluous includes
from source code. I understood that there is a version for compiler LLVM(clang) and version for GCC.
My questions are:
Why this tool is compiler-dependent and not "cross-platform" for compilers. Why from the beginning the creators of the tool didn't make it compiler-independent? Is it related to the special implementation it has or something like that?
If I want to take a version of the tool compatible for LLVM and I want to make it compatible with GCC (since I'm working with GCC), what do I have to do for that?
For the most part, Include-What-You-Use should be able to handle any valid C++ codebase, regardless of whether that codebase was written with gcc or clang in mind. I recently had the occasion to run Include-What-You-Use on a very large codebase that was usually compiled with gcc and it worked fine. So in that sense it is already compatible.
That said, it might not work perfectly. It's likely that some of the information it provides will be wrong, even if it's a clang codebase. So always verify the output manually.
why this tool is compiler-dependent and not "cross-platform" for compilers. Why from the beginning the creaters of the tool didn't make
it compiler-independent ? is it related to the special implementation
it has or something like that ?
Reason is simple, clang has is more modern fresh and has better modular architecture. As a result is is much easier to create tools using clang modules.
That is why clang was first which had address sanitizer and have more cool sanitizers. And this is why when someone creates new tool for C++ stars from clang.
If i want to take a version of the tool compatible for llvm and i want to make it compatible with gcc(since i'm working with gcc). What
i have to do for that ?
clang was created since Apple was not happy with gcc. So when it was written it supposed to be as much compatible with gcc as possible, since there was lots of code which was verified with gcc.
Since clang is mature now and provides new own features, there might be small differences with gcc (for example different bugs in implementations of new C++ standards), but in most common code there should be no problem.
IWYU should work without problems on both compilers. My coworker used it on large project build with 3 compilers (clang, gcc and VS) and it worked like a charm.
The tool itself needs parts of the compiler! It is sitting somewhere between reading the source and parsing it. LLVM provides an API which is used for the tool. The tool itself is not standalone but a plugin to the clang/llvm. As this, it needs the clang/llvm.
The modifications which will be done by the tool are fully compatible to every c++ compiler. And also the plugin in combination with clang/llvm should be able to parse more or less every code base independent of used other compilers. There might be some strange macros which are supported by other tool chains, which llvm might be struggle with. But that should be a rare case at all.

Using LLVM as virtual machine - multiplatform and multiarchitecture coding

I'm currently working in a pet programming language (for learning purposes), and have gone through a lot of research over the past year, and I think its time to finally start modelling the concepts of such a languague. First of all I want it to compile to some intermediate form, such as JVM or .NET bytecode, the goal being multiplatform/architecture compatibily. Second, I want it to be fast (I also have many other things in mind, but its not the purpose of this topic to discuss those).
The best options that came to my mind were:
Compile to JVM bytecode and use OpenJDK as runtime environment,
Compile to .NET bytecode and use Mono as runtime environment,
Compile to LLVM IR and use LLVM as runtime environment.
As you may have imagined, I've chosen LLVM. Why? because its blazing fast. I did a little benchmark using the C++ N-Body code, and achieved 7s in my machine with lli jitted IR, in contrast with 27s with clang native compiled code (I know clang first make IR then machine code).
So, here is my question: Is there any redistributable version of the LLVM basic toolset (I just need lli) that I can use? Or I must compile my own? If the latter, can you provide me with any hints on how to do it? If I really must do it, I'm thinking is cross-compiling them from my machine (Intel Mac), and generating some installers (say, an .msi for windows, .rpm and .deb for popular linux distros and .pkg for Macs). Remember, I only need a minimal subset of LLVM, such that this subset is capable of acting like a VM, by using "lli ". The real question here is how to use LLVM as a typical virtual machine.
First, I think all 3 options - LLVM IR + LLVM, Java Bytecode + OpenJDK, and .NET CIL + Mono - are excellent options, and I agree deciding between them is not easy.
If you go for LLVM and you just want to use lli, you can compile LLVM to your target platform and pack the resulting lli executable with your distribution, it should work.
Another way to write a JIT compiler via LLVM is to use an execution engine - see the handy examples in the Kaleidoscope tutorial. That means that you write your own program which will JIT-compile your own language, compile it to whatever platform you want while statically linking it with LLVM, and then distribute it.
In any case, since a JIT compiler requires copying an LLVM binary to the client side, make sure to attach a copyright notice with your distribution (you don't have to open-source your distribution, though).

LLVM vs clang on OS X

I have a question concerning llvm, clang, and gcc on OS X.
What is the difference between the llvm-gcc 4.2, llvm 2.0 and clang? I know that they all build on llvm but how are they different?
Besides faster compiling, what is the advantage of llvm over gcc?
LLVM originally stood for "low-level virtual machine", though it now just stands for itself as it has grown to be something other than a traditional virtual machine. It is a set of libraries and tools, as well as a standardized intermediate representation, that can be used to help build compilers and just-in-time compilers. It cannot compile anything other than its own intermediate representation on its own; it needs a language-specific frontend in order to do so. If people just refer to LLVM, they probably mean just the low-level library and tools. Some people might refer to Clang or llvm-gcc incorrectly as "LLVM", which may cause some confusion.
llvm-gcc is a modified version of GCC, which uses LLVM as its backend instead of GCC's own. It is now deprecated, in favor of DragonEgg, which uses GCC's new plugin system to do the same thing without forking GCC.
Clang is a whole new C/C++/Objective-C compiler, which uses its own frontend, and LLVM as the backend. The advantages it provides are better error messages, faster compile time, and an easier way for other tools to hook into the compilation process (like the LLDB debugger and Clang static analyzer). It's also reasonably modular, and so can be used as a library for other software that needs to analyze C, C++, or Objective-C code.
Each of these approaches (plain GCC, GCC + LLVM, and Clang) have their advantages and disadvantages. The last few sets of benchmarks I've seen showed GCC to produce slightly faster code in most test cases (though LLVM had a slight edge in a few), while LLVM and Clang gave significantly better compile times. GCC and the GCC/LLVM combos have the advantage that a lot more code has been tested and works on the GCC flavor of C; there are some compiler specific extensions that only GCC has, and some places where the standard allows the implementation to vary but code depends on one particular implementation. It is a lot more likely if you get a large amount of legacy C code that it will work in GCC than that it will work in Clang, though this is improving over time.
There are 2 different things here.
LLVM is a backend compiler meant to build compilers on top of it. It deals with optimizations and production of code adapted to the target architecture.
CLang is a front end which parses C, C++ and Objective C code and translates it into a representation suitable for LLVM.
llvm gcc was an initial version of a llvm based C++ compiler based on gcc 4.2, which is now deprecated since CLang can parse everything it could parse, and more.
Finally, the main difference between CLang and gcc does not lie in the produced code but in the approach. While gcc is monolithic, CLang has been built as a suite of libraries. This modular design allow great reuse opportunities for IDE or completion tools for example.
At the moment, the code produced by gcc 4.6 is generally a bit faster, but CLang is closing the gap.
llvm-gcc-4.2 uses the GCC front-end to parse your code, then generates the compiled output using LLVM.
The "llvm compiler 2.0" uses the clang front-end to parse your code, and generates the compiled output using LLVM. "clang" is actually just the name for this front-end, but it is often used casually as a name for the compiler as a whole.

LLVM compiler for production code?

I have a question concerning the LLVM compiler:
I would like to use it to compile my Objective-C source code for Mac and iOS and from their release notes it seems that LLVM is stable enough for using this.
Having made good experiences with the LLVM I would also like to use it to compile C++ or Objective-C++. However it is not clear to me if I should still use the hybrid LLVM-GCC compiler (the GCC parser and the LLVM code generator) or the pure LLVM compiler.
I am also unsure about the new C++ standard library and if I should use it and how I would make the transition from GNU's libstdc++.
The Questions
Which compiler would one use today to generate fast production quality code from C++: the LLVM-GCC hybrid compiler or pure LLVM?
Should one migrate the C++ standard library from GNU's libstdc++ to the new libc++ library created by the LLVM project?
Any comments and hints are appreciated.
Several questions asked here, I'll try to answer all of them.
There is no "pure LLVM compiler". LLVM is a set of libraries which do code optimization and code generation . There are several C/C++ frontends which can be hooked to LLVM. Among them are clang and llvm-gcc. See http://llvm.org/ for more information about various components of LLVM Compiler Infrastructure. As written at http://llvm.org/docs/ReleaseNotes.html, llvm-gcc is EOL since LLVM 2.9 release, so you'd better use clang, because it will certainly be developed and maintained in the future.
libc++ is still in development, so for production you should use vendor-provided C++ (libstdc++ in your case).
Remember, all this stuff is changing, so benchmarks gets easily outdated.
I've found following report interesting, not only as a kind of benchmark, but it seems showing some LLVM vs GCC compiler differences :
Clang/LLVM Maturity Evaluation Report by Dominic Fandrey

C++ compilers and back/front ends

For my own education I am curious what compilers use which C++ front-end and back-end.
Can you enlighten me where the following technologies are used and what hallmarks/advantages they have if any?
Open64 - is it back-end, front-end, or both? Which compilers use it? I encounter it in CUDA compiler.
EDG - as far as I can tell this is a front-end use by Intel compilers and Comeau. do other compilers use it? I found quite a few references to it in boost source code.
ANTLR - this is general parser. Do any common compilers use it?
Regarding compilers:
with front-end/back-end does gcc compiler suite uses? does it have common heritage with any other compiler?
what front-end/back-end PGI and PathScale compilers use?
what front-end/back-end XL compiler uses (IBM offering).
in-depth links on the Internet or your personal know-how would be great.
I did some Google searching, but information I generally encountered was rather superficial.
Thanks.
EDG is a front-end used by Intel and Comeau. See EDG's list of customers for other users.
ANTLR is a parser generator. I'm not aware of any C++ compiler built around a parser that was built with ANTLR (that doesn't mean it couldn't exist though).
GCC is a suite of compilers, with front ends for C, C++, Fortran, Ada, Java, etc., and back-ends for more processors than I'd care to think about.
Open64 is also a suite of compilers including several front-ends (for C, C++, Fortran, and possibly others I don't remember at the moment) and back-ends (targeting X64, Itanium, ARM, and, again, probably others I don't remember and/or don't know about). I believe its origin (pun noted by not intended) is SGI's compiler(s). I seem to remember reading something hinting that Open64 was derived from some version of the GCC front end(s), but offhand I don't know 1) how similar it remains to GCC internally, or 2) the version of GCC from which it derived -- but it's been around long enough that I'd guess it was GCC 3.x at the most recent, and quite possibly GCC 2.x.
I believe PathScale has created at least one compiler derived from Open64, but they may have others as well.
As far as I know, IBM's compiler is entirely their own creation. I'd guess IBM's (now discontinued) VisualAge for C++ shared some heritage/development/code with XL C++, but don't know that for sure, and can't even begin to guess at the extent of it, even assuming it's true.
The Clang project provides new front-ends for C/C++/Objective C on top of the LLVM backend. The LLVM project also provide a LLVM-gcc, using the GCC front end and the LLVM backend. The DragonEgg project seeks to replace the GCC backend with LLVM.
The Codeplay VectorC, Sieve and Offload compilers use a custom front-end and back-end
with front-end/backend does gcc compiler suite uses? does it have common heritage with any other compiler?
The acronym “GCC” stands for “GNU compiler collection” (originally “GNU C compiler”) and this already gives a hint: GNU compilers are a collection of compilers, most notably for C and C++ but also for Fortran, Objective-C and others. They share a common back-end and intermediate representation that was developed for GCC specifically.
The front-ends are all custom-written for the GCC. Some were contributed by third parties, most notably the Objective-C front-end, which was contributed by Apple.
Visual studio uses EDG for its intellisense engine.