Why is LLVM considered unsuitable for implementing a JIT?

Many dynamic languages implement (or want to implement) a JIT compiler in order to speed up their execution times. Inevitably, someone from the peanut gallery asks why they don't use LLVM. The answer is often, "LLVM is unsuitable for building a JIT." (For example, Armin Rigo's comment here.)
Why is LLVM Unsuitable for building a JIT?
Note: I know LLVM has its own JIT. If LLVM used to be unsuitable but now is suitable, please say what changed. I'm not talking about running LLVM bitcode on the LLVM JIT; I'm talking about using the LLVM libraries to implement a JIT for a dynamic language.

I wrote HLVM, a high-level virtual machine with a rich static type system including value types, tail call elimination, generic printing, a C FFI and POSIX threads, with support for both static and JIT compilation. In particular, HLVM offers incredible performance for a high-level VM. I even implemented an ML-like interactive front-end with variant types and pattern matching using the JIT compiler, as seen in this computer algebra demonstration. All of my HLVM-related work combined totals just a few weeks' work (and I am not a computer scientist, just a dabbler).
I think the results speak for themselves and demonstrate unequivocally that LLVM is perfectly suitable for JIT compilation.

There are some notes about LLVM in the Unladen Swallow post-mortem blog post:
http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html
Unfortunately, LLVM in its current state is really designed as a static compiler optimizer and back end. LLVM code generation and optimization is good but expensive. The optimizations are all designed to work on IR generated by static C-like languages. Most of the important optimizations for optimizing Python require high-level knowledge of how the program executed on previous iterations, and LLVM didn't help us do that.

There is a presentation on using LLVM as a JIT backend which addresses many of the concerns raised about why it's bad; most of it seems to boil down to people building a static compiler as a JIT instead of building an actual JIT.

The biggest complaint is that it takes a long time to start up. However, this is not so much of an issue if you do what Java does: start up in interpreter mode, and use LLVM to compile only the most-used parts of the program.
Also, while arguments like this are scattered all over the internet, Mono has been using LLVM as a JIT compiler successfully for a while now (though it's worth noting that it defaults to its own backend, which compiles faster but produces less optimized code, and that they also modified parts of LLVM).
For dynamic languages, LLVM might not be the right tool, just because it was designed for optimizing systems programming languages like C and C++, which are strongly/statically typed and support very low-level features. In general, the optimizations performed on C don't really make dynamic languages fast, because you're just creating an efficient way of running a slow system. Modern dynamic-language JITs do things like inlining functions that are only known at runtime, or optimizing based on what type a variable has most of the time, which LLVM is not designed for.

Update: as of July 2014, LLVM has added a feature called "patch points", which is used to support polymorphic inline caches in Safari's FTL JavaScript JIT. This covers exactly the use case complained about in Armin Rigo's comment in the original question.
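To illustrate the idea, here is a minimal sketch of a monomorphic inline cache in C++ (all names here are invented for illustration; a real JIT uses patch points to rewrite the machine code of the call site itself, rather than a mutable struct):

    #include <cstdio>

    // A hypothetical dynamic object model: every value carries a type tag.
    struct Object { int type_tag; };
    using Method = int (*)(Object*);

    int int_describe(Object*)    { return 1; }
    int string_describe(Object*) { return 2; }

    // Slow path: full dynamic dispatch (here just a toy lookup).
    Method lookup_method(int tag) {
        return tag == 0 ? int_describe : string_describe;
    }

    // A call site with a one-entry (monomorphic) inline cache.
    struct CallSite {
        int cached_tag = -1;
        Method cached_method = nullptr;
    };

    int call(CallSite& site, Object* obj) {
        if (obj->type_tag == site.cached_tag)   // fast path: cache hit
            return site.cached_method(obj);
        site.cached_tag = obj->type_tag;        // slow path: look up,
        site.cached_method = lookup_method(obj->type_tag); // then patch
        return site.cached_method(obj);
    }

    int main() {
        CallSite site;
        Object a{0}, b{1};
        std::printf("%d %d %d\n", call(site, &a), call(site, &a), call(site, &b));
    }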

For a more detailed rant about the LLVM IR see here: LLVM IR is a compiler IR.

Related

Transpiling to C vs C++: range of CPU instructions

I am considering the question of transpiling a language (home-grown DSL) to C vs to C++.
I haven't done any 'native' programming for over 15 years, so I want to check my assumptions.
Am I right in assuming that transpiling to the newest C++ version (C++17) would enable the native compiler to use a much wider range of 'modern' Intel/AMD CPU instructions, resulting in a more efficient executable (beyond the multi-threading/memory-model part of C++, which already by itself seems a good enough reason to go for C++)?
Put another way, isn't a large part of the 'more recent' CPU instructions simply never generated by a C compiler, because it has too little information about the programmer's intent due to the simpler syntax of C? I know I could access all CPU instructions with assembler, but that is precisely what I don't want to do. Ideally, I would want the generated code to still be as platform-independent as possible.
All of your assumptions about the relationship between programming language and "modern CPU instructions" are incorrect.
Let's consider the GNU Compiler Collection.
The choice of language here doesn't much matter, as the language front-ends all end up generating the same intermediate form called GIMPLE. The optimizing passes then work on that.
The range of CPU instructions which can be emitted is controlled by the -march option (-mtune only tunes scheduling for a particular CPU without changing the instruction set). For x86, GCC is capable of emitting modern AVX-512 instructions when optimizing some very plain-looking C code. Automatic loop vectorisation is a powerful thing. Try it out: implement memcpy and look at the generated assembly.
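For instance, a deliberately plain copy loop like the one below (a minimal sketch; my_memcpy is just an illustrative name) compiles at -O3 with an AVX-512-capable -march into wide SIMD loads and stores; GCC may even recognize the idiom and emit a call to the library memcpy instead. Either way, inspect the output with -S:

    #include <cstddef>

    // A plain byte-copy loop: no intrinsics, no pragmas.
    // Compile with: g++ -O3 -march=skylake-avx512 -S vec.cpp
    void my_memcpy(char* dst, const char* src, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    }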
My advice: generate clean, un-clever C code, and crank up the optimization level. Just like you would do if writing code by hand.
You might also consider implementing your language directly as a front-end to GCC or LLVM, without transpiling to C or C++. LLVM in particular was designed for this purpose: to make implementing new languages easy while still taking advantage of modern optimization approaches.

The perks and the pipeline of an LLVM-based compilation

I see that more and more people are switching to LLVM, especially people with a background in C or C++, so there is a pattern in the kind of people approaching this compiler. What surprises me is the highly heterogeneous set of technologies that LLVM can manage: I don't get what pipeline this framework follows, or what the resulting benefits are.
I would like to stress the fact that I'm focusing on LLVM, not really on clang.
One example out of a million is this one (YouTube video), where the pipeline is not really obvious to me, or this other one; apparently there are a lot of totally different solutions where, for example, LLVM is used in conjunction with a JIT.
In short, I see different syntax and semantics, and people using LLVM to produce GPU shaders or binary objects, but I can't see the common denominator.
What is the meaning of "LLVM-based compilation"? Considering LLVM as a black box, what are the inputs, the outputs, and the business logic in the middle?
I can't see the common denominator.
The common denominator is converting code in one language to code in another language. And that's exactly what compilers do. So if you want to convert a piece of code in a "source language" to one in a "target language", what you need to do is:
Write a "front-end" - a component that converts from your source language to what LLVM expects as input. That language is an LLVM-specific language called "LLVM Bitcode" or "LLVM IR".
Alternatively, reuse an existing front-end - for example Clang.
Write a "back-end" - a component that converts from what LLVM emits to your target language.
Or use an existing back-end, for example LLVM's x86 back-end.
That's it. Now you get to enjoy things like the optimizations LLVM performs on the code between its input and output, its common framework for "lowering" the code to something closer to machine code, etc.
GCC behaves the same way, by the way; it's just that LLVM is considered by many to be superior in some respects, particularly licensing and ease of modification.
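For a concrete picture of the front-end half, here is a minimal sketch using the LLVM C++ API (API details vary across LLVM versions; Function::getArg needs a reasonably recent release). It builds a module containing one function that adds two integers and prints the resulting LLVM IR:

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    // build: clang++ demo.cpp $(llvm-config --cxxflags --ldflags --libs core)
    int main() {
        llvm::LLVMContext ctx;
        llvm::Module mod("demo", ctx);
        llvm::IRBuilder<> builder(ctx);

        // Declare: i32 @add(i32, i32)
        auto* i32 = builder.getInt32Ty();
        auto* fnType = llvm::FunctionType::get(i32, {i32, i32}, false);
        auto* fn = llvm::Function::Create(
            fnType, llvm::Function::ExternalLinkage, "add", &mod);

        // Body: %sum = add i32 %0, %1 ; ret i32 %sum
        auto* entry = llvm::BasicBlock::Create(ctx, "entry", fn);
        builder.SetInsertPoint(entry);
        llvm::Value* sum = builder.CreateAdd(fn->getArg(0), fn->getArg(1), "sum");
        builder.CreateRet(sum);

        mod.print(llvm::outs(), nullptr); // dump textual LLVM IR
    }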
LLVM's advantage over other source-available compilers is that it is designed as a set of reusable libraries. That means you can, to some degree, pick and choose what to include in your tool. Not every language tool needs optimization, and not every language tool needs code generation. LLVM is a very flexible system for language processing.
Generally, when people say "LLVM-based compilation," they mean using one or more of the LLVM libraries to implement their tool. They can leverage all of the work put into LLVM for understanding its IR and generating code for multiple targets.
The LLVM IR is the common representation used by most of the LLVM libraries. It is the interface you need to write to. For low-level stuff like machine code you will need to deal with some of the other LLVM representations (MachineInstr, MC, etc.).
As for writing a frontend to generate that LLVM IR, the tricky part is ensuring that the translation from your source language to the LLVM IR preserves the semantics of the source language. The LLVM IR has a well-defined but low-level set of semantics for each instruction. If your source language has higher-level semantics, you will have to lower them into LLVM IR instruction sequences. For example, there is no LLVM instruction that handles C-style bitfield access, so C language frontends must use a sequence of LLVM instructions to implement the functionality (generally shifts and bitwise operations).
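As a tiny illustration (assuming a hypothetical 3-bit field at bit offset 4 of a 32-bit word), the access below is what the front-end must spell out; in LLVM IR it becomes an lshr followed by an and:

    #include <cstdint>

    // Reading a 3-bit field stored at bit offset 4: shift the field
    // down to bit 0, then mask off everything above it.
    std::uint32_t read_mode(std::uint32_t word) {
        return (word >> 4) & 0x7; // lshr + and in LLVM IR
    }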
As long as you implement the semantics of your source language in the LLVM IR correctly, the LLVM libraries will have no problem performing correct code transformations. If some desired transformation requires higher-level semantics information than LLVM IR can provide, you either have to do the transformation in some stage before converting to LLVM IR (and so you will have the high-level information available) or you can pass attribute information in the LLVM IR to convey the high-level semantics and write a custom LLVM pass to implement the transformation. It is usually far cleaner to do the former than the latter.

Which is faster, Clojure or ClojureScript (and why)?

If I had to guess, I'm pretty sure the answer is Clojure, but I'm not sure why. Logically (to me) it seems like ClojureScript should be faster:
Both are "dynamic", but ClojureScript
Compiles to JavaScript, running on V8
V8 engine is arguably the fastest dynamic language engine there is
V8 is written in C
whereas Clojure:
Is also dynamic
Runs on the JVM, which has no built-in dynamic support, so I'm thinking the JVM has to do whatever V8 is doing too, to enable dynamic support
and Java is slower than C
So how could Clojure be faster than ClojureScript? Does "dynamic" mean something different when saying JavaScript is dynamic and Clojure is dynamic? What am I not seeing?
(Of course if ClojureScript is indeed faster, then is the above reasoning correct?)
I guess what Clojure compiles to is at least part of the question. I know the JVM part can't just be a plain interpreter (otherwise ClojureScript would be faster), but Clojure can't compile to regular bytecode, as there's no "dynamic" in the JVM. So what's the difference between how ClojureScript is compiled/executed, how Clojure is compiled/executed, and how plain Java is compiled/executed, and what are the performance differences implied by each?
Actually, V8 is written in C++. However, it does basically the same thing as the JVM, which is likewise implemented in C++. V8 JIT-compiles JavaScript code and executes the JIT'd code. Likewise, the JVM JIT-compiles (or hotspot-compiles) bytecode (NOT Java source) and executes that generated code.
Bytecode is not static the way Java source is; in fact it can be quite dynamic. Java, on the other hand, is mostly static, and it is not correct to conflate Java with bytecode. The Java compiler transforms Java source code into bytecode, and the JVM executes the bytecode. For more information, I recommend you look at John Rose's blog (example). There's a lot of good information there. Also, try to look for talks by Cliff Click (like this one).
Likewise, Clojure code is compiled directly to bytecode, and the JVM then runs that bytecode through the same process. Compiling Clojure is usually done at runtime, which is not the speediest process. Likewise, the translation of ClojureScript into JavaScript is not fast either. V8's translation of JavaScript to executable form is obviously quite fast. Clojure can be ahead-of-time compiled to bytecode, though, and that can eliminate a lot of startup overhead.
As you said, it's also not really correct to say that the JVM merely interprets bytecode: only the 1.0 release worked that way, more than 17 years ago!
Traditionally, there were two compilation modes. The first mode is a JIT (just-in-time) compiler, where bytecode is translated directly to machine code. Java's JIT compiles quickly, but it doesn't generate highly optimized code; the result runs OK.
The second mode is called the hotspot compiler. The hotspot compiler is very sophisticated. It starts the program very quickly in interpreted mode and analyzes it as the program runs. As it detects hotspots (spots in the code that execute frequently), it compiles those. Whereas the JIT compiler has to be fast because nothing executes until it has been JIT'ed, the hotspot compiler can afford to spend extra time to optimize the snot out of the code it's compiling.
Additionally, it can go back and revisit that code later on and apply yet more optimizations to it if necessary and possible. This is the point where the hotspot compiler can start to beat compiled C/C++. Because it has runtime knowledge of the code, it can afford to apply optimizations that a static C/C++ compiler cannot do. For example, it can inline virtual functions.
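In C++ terms, the effect is roughly the sketch below (hand-written for illustration; Shape and Circle are invented names, and a real JIT patches machine code and uses a cheaper type check than dynamic_cast):

    #include <cstdio>

    struct Shape  { virtual double area() const = 0; virtual ~Shape() = default; };
    struct Circle : Shape {
        double r;
        explicit Circle(double r) : r(r) {}
        double area() const override { return 3.14159 * r * r; }
    };

    // What a static compiler must emit: an indirect call through the vtable.
    double area_static(const Shape& s) { return s.area(); }

    // What a JIT can do after observing that every receiver so far was a
    // Circle: inline Circle::area behind a guard, keeping a slow path so
    // it can deoptimize if another type ever shows up.
    double area_speculative(const Shape& s) {
        if (auto* c = dynamic_cast<const Circle*>(&s))
            return 3.14159 * c->r * c->r;  // inlined fast path
        return s.area();                   // fall back to dynamic dispatch
    }

    int main() {
        Circle c{2.0};
        std::printf("%f %f\n", area_static(c), area_speculative(c));
    }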
Hotspot has one other feature, which to the best of my knowledge no other environment has: it can also deoptimize code if necessary. For example, suppose the code was continually taking a single branch and was optimized for that, and then the runtime conditions change, forcing the code down the other (unoptimized) branch, so that performance suddenly becomes terrible. Hotspot can deoptimize that function and begin the analysis again to figure out how to make it run better.
A downside of hotspot is that it starts a bit slowly. One change in the Java 7 JVM is to combine the JIT compiler and the hotspot compiler. This mode is new, though, and it's not the default, but once it is, initial startup should be quick, and the JVM can then begin the advanced optimizations it is so good at.
This question is hard to answer precisely, without reference to a specific benchmark task (or even specific versions of Clojure or ClojureScript).
Having said that, in most situations I would expect Clojure to be somewhat faster. Reasons:
Clojure usually compiles down to static code, so it doesn't actually do any dynamic lookups at runtime. This is quite important: high-performance Clojure code often produces bytecode that is very similar to statically typed Java. The question appears to make the false assumption that a dynamic language has to do dynamic method lookups at runtime: this is not always the case (and usually isn't in Clojure).
The JVM JIT is very well engineered, and I believe it is currently still a bit better than the JavaScript JITs, despite how good V8 is.
If you need concurrency or want to take advantage of multiple cores, then clearly there is no contest, since JavaScript is single-threaded.
The Clojure compiler is more mature than ClojureScript's, and has had quite a lot of performance-tuning work in recent years (including things like primitive support, protocols, etc.).
Of course, it is possible to write fast or slow code in any language. This will make more of a difference than the fundamental difference between the language implementations.
And more fundamentally, your choice between Clojure and ClojureScript shouldn't be about performance in any case. Both offer compelling productivity advantages. The main deciding factor should be:
If you want to run on the web, use ClojureScript
If you want to run on the server in a JVM environment, use Clojure
This is not so much an answer as a historical comment: both the HotSpot VM and the V8 JS engine can trace their origins to the Self project at Sun Microsystems, which I think prototyped a lot of the technology that allows them to run as fast as they do. Something to consider when comparing the two.

Writing a new JIT

I'm interested in starting my own JIT project in C++. I'm not that unfamiliar with assembly, compiler design, and so on. But I am very unfamiliar with the resulting machine code format - like, what does a mov instruction actually look like when all is said and done and it's time to call that function pointer? So, what are the best resources for creating such a thing?
Edit: Right now I'm only interested in x86 on Windows, stretching a tiny bit to 64-bit Windows in the future.
You want to have a look at the processor manuals for the architecture you are interested in. Those manuals describe the opcode encoding. For x86 processors, the manuals can be downloaded from this page.
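To make that concrete, here is a minimal sketch of the end-to-end step the question asks about (x86 on Windows; the byte values come straight from the manuals): the encodings of mov eax, 42 and ret are copied into executable memory and called through a function pointer.

    #include <windows.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        // mov eax, 42  ->  B8 2A 00 00 00   (opcode B8+rd, imm32)
        // ret          ->  C3
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        // Allocate executable memory and copy the instruction bytes in.
        void* mem = VirtualAlloc(nullptr, sizeof code,
                                 MEM_COMMIT | MEM_RESERVE,
                                 PAGE_EXECUTE_READWRITE);
        if (!mem) return 1;
        std::memcpy(mem, code, sizeof code);

        // The same bytes work for 32- and 64-bit: the result is in EAX.
        auto fn = reinterpret_cast<int (*)()>(mem);
        std::printf("%d\n", fn()); // prints 42

        VirtualFree(mem, 0, MEM_RELEASE);
        return 0;
    }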
Starting your project on top of LLVM might shield you from the platform details.
http://llvm.org/
LLVM is used by several dynamic language JIT compilers.
GNU lightning is a multi-architecture (x86, SPARC, PPC) library for generating code within another program. You'll need to understand general assembly language concepts, but not at a very deep level. You won't have to write anything architecture-specific at all. The downside to lightning (at least last time I used it) is that the interface presented is the intersection of the features available on the supported targets: the small register set of x86, a RISC instruction set like SPARC's, and so on. The single-pass code generation is easy to use but has its own quirks: you can't relocate your output buffer (because of address references), so if you run out of space you generally have to start over. The good thing is that you will probably get a working example going very quickly.
Older versions of NASM come with a fairly concise opcode reference that has x86 instruction encodings. (Looks like there's no 64-bit info, though.) I found this one using Google:
http://alien.dowling.edu/~rohit/nasmdocb.html
The official manuals say basically the same thing (and a lot more besides), but not quite so conveniently.

Creating a VHDL backend for LLVM?

LLVM is very modular and allows you to fairly easily define new backends. However, most of the documentation/tutorials on creating an LLVM backend focus on adding a new processor instruction set and registers. I'm wondering: what would it take to create a VHDL backend for LLVM? Are there examples of using LLVM to go from one higher-level language to another?
Just to clarify: are there examples of translating LLVM IR to a higher-level language instead of to an assembly language? For example, you could read in C with Clang, use LLVM to do some optimization, and then write out code in another language like Java or maybe Fortran.
Yes!
There are many LLVM back-ends targeting VHDL/Verilog around:
(open source) Legup paper
(commercial) Xilinx HLS
(online) C-to-verilog
And I know there are many others...
The interesting thing about such low-level representations as LLVM IR or GIMPLE (GCC's even lower-level IR is called RTL, by the way) is that they expose static single assignment (SSA) form: this can be translated to hardware quite directly, as SSA can be seen as a tree of multiplexers...
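A tiny illustration of that correspondence (a conceptual C++ sketch, not generated HDL): the value that merges two branches in SSA form is a phi node, and in hardware both arms become combinational logic whose results feed a 2:1 multiplexer selected by the branch condition.

    // In SSA every value is assigned exactly once. The phi merging the
    // two arms below picks t1 or t2 based on cond; in hardware that is
    // a 2:1 mux with cond on the select line, with both arms computed
    // in parallel as combinational logic.
    int phi_as_mux(bool cond, int a, int b) {
        int t1 = a + 1;        // "then" arm
        int t2 = b * 2;        // "else" arm
        return cond ? t1 : t2; // phi(t1, t2) == mux(cond, t1, t2)
    }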
There's nothing really special about the LLVM IR. It's a standard DAG with variable arity. Decompiling LLVM IR is a lot like decompiling machine language.
You might be able to leverage some frontend optimizations such as constant folding, but that sounds pretty minor compared to the whole task.
My only experience with LLVM was writing a binary translator for a class project, from a toy CISC to a custom RISC.
I'd say, since it's the closest thing to a standard IR (well, GCC GIMPLE is a close second), see if it fits with your algorithms and style and evaluate it as one alternative.
Note that GCC also started out prioritizing portability above all, and has also accomplished a lot.
I'm not sure I follow how the parts of your question relate to one another.
Targeting LLVM to a high-level language like C is very possible, and you seem to have found one reference point.
VHDL is a whole other business, however. Do you consider VHDL a high-level language? It may be, but for describing hardware/logic. Sure, VHDL has some constructs that you can employ to actually program in it, but it's hardly a fruitful endeavor. VHDL describes hardware, and that makes translating LLVM IR into it a very hard problem, unless of course you design a CPU with a custom instruction set in VHDL and translate LLVM IR into your instructions.
This thread was one of the first things I found while looking for the same thing.
I found a project that's rather far along and that builds cleanly under/with LLVM 3.5. It's pretty darn cool. It spits out HDL and does various other cool FPGA-related things. While it's designed to work with TTAs and generate images for FPGAs (or simulate them), it can probably also be made to do some trivial HDL generation from C functions.
It was perfect for my purposes because I wanted to upload to an Altera FPGA, and the fpga_stdout example even spits out Quartus build scripts and project files.
TTA-Based Co-design Environment
I also tried the things listed in the accepted answer and a couple of others, and found that they weren't going to work for me or weren't very high quality (usually both). TCE feels professional, but is purely academic, I believe. Very nice all the way around.
It seems the question was partially answered, so I'd like to give it a shot:
What would it take to create a VHDL backend for LLVM?
What would it take to translate LLVM IR to a higher-level language (presumably with the intention of converting between high-level languages)?
I will give you some background on (2) and expand on (1) at a later date.
If you want to convert LLVM IR to a high-level language such as C or Java:
You would have to take the LLVM instructions and abstract them out into equivalent C code. Then you need to take the remaining features that LLVM has no direct equivalent for (like classes and abstractions for C++) and write a routine that finds those patterns in the LLVM IR (like reused blocks) and emits C. For the basic stuff, it's pretty straightforward. But just follow that train of thought and you quickly realize the true difficulty of the problem; after all, not everyone writes simple C. To compound the difficulty further, you may not get the same LLVM IR when compiling the generated C! (Consider the resulting feedback loop.)
As for Java, you are in for an even harder battle going directly from LLVM IR, and in either case you still have the problem that you likely won't get the same code when compiling back to LLVM IR, if one can even do that. Rather, you would translate LLVM IR to JVM bytecode, and then use a decompiler to get your Java.
A group of Chinese students was apparently able to do this, but they wondered why there was so little interest in their research. I would say it's because they don't fully understand just what the LLVM folks have done and how it is better than the JVM. (In fact, LLVM arguably makes the JVM obsolete ;)
Even though this seems useful, in that one could use LLVM as an intermediary between C and Java to convert bidirectionally, this solution is actually of little use, because we are asking the wrong question. See, the entire reason you would want that for practical purposes is to have a common code base and increase performance.
But the real problem is that we need a language that abstracts the common features of modern languages and gives you a central language that you can build from. http://julialang.org/ has answered that question ;)
Looks like the best place to start is with the CBackend in the LLVM source:
llvm/lib/Target/CBackend/CBackend.cpp
tl;dr: I don't think LLVM is the right tool
What you are looking for is a way to translate LLVM code to a higher-level language; that's what Emscripten does for JavaScript.
But it looks like you're missing the point of LLVM a bit: it's meant to generate static code, and to achieve that it uses a specific intermediate representation built for that purpose.
As you can see, the way Emscripten works is by implementing a stack in JavaScript, not by writing the JavaScript a human would have written.
There are several projects that try to achieve what your original question asked for, like MyHDL, which turns Python into VHDL or Verilog.