Which is faster, Clojure or ClojureScript (and why)?

If I had to guess, I'm pretty sure the answer is Clojure, but I'm not sure why. Logically (to me) it seems like ClojureScript should be faster:
Both are "dynamic", but ClojureScript
Compiles to JavaScript, running on V8
V8 engine is arguably the fastest dynamic language engine there is
V8 is written in C
whereas Clojure:
Is also dynamic
Runs on the JVM, which has no built-in dynamic support, so I'm thinking the JVM has to do whatever V8 does to enable dynamic support
and Java is slower than C
So how could Clojure be faster than ClojureScript? Does "dynamic" mean something different when saying JavaScript is dynamic and Clojure is dynamic? What am I not seeing?
(Of course if ClojureScript is indeed faster, then is the above reasoning correct?)
I guess what Clojure compiles to is at least part of the question. I know the JVM part can't just be a plain interpreter (otherwise ClojureScript would be faster), but Clojure can't compile to regular bytecode either, as there's no "dynamic" in the JVM. So what's the difference between how ClojureScript is compiled/executed, how Clojure is compiled/executed, and how plain Java is compiled/executed, and what are the performance differences implied by each?

Actually, V8 is written in C++. However, it does basically the same thing as the JVM, which is itself written largely in C++. V8 JIT-compiles JavaScript code and executes the JIT'd code. Likewise, the JVM JIT compiles (or hotspot compiles) bytecode (NOT Java source) and executes that generated code.
Bytecode is not static the way Java is; in fact it can be quite dynamic. Java, on the other hand, is mostly static, and it is not correct to conflate Java with bytecode. The Java compiler transforms Java source code into bytecode, and the JVM executes the bytecode. For more information, I recommend you look at John Rose's blog (example). There's a lot of good information there. Also, try to look for talks by Cliff Click (like this one).
Likewise, Clojure code is compiled directly to bytecode, and the JVM then runs the same process on that bytecode. Compiling Clojure is usually done at runtime, which is not the speediest process. Likewise, the translation of ClojureScript into JavaScript is not fast either. V8's translation of JavaScript to executable form is obviously quite fast. Clojure can be ahead-of-time compiled to bytecode, though, and that eliminates a lot of startup overhead.
As you said, it's also not really correct to say that the JVM merely interprets bytecode; the 1.0 release did that, more than 17 years ago!
Traditionally, there were two compilation modes. The first is a JIT (Just In Time) compiler, where bytecode is translated directly to machine code. Java's JIT compiler compiles quickly, but it doesn't generate highly optimized code. That code runs OK.
The second mode is the hotspot compiler. The hotspot compiler is very sophisticated. It starts the program very quickly in interpreted mode and analyzes it as it runs. As it detects hotspots (spots in the code that execute frequently), it compiles those. Whereas the JIT compiler has to be fast (nothing executes until it has been JIT'ed), the hotspot compiler can afford to spend extra time optimizing the snot out of the code that it's compiling.
Additionally, it can go back and revisit that code later on and apply yet more optimizations to it if necessary and possible. This is the point where the hotspot compiler can start to beat compiled C/C++. Because it has runtime knowledge of the code, it can afford to apply optimizations that a static C/C++ compiler cannot do. For example, it can inline virtual functions.
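To make the virtual-call point concrete, here is a minimal C++ sketch (the class names are hypothetical, purely for illustration). A static compiler seeing only this translation unit must emit an indirect call; a JIT with runtime knowledge can devirtualize and inline it:

    #include <cstdio>

    struct Shape {
        virtual ~Shape() {}
        virtual double area() const = 0;   // virtual: an indirect call in static code
    };

    struct Circle : Shape {
        double r;
        explicit Circle(double radius) : r(radius) {}
        double area() const { return 3.14159265 * r * r; }
    };

    // A static C++ compiler cannot know the dynamic type behind each pointer
    // here, so it cannot inline area(). A JIT that observes at runtime that
    // every Shape is actually a Circle can inline the call, and deoptimize
    // later if a different subclass ever shows up.
    double total_area(const Shape* const* shapes, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += shapes[i]->area();
        return sum;
    }

    int main() {
        Circle a(1.0), b(2.0);
        const Shape* shapes[] = { &a, &b };
        std::printf("%f\n", total_area(shapes, 2));
        return 0;
    }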
Hotspot has one other feature which, to the best of my knowledge, no other environment has: it can also deoptimize code if necessary. For example, suppose the code had been continually taking a single branch, which was duly optimized, and then runtime conditions change, forcing the code down the other (unoptimized) branch, and performance suddenly becomes terrible. Hotspot can deoptimize that function and begin the analysis again to figure out how to make it run better.
A downside of hotspot is that it starts a bit slowly. One change in the Java 7 JVM has been to combine the JIT compiler and the hotspot compiler. This mode is new, though, and not yet the default, but once it is, initial startup should be good, and then the JVM can begin the advanced optimizations it is so good at.
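If you want to experiment with this, the combined mode can be enabled explicitly with a HotSpot flag (a real flag, though its exact behavior varies by JVM version, and later JVMs turn it on by default):

    java -XX:+TieredCompilation -jar myapp.jar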
Cheers!

This question is hard to answer precisely, without reference to a specific benchmark task (or even specific versions of Clojure or ClojureScript).
Having said that, in most situations I would expect Clojure to be somewhat faster. Reasons:
Clojure usually compiles down to static code, so it doesn't actually do dynamic lookups at runtime. This is quite important: high-performance Clojure code often produces bytecode that is very similar to statically typed Java. The question appears to make the false assumption that a dynamic language has to do dynamic method lookups at runtime; this is not always the case (and usually isn't in Clojure).
The JVM JIT is very well engineered, and I believe it is currently still a bit better than the JavaScript JITs, despite how good V8 is.
If you need concurrency or need to take advantage of multiple cores then there is clearly no contest, since JavaScript is single-threaded.
The Clojure compiler is more mature than ClojureScript, and has had quite a lot of performance tuning work in recent years (including things like primitive support, protocols etc.)
Of course, it is possible to write fast or slow code in any language. This will make more of a difference than the fundamental difference between the language implementations.
And more fundamentally, your choice between Clojure and ClojureScript shouldn't be about performance in any case. Both offer compelling productivity advantages. The main deciding factor should be:
If you want to run on the web, use ClojureScript
If you want to run on the server in a JVM environment, use Clojure

This is not so much an answer as a historical comment: both the HotSpot VM and the V8 JS engine can trace their origins to the Self project at Sun Microsystems, which I think prototyped a lot of the technology that allows them to run as fast as they do. Something to consider when comparing the two. (I would've posted this as a comment, but the reputation system prevented me.)

Related

Is There a C++ Command Line?

So Python has a sort of command line thing, and so does Linux bash (obviously), and I'm sure other programming languages do, but does C++? If not, why do C++ scripts have to be compiled first and then run?
If not, why do C++ scripts have to be compiled first and then run?
C++ code does not need to be compiled to be run. There are interpreters.
The reason most of us prefer compiled C++ is that the resulting executable is 'faster'.
Interpreted computer languages can do extra things to achieve similar performance (i.e. just-in-time compile), but generally, 'scripts' are not in the same league of fast.
Some developers think not having to edit, compile, link is a good thing ... just type in code and see what it does.
Anyway, the answer is that there is no reason C++ "has to" be compiled. Compilation is just the preferred approach for most C++ developers.
Should you want to try out C++ interpreters, search the net for CINT, Ch, and others.
Indeed there are interpreters for C++ that do what you want. Check out Cling (sample session below).
To the commenters saying C++ can't have interpreters because it's a compiled language: yes, typically you use a compiler with C++. But that doesn't mean it's impossible to write an interpreter for it.
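For a taste of what that looks like, here is roughly what an interactive Cling session is like (the prompt and output format here are approximate and may differ between versions):

    $ cling
    [cling]$ #include <iostream>
    [cling]$ int x = 6 * 7;
    [cling]$ std::cout << x << std::endl;
    42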
There is no command line to run C++ instructions. C++ code is compiled first to target machine code (via intermediate object code, which is then linked) before it runs.
The reason is a matter of language design, driven by considerations like performance and error recovery. Compiled code is translated directly to target machine code and runs faster than interpreted languages. A compiler takes the program as a whole and generates the target machine code, whereas an interpreter takes a few instructions at a time. Interpreted languages require an intermediate program to reach the final machine code, so they may be slow.
In a nutshell, it is the evolution of language design. When the first computers appeared, programming was done directly in machine language, and those programs ran instruction by instruction. Later, high-level languages appeared, in which machine language is abstracted behind human-friendly instructions, with compilers designed to generate the equivalent machine code.
Later still, as program design advanced and CPU instruction cycle speeds increased, we could afford intermediate interpreters for writing safer programs.
The choice is wider now. Earlier, performance-centric apps demanded compiled code; now even interpreted code is comparably fast in common use cases.
While there are interpreters for C++-like languages, that is not really the point; C++ is a compiled language that is translated to native machine code. Conversely, scripting languages are typically interpreted (although there are also compilers that translate scripting languages to native code).
C++ is a systems-level capable language. You have to ask yourself - if all languages ran in a shell with a command line and were interpreted, what language is that shell or interpreter, or even the OS they are running on written in?
Ultimately you need a systems level language, and those are most often C, C++ and assembler.
Moreover, because it is translated to machine-level code at compilation, that code runs directly and stand-alone, without the presence of any interpreter; consequently it can be simpler to deploy and will execute faster.

Clojure performance on JVM versus CLR

Are there any performance comparisons of Clojure on the JVM versus the CLR? Or maybe someone who has used both with performance-sensitive code can give some anecdotal comments?
The performance of Clojure JVM is better than that of Clojure CLR. I don't have explicit benchmarks to point to, but I have a lot of experience doing compilation and running tests in both environments and the difference is obvious.
There are several factors involved in the difference. Some are being worked on. Some are related to JVM vs CLR perf differences and hence beyond the means of the ClojureCLR developers to address.
(1) Compilation of Clojure code to the platform Intermediate Language.
At the most basic level, the IL generated is almost identical. However, design choices forced by some limitations of the Dynamic Language Runtime result in each function definition creating an extra class and in function invocations incurring an extra method call. Version 1.4 of ClojureCLR (coming soon) eliminates the use of the DLR for most code generation. (The DLR will still be used for CLR interop and polymorphic inline caching.) At this point, generated code will be substantially the same as the JVM version. Startup time has been reduced by 10%, and simple benchmarks show 4-16% improvements over version 1.3. More details here.
(2) Startup time
Clojure JVM starts significantly faster than Clojure CLR. Most of this is traceable to the JVM being able to selectively load class files (versus the CLR loading entire assemblies) and differences in when JIT compilation occurs. However, if ClojureCLR is NGEN'd, startup times are really fast. More details here.
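For reference, pre-compiling an assembly with NGEN is a one-liner from an elevated Visual Studio or SDK command prompt (the assembly name below is just a placeholder):

    ngen install Clojure.dll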
(3) JVM versus CLR performance
Some attention has been paid to making ClojureJVM work well with HotSpot compiler optimizations. I don't have explicit proof, but I'm guessing that HotSpot just does a better job on things like inlining in compiled Clojure code versus the CLR JITter. It is fair to say that no attention has been paid to how to make ClojureCLR take better advantage of the CLR JITter.
The release of ClojureCLR 1.4 will provide a good opportunity for some benchmarking.
I've not really used the CLR version so can't fully answer your question.
However it is worth noting that most of the optimisation / development effort so far has gone into the mainline JVM version of Clojure. As a result you can expect the JVM version of Clojure to perform considerably better at present in most situations.
Clojure on the JVM is already one of the fastest dynamically typed languages around: on the Benchmarks Game page, Common Lisp is the only dynamically typed language that is (marginally) faster.
Over time I'd expect the Clojure JVM/CLR gap to narrow as both versions tend towards the performance of their host platforms. But right now, if performance is your key concern, I'd definitely recommend the JVM version (as well as performance, the JVM version is also likely to be better for overall maturity, library availability and cross platform support).

Tokenization and AST

I have a rather abstract question for you all. I'm looking at getting involved in a static code analysis project. It uses C and C++ as its development languages, so if any code in your response could be in either of those languages, that would be great.
My question:
I need to understand some of the basic concepts/constructs used to process code for static analysis. I have heard people mention things like ASTs and tokenization, etc. I was just wondering if anyone can clarify how these things are applied in creating a static analysis tool? I'd like more of an explanation of tokenization, as I don't really understand that so well. I understand it is a sort of way to process strings, but I'm not confident in that answer. Furthermore, I know that the project I'm looking at passes the code through the preprocessor before it is analyzed. Can anyone explain this? Surely if it is static code analysis it needn't be preprocessed?
Hope someone can clear this up for me.
Cheers.
Tokenization is the act of breaking source text into language elements such as operators, variable names, numbers, etc. Parsing reads sequences of tokens and builds Abstract Syntax Trees (ASTs), which are a particular program representation. Tokenization and parsing are necessary for static analysis, but hardly interesting, in the same way that putting your ante in the pot is necessary to playing poker but is not the interesting part of the game in any way.
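To make "breaking source text into language elements" concrete, here is a deliberately tiny C++ sketch of a tokenizer for arithmetic expressions (a real C or C++ lexer is far more involved; this just shows the shape of the process):

    #include <cctype>
    #include <iostream>
    #include <string>
    #include <vector>

    enum TokenKind { TOK_NUMBER, TOK_IDENT, TOK_OP };

    struct Token {
        TokenKind kind;
        std::string text;
    };

    // Break an arithmetic expression into numbers, identifiers and operators.
    std::vector<Token> tokenize(const std::string& src) {
        std::vector<Token> out;
        size_t i = 0;
        while (i < src.size()) {
            unsigned char c = src[i];
            if (std::isspace(c)) { ++i; continue; }          // skip whitespace
            size_t j = i;
            if (std::isdigit(c)) {                            // number literal
                while (j < src.size() && std::isdigit((unsigned char)src[j])) ++j;
                Token t = { TOK_NUMBER, src.substr(i, j - i) };
                out.push_back(t);
            } else if (std::isalpha(c)) {                     // identifier
                while (j < src.size() && std::isalnum((unsigned char)src[j])) ++j;
                Token t = { TOK_IDENT, src.substr(i, j - i) };
                out.push_back(t);
            } else {                                          // single-char operator
                ++j;
                Token t = { TOK_OP, src.substr(i, 1) };
                out.push_back(t);
            }
            i = j;
        }
        return out;
    }

    int main() {
        std::vector<Token> toks = tokenize("x1 + 42 * y");
        for (size_t k = 0; k < toks.size(); ++k)
            std::cout << toks[k].text << "\n";   // prints: x1  +  42  *  y
        return 0;
    }

A parser would then consume this token stream and build an AST: for "x1 + 42 * y", a tree with '+' at the root, 'x1' as its left child, and a '*' node (over '42' and 'y') as its right child.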
If you are building a static analyzer (you imply you expect to work on one implemented in C or C++), you will need fundamental knowledge of compiling (parsing not so much, unless you are building a parser for the language to be statically analyzed), but certainly knowledge of program representations (ASTs, triples, control and data flow graphs, ...), type and property inference, and the limits on analysis accuracy (the cause of conservative analyses).
The program representations are fundamental because these are the data structures most static analyzers really process; it's simply too hard to wring useful facts directly from program text. These concepts can be used to implement static analysis capabilities in any programming language; there's nothing special about implementing them in C or C++.
Run, don't walk, to your nearest compiler class for the first part of this. If you don't have it, you won't be able to do anything effective in tool building. The second part you will more likely find in a graduate computer science class.
If you get past that basic knowledge issue, you will either decide to implement an analysis tool from scratch, or build on existing analysis tool infrastructure. Few people decide to build one from scratch; it takes a huge amount of work (years or decades) to build robust parsers, flow analyzers, etc. needed as foundations for any particular static analysis. Mostly people try to use some existing infrastructure.
There's a huge list of candidates at: http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
If you insist on processing C or C++ and building your own custom sophisticated analysis, you really, really need a tool that can handle real C and C++ code. There are IMHO a limited number of good candidates:
GCC (and various graft-ons such as Starynkevitch's MELT, about which I know little)
Clang (pretty spectacular toolset)
DMS (and its C and C++ front ends) [my company's tool]
Open64 compiler infrastructure
Rose compiler infrastructure (based on EDG's front end, which is used industry-wide)
Each of these is a big system, and each requires a big investment to understand and begin to use. Don't underrate the learning curve.
There are lots of other tools out there that sort of process C and C++, but "sort of" is pretty useless for static analysis purposes.
If you intend to simply use a static analysis tool, you can avoid learning most of the parsing and program representation questions; instead you'll need to learn as much as you can about the specific analysis tool you intend to use. You'll still be a lot better off with the compiler background above because you will understand what the tool does, why it does it, and why it produces the kinds of answers that it does (usually it produces answers that are unsatisfying in a lot of ways due to the conservative limits on analysis accuracy).
Lastly, you should be clear that you understand the difference between static analysis and dynamic analysis [using data collected at runtime to determine program properties]. Most end users couldn't care less how you get information about their code, and each analysis approach has its strengths and weaknesses.
If you are interested in static analysis, you should not bother with syntax analysis (i.e. lexing, parsing, building the abstract syntax tree), because that won't teach you much.
So I suggest you use some existing parsing infrastructure. This is particularly true if you want to analyze C++ source code, because just writing a C++ parser is difficult.
For example, you could build on the GCC compiler, or on the LLVM/Clang compiler. Be aware of their open source licenses: GCC is under GPLv3, and Clang/LLVM has a BSD-like license. GCC is able to handle not only C and C++, but also Fortran, Go, Ada, and Objective-C.
Because of GCC's GPLv3 license, your development on top of it must also be free software under GPLv3. On the other hand, the GCC community is quite big. If you know OCaml and are interested in static analysis of C only, you could consider Frama-C (LGPL-licensed).
Actually, I am working on GCC, providing the MELT extension framework (MELT is also GPLv3-licensed). MELT is a high-level domain-specific language for extending GCC and should be an ideal choice for working on static analysis using GCC's powerful framework and internal representations (GIMPLE, Tree, ...).
(Of course, the LLVM folks would be delighted to have their work used too; and nobody knows both LLVM and GCC well, because learning such big software is already a challenge, so people have to choose where they invest their time.)
So my suggestion is to join an existing static analysis or compiler effort; don't reinvent the wheel by starting from scratch (because that means years of effort just to get a good parser).
By doing so, you will take advantage of the existing infrastructure (so you won't have to bother with ASTs and parsing), and you will improve some existing project.
I would be delighted if you are interested in MELT. FWIW, there is a MELT tutorial in Paris on January 24th, 2012, at the HiPEAC conference.
If you are looking into static analysis for C & C++, your first stop should be libclang, along with its associated papers.
In terms of why the preprocessor is run: static analysis is meant to catch errors and possible unintended code functionality (very basically), thus it must analyze what the compiler will be compiling, not what the user sees.
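To make that concrete, consider a classic preprocessor pitfall (the macro here is hypothetical, purely for illustration):

    #define SQUARE(x) x * x      // unparenthesized macro argument: a classic bug

    int f(int a) {
        return SQUARE(a + 1);    // after preprocessing this is: a + 1 * a + 1
    }                            // NOT (a + 1) * (a + 1)

A tool that analyzed the raw, unpreprocessed source might assume SQUARE squares its argument; only after preprocessing does the analyzer see the expression the compiler will actually compile, bug included.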
I'm pretty sure @Ira Baxter will pop up with a far more complete answer :)

Script programming strategies for high-load servers

I have an MMORPG server in C++. I've never done scripting before, and from my point of view I think it would degrade the overall performance of the server if I parsed scripts on the fly (I haven't tested, though), but I would like to have such functionality.
What scripting techniques for multi-threaded environments would you suggest or use? A book or an article would be nice too, preferably related to C++, but I don't mind other languages.
Thanks.
I believe the majority of commonly used scripting languages perform parsing as a separate step from execution, so that wouldn't be a significant performance cost. Usually they compile to some kind of bytecode format (Python, Lua and Perl all do this, for example), and often that format can be serialised and loaded directly from disk.
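As a sketch of that "parse once, run many times" pattern, here is a minimal example of embedding Lua in a C++ server with the Lua 5.x C API (the script and function names are hypothetical):

    #include <lua.hpp>    // Lua C API with C++ linkage
    #include <iostream>

    int main() {
        lua_State* L = luaL_newstate();
        luaL_openlibs(L);

        // Parsing/compilation to bytecode happens once, at load time...
        if (luaL_dofile(L, "on_hit.lua") != 0) {
            std::cerr << lua_tostring(L, -1) << "\n";
            return 1;
        }

        // ...and each later call runs the already-compiled function.
        lua_getglobal(L, "on_player_hit");   // function defined by the script
        lua_pushinteger(L, 42);              // hypothetical player id
        lua_pushnumber(L, 9.5);              // hypothetical damage value
        if (lua_pcall(L, 2, 1, 0) != 0) {
            std::cerr << lua_tostring(L, -1) << "\n";
            return 1;
        }
        std::cout << "new hp: " << lua_tointeger(L, -1) << "\n";
        lua_pop(L, 1);

        lua_close(L);
        return 0;
    }

(Lua's luac tool can also precompile scripts to bytecode on disk, removing even the load-time parse. Note that a single lua_State is not thread-safe, so a multi-threaded server typically uses one state per thread or serializes access to a shared one.)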
There are implementations of scripting languages that compile to native code. For example, you could try JavaScript and Google's V8 engine, which (as far as I'm aware) compiles everything to native code before execution.
V8 is of course used in Chrome, which is a multi-process environment, so I would imagine it would work perfectly well in a multi-threaded environment (I can't claim personal experience of that, though).
There are also JIT compilers for languages that are typically compiled to bytecode (for example, Psyco for python, and LuaJit for Lua). These are often not in sync with the latest version of the main language distribution though.
I think you want to check out Node.js.
It is a high-performance, event-driven server engine built on top of Google's V8 engine. It's extremely fast and built to scale to huge levels.

dynamic code compilation

I'm working on a program that renders iterated fractal systems. I wanted to add the functionality where someone could define their own iteration process, and compile that code so that it would run efficiently.
I currently don't know how to do this and would like tips on what to read to learn how.
The main program is written in C++, and I'm familiar with C++. In fact, in most scenarios I know how to convert the code to assembly that would accomplish the goal, but I don't know how to take the extra step of converting it to machine code. If possible, I'd like to dynamically compile the code the way I believe many game-system emulators work.
If it is unclear what I'm asking, tell me so I can clarify.
Thanks!
Does the routine to be compiled dynamically need to be in any particular language? If the answer to that question is "Yes, it must be C++", you're probably out of luck. C++ is about the worst possible choice for online recompilation.
Is the dynamic portion of your application (the fractal iterator routine) a major CPU bottleneck? If you can afford using a language that isn't compiled, you can probably save yourself an awful lot of trouble. Lua and JavaScript are both heavily optimized interpreted languages that only run a few times slower than native, compiled code.
If you really need the dynamic functionality to be compiled to machine code, your best bet is probably going to be using clang/llvm. clang is the C/Objective-C front end being developed by Apple (and a few others) to make online, dynamic recompilation perform well. llvm is the backend clang uses to translate from a portable bytecode to native machine code. Be advised that clang does not currently support much of C++, since that's such a difficult language to get right.
Some CPU emulators treat the machine code as if it were bytecode and JIT compile it, almost as if it were Java. This is very efficient, but it means that the developers need to write a version of the compiler for each CPU their emulator runs on and for each CPU emulated.
That usually means it only works on x86 and is annoying to anyone who would like to use something different.
They could also translate it to LLVM or Java byte code or .Net CIL and then compile it, which would also work.
In your case I am not sure that sort of thing is the best way to go. I think that I would do this by using dynamic libraries. Make a directory that is supposed to contain "plugins" and let the user compile their own. Make your program scan the directory and load each DLL or .so it finds.
Doing it this way means you spend less time writing compilers and more time actually getting stuff done.
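A minimal sketch of that plugin approach on Linux (the iterate function and file names are assumptions for illustration; on Windows you would use LoadLibrary/GetProcAddress instead):

    // plugin.cpp -- the user compiles this themselves, e.g.:
    //   g++ -O2 -shared -fPIC plugin.cpp -o plugin.so
    extern "C" double iterate(double zr, double zi) {
        return zr * zr - zi * zi;     // user-defined iteration step
    }

    // host.cpp -- link with -ldl
    #include <dlfcn.h>
    #include <iostream>

    int main() {
        void* handle = dlopen("./plugin.so", RTLD_NOW);
        if (!handle) { std::cerr << dlerror() << "\n"; return 1; }

        typedef double (*IterFn)(double, double);
        IterFn fn = (IterFn)dlsym(handle, "iterate");   // extern "C" keeps the name unmangled
        if (!fn) { std::cerr << dlerror() << "\n"; return 1; }

        std::cout << fn(1.5, 0.5) << "\n";
        dlclose(handle);
        return 0;
    }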
If you can write your dynamic extensions in C (not C++), you might find the Tiny C Compiler (TCC) to be of use. It's available under the LGPL, it works on Windows and Linux, and it's a small executable (or library) at ~100 KB for the preprocessor, compiler, linker and assembler, all of which it runs very fast. The downside, of course, is that it can't compare to the optimizations you get with GCC. Another potential downside is that it's x86-only, AFAIK.
If you did decide to write assembly, TCC can handle that -- the documentation says it supports a gas-like syntax, and it does support X86 opcodes.
TCC also fully supports ANSI C, and it's nearly fully compliant with C99.
That being said, you could either ship TCC as an executable with your application or use libtcc (there's not much documentation of libtcc online, but it's available in the source package). Either way, you can use TCC to generate dynamic or shared libraries, or executables. If you went the dynamic library route, you would just put a Render (or whatever) function in it, dlopen or LoadLibrary it, and call Render to run the user-designed rendering. Alternatively, you could build a standalone executable and popen it, doing all your communication through the standalone's stdin and stdout.
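Here is a hedged sketch of the libtcc route (these are real libtcc entry points, but the exact signature of tcc_relocate has varied between TCC versions, so check your header):

    #include <libtcc.h>
    #include <stdio.h>

    /* Hypothetical user-supplied iteration step, compiled at runtime. */
    const char* src =
        "double iterate(double zr, double zi) {"
        "    return zr * zr - zi * zi;"
        "}";

    int main(void) {
        TCCState* s = tcc_new();
        tcc_set_output_type(s, TCC_OUTPUT_MEMORY);   /* compile straight to memory */

        if (tcc_compile_string(s, src) == -1) return 1;
        if (tcc_relocate(s, TCC_RELOCATE_AUTO) < 0) return 1;  /* one-arg in some versions */

        double (*fn)(double, double) =
            (double (*)(double, double))tcc_get_symbol(s, "iterate");
        if (fn) printf("%f\n", fn(1.5, 0.5));

        tcc_delete(s);
        return 0;
    }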
Since you're generating pixels to be displayed on a screen, have you considered using HLSL with dynamic shader compile? That will give you access to SIMD hardware designed for exactly this sort of thing, as well as the dynamic compiler built right into DirectX.
LLVM should be able to do what you want to do. It allows you to form a description of the program you'd like to compile in an object-oriented manner, and then it can compile that program description into native machine code at runtime.
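To give a flavor, here is a condensed sketch using LLVM's C API (details vary a lot between LLVM versions, and older releases used LLVMCreateJITCompilerForModule rather than MCJIT, so treat this as an outline rather than copy-paste code):

    #include <llvm-c/Core.h>
    #include <llvm-c/ExecutionEngine.h>
    #include <llvm-c/Target.h>
    #include <stdio.h>

    int main(void) {
        LLVMLinkInMCJIT();
        LLVMInitializeNativeTarget();
        LLVMInitializeNativeAsmPrinter();

        /* Build IR for: double step(double a, double b) { return a + b; } */
        LLVMModuleRef mod = LLVMModuleCreateWithName("fractal");
        LLVMTypeRef params[] = { LLVMDoubleType(), LLVMDoubleType() };
        LLVMTypeRef fnty = LLVMFunctionType(LLVMDoubleType(), params, 2, 0);
        LLVMValueRef fn = LLVMAddFunction(mod, "step", fnty);
        LLVMBuilderRef b = LLVMCreateBuilder();
        LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));
        LLVMBuildRet(b, LLVMBuildFAdd(b, LLVMGetParam(fn, 0),
                                         LLVMGetParam(fn, 1), "sum"));

        /* JIT it to native machine code and call it. */
        LLVMExecutionEngineRef ee;
        char* err = NULL;
        if (LLVMCreateExecutionEngineForModule(&ee, mod, &err)) {
            fprintf(stderr, "%s\n", err);
            return 1;
        }
        double (*step)(double, double) =
            (double (*)(double, double))LLVMGetFunctionAddress(ee, "step");
        printf("%f\n", step(1.5, 2.5));   /* prints 4.000000 */
        return 0;
    }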
Nanojit is a pretty good example of what you want. It generates machine code from an intermediate language. It's C++, and it's small and cross-platform. I haven't used it very extensively, but I enjoyed toying with it just for demos.
Spit the code to a file and compile it as a dynamically loaded library, then load it and call it.
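A quick sketch of that approach (file names and the compiler command line are illustrative, and error handling is omitted for brevity):

    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <dlfcn.h>

    int main() {
        // 1. Spit the user's code to a file.
        std::ofstream out("user_iter.cpp");
        out << "extern \"C\" double iterate(double z) { return z * z + 0.25; }\n";
        out.close();

        // 2. Compile it as a shared library.
        std::system("g++ -O2 -shared -fPIC user_iter.cpp -o user_iter.so");

        // 3. Load it and call it.
        void* h = dlopen("./user_iter.so", RTLD_NOW);
        double (*fn)(double) = (double (*)(double))dlsym(h, "iterate");
        std::cout << fn(0.5) << "\n";
        dlclose(h);
        return 0;
    }

This is essentially the plugin pattern sketched in the earlier answer, with the compile step automated.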
Is there a reason why you can't use a GPU-based solution? This seems to be screaming for one.