Clojure performance on JVM versus CLR

Clojure performance on JVM versus CLR - clojure

Are there are performance comparisons of Clojure on JVM versus CLR? Or maybe someone who has used both with performance-sensitive code can give some anecdotal comments?

The performance of Clojure JVM is better than that of Clojure CLR. I don't have explicit benchmarks to point to, but I have a lot of experience doing compilation and running tests in both environments and the difference is obvious.
There are several factors involved in the difference. Some are being worked on. Some are related to JVM vs CLR perf differences and hence beyond the means of the ClojureCLR developers to address.
(1) Compilation of Clojure code to the platform Intermediate Language.
At the most basic level, the IL generated is almost identical. However, design choices forced by some limitations of the Dynamic Language Runtime result in each function definition creating an extra class and function invocations to have an extra method call. Version 1.4 of ClojureCLR (coming soon) eliminates the use of the DLR for most code generation. (The DLR will still be used for CLR interop and polymorphic inline caching.) At this point, generated code will be substantially the same as the JVM version. Startup time has been reduced by 10% and simple benchmarks show 4-16% improvements over version 1.3. More details here.
(2) Startup time
Clojure JVM starts significantly faster than Clojure CLR. Most of this is traceable to the JVM being able to selectively load class files (versus the CLR loading entire assemblies) and differences in when JIT compilation occurs. However, if ClojureCLR is NGEN'd, startup times are really fast. More details here.
(3) JVM versus CLR performance
Some attention has been paid to making ClojureJVM work well with HotSpot compiler optimizations. I don't have explicit proof, but I'm guessing that HotSpot just does a better job on things like inlining in compiled Clojure code versus the CLR JITter. It is fair to say that no attention has been paid to how to make ClojureCLR take better advantage of the CLR JITter.
The release of ClojureCLR 1.4 will provide a good opportunity for some benchmarking.

I've not really used the CLR version so can't fully answer your question.
However it is worth noting that most of the optimisation / development effort so far has gone into the mainline JVM version of Clojure. As a result you can expect the JVM version of Clojure to perform considerably better at present in most situations.
Clojure on the JVM is already one of the fastest dynamically typed languages around - from the benchmarks game page Common Lisp is the only dynamically typed language which is (marginally) faster.
Over time I'd expect the Clojure JVM/CLR gap to narrow as both versions tend towards the performance of their host platforms. But right now, if performance is your key concern, I'd definitely recommend the JVM version (as well as performance, the JVM version is also likely to be better for overall maturity, library availability and cross platform support).

Related

Which is faster, Clojure or ClojureScript (and why)?

If I had to guess, I'm pretty sure the answer is Clojure, but I'm not sure why. Logically (to me) it seems like ClojureScript should be faster:
Both are "dynamic", but ClojureScript
Compiles to JavaScript, running on V8
V8 engine is arguably the fastest dynamic language engine there is
V8 is written in C
whereas Clojure:
Is also dynamic
Runs in JVM, which has no built-in dynamic support, so I'm thinking thus JVM has to do whatever V8 is doing too, to enable dynamic support
and Java is slower than C
So how could Clojure be faster than ClojureScript? Does "dynamic" mean something different when saying JavaScript is dynamic and Clojure is dynamic? What am I not seeing?
(Of course if ClojureScript is indeed faster, then is the above reasoning correct?)
I guess, what does Clojure compile to....is at least part of the question. I know the JVM part can't just be a plain interpreter (otherwise ClojureScript would be faster), but Clojure can't compile to regular bytecode, as there's no "dynamic" in the JVM. So what's the difference between how ClojureScript is compiled/executed and how Clojure is compiled/excecuted and how plain Java is compiled/executed, and the performance differences implied in each?

Actually, V8 is written in C++. However, does basically the same thing as the JVM, and JVM is written in C. V8 JITs Javascript code and executes the JIT'd code. Likewise the JVM JIT compiles (or hotspot compiles) bytecode (NOT Java) and executes that generated code.
Bytecode is not static, as Java is. In fact it can be quite dynamic. Java, on the other hand is mostly static, and it is not correct to conflate Java with bytecode. The java compiler transforms Java source code into bytecode, and the JVM executes the bytecode. For more information, I recommend you look at John Rose's blog (example). There's a lot of good information there. Also, try to look for talks by Cliff Click (like this one).
Likewise, Clojure code is directly compiled to bytecode, and the JVM then does the same process with that bytecode. Compiling Clojure is usually done at runtime, which is not the speediest process. Likewise the translation of Clojurescript into Javascript is not fast either. V8's translation of Javascript to executable form is obviously quite fast. Clojure can be ahead of time compiled to bytecode though, and that can eliminate a lot of startup overhead.
As you said, it's also not really correct to say that the JVM interprets bytecode. The 1.0 release did that more than 17 years ago!
Traditionally, there were two compilation modes. The first mode is a JIT (Just in Time) compiler. Where bytecode is translated directly to machine code. Java's JIT compiling executes fast, and it doesn't generate highly optimized code. It runs OK.
The second mode is called the hotspot compiler. The hotspot compiler is very sophisticated. It starts the program very quickly in interpreted mode, and it analyzes it as the program runs. As it detects hotspots (spots in the code that execute frequently), it will compile those. Whereas the JIT compiler has to be fast because nothing executes unless it's JIT'ed the hotspot compiler can afford to spend extra time to optimize the snot out of the code that it's compiling.
Additionally, it can go back and revisit that code later on and apply yet more optimizations to it if necessary and possible. This is the point where the hotspot compiler can start to beat compiled C/C++. Because it has runtime knowledge of the code, it can afford to apply optimizations that a static C/C++ compiler cannot do. For example, it can inline virtual functions.
Hotspot has one other feature, which to the best of my knowledge no other environment has, it can also deoptimize code if necessary. For example, if the code were continually taking a single branch, and that was optimized and the runtime conditions change forcing the code down the other (unoptimized) branch and performance suddenly becomes terrible. Hotspot can deoptimize that function and begin the analysis again to figure out how to make it run better.
A downside of hotspot is that it starts a bit slow. One change in the Java 7 JVM has been to combine the JIT compiler and the hotspot compiler. This mode is new, though, and it's not the default, but once it is initial startup should be good and then it can begin the advanced optimizations that the JVM is so good at.
Cheers!

This question is hard to answer precisely, without reference to a specific benchmark task (or even specific versions of Clojure or ClojureScript).
Having said that, in most situation I would expect Clojure to be somewhat faster. Reasons:
Clojure usually compiles down to static code, so it doesn't actually do any dynamic lookups at runtime. This is quite important: high performance code often produces bytecode that is very similar to statically typed Java. The question appears to be making the false assumption that a dynamic language has to do dynamic method lookups at runtime: this is not always the case (and usually isn't in Clojure)
The JVM JIT is very well engineered, and I believe it is currently still a bit better than the JavaScript JITs, despite how good V8 is.
If you need concurrency or need to take advantage of multiple cores then clearly there is no contest since JavaScript is single-threaded.....
The Clojure compiler is more mature than ClojureScript, and has had quite a lot of performance tuning work in recent years (including things like primitive support, protocols etc.)
Of course, it is possible to write fast or slow code in any language. This will make more of a difference than the fundamental difference between the language implementations.
And more fundamentally, your choice between Clojure and ClojureScript shouldn't be about performance in any case. Both offer compelling productivity advantages. The main deciding factor should be:
If you want to run on the web, use ClojureScript
If you want to run on the server in a JVM environnment, use Clojure

This is not so much an answer as an historical comment: Both the HotSpot VM and the V8 js engine can have their origins traced to the Self project at Sun Microsystems, which I think prototyped a lot of the technology that allows them to run as fast as they do. Something to consider when comparing them both. I would've posted this as a comment but the reputation system prevented me.

Microarchitectural profiling of C++ and assembly code on MIPS

As part of course project, I need to analyze a piece of C++ code for performance and find out which parts of the Computer Architecture (MIPS or x86) are mostly utilized while running the code and is possibly a bottleneck for the performance. I am looking at various Profilers for analyzing the performance and came across SimpleScalar which is a great tool but sadly only works with C code.
Since I am more familiar with MIPS architecture it would be great if there's a tool like SimpleScalar for simulating and profiling C++ code for MIPS. I am looking at the performance critical parts like branch, cache, instruction set, addressing modes etc. If not, mention of any tool which can do the similar kind of analysis for x86 architectures would be great as well.
(Just to clarify, I'm not looking for any old profiler, but for one that understands the CPU microarchitecture and knows what parts of the CPU are taken advantage of or underused.)

CACTI has detailed low-level simulation of cache.
SESC is a cycle accurate computer architecture simulator that supports MIPS.
SESC includes CACTI.

I doubt that what you want is possible. C++ is the language, but it still needs to be compiled to the target architecture. The optimisations (or the lack of them) will determine a lot of your performance criteria like cache use, etc. So I guess you need to look for machine level profilers (Hopefully they support the debug format of your compiler, so you see source code context).

My understanding is that SimpleScalar can simulate and profile MIPS machine code, no matter what the original language it was compiled from.
(The source-level debugger "DLite!" that comes with SimpleScalar may only support a few languages, but it sounds like you don't need to "debug" your code.)

Dalvik's effect on native c++ code performance?

I plan on using Necessitas to port Qt code to the Android platform. On a first sight I noticed despite being native code, everything still passes through the Dalvik VM.
My question is does this introduce overhead? Java is less efficient than native c++ to begin with and Dalvik is rather immature compared to vanilla Java, which is the cause of my concerns.

In the Android documentation you can find the following tip:
Native code isn't necessarily more efficient than Java. For one thing,
there's a cost associated with the Java-native transition, and the JIT
can't optimize across these boundaries. If you're allocating native
resources (memory on the native heap, file descriptors, or whatever),
it can be significantly more difficult to arrange timely collection of
these resources. You also need to compile your code for each
architecture you wish to run on (rather than rely on it having a JIT).
You may even have to compile multiple versions for what you consider
the same architecture: native code compiled for the ARM processor in
the G1 can't take full advantage of the ARM in the Nexus One, and code
compiled for the ARM in the Nexus One won't run on the ARM in the G1.
Of coarse, Dalvik code is slower than pure C/C++ optimized for the platform. But communication between native code and Java code happens through JNI which is the main source of the overhead.
So the answer for your question is yes, JNI introduces additional overhead. But if you want to port existing C/C++ code, ndk is the best choice in your case.

Using C++ in an embedded environment

Today I got into a very interesting conversation with a coworker, of which one subject got me thinking and googling this evening. Using C++ (as opposed to C) in an embedded environment. Looking around, there seems to be some good trades for and against the features C++ provides, but others Meyers clearly support it. So, I was wondering who would be able to shed some light on this topic and what the general consensus of the community was.

C++ for embedded platforms is perfectly fine - as long as you treat it as a better C. I love the fact that the language is slightly more structured. You can still do all the things that you want to do with C. Just remember to stick to an embedded C library like Newlib or uClibc.
I particularly like the abstraction that we can build using C++, particularly for I/O devices. So, we can have a class for UART and a class for GPIO and what nots. It is cleaner than having a bunch of functions (IMHO).

The fear of C++ among embedded developers is largely a thing of the past, when C++ compilers were not as good as C compilers (optimizations and code quality wise).
This applies especially to modern platforms with 32 bit architectures.
But, C is certainly still the preferred choice for more confined environments (as is assembler for 8 bit or 4 bit targets).
So, it really boils down to the resources your target platform provides, and how much of these resources you are likely to actually require, i.e. if you can afford the 'luxury' of doing embedded development in C++ (or even Java for that matter), because you know that you'll hardly have any issues regarding memory or CPU constraints.
Nowadays, many modern embedded platforms (think gaming consoles, mobile phones, PDAs etc), have really become very capable targets, with RISC architectures, several MB of RAM, and 3D hardware acceleration.
It would be a poor decision, to program such platforms using just C or even assembler out of uninformed performance considerations, on the other hand programming a 16 bit PIC in C++ would probably also be a controversial decision.
So, it's really a matter of asking yourself how much of the power, you'll actually need and how much you can afford to sacrifice, in order to improve the development experience (high level language, faster development, less tedious/redundant tasks).

It sort of depends on the particular nature of your embedded system and which features of C++ you use. The language itself doesn't necessarily generate bulkier code than C.
For example, if memory is your tightest constraint, you can just use C++ like "C with classes" -- that is, only using direct member functions, disabling RTTI, and not having any virtual functions or templates. That will fit in pretty much the same space as the equivalent C code, since you've no type information, vtables, or redundant functions to clutter things up.
I've found that templates are the biggest thing to avoid when memory is really tight, since you get one copy of each template function for each type it's specialized on, and that can rapidly bloat code segment.
In the console video games industry (which is sort of the beefy end of the embedded world) C++ is king. Our constraints are hard limits on memory (512mb on current generation) and realtime performance. Generally virtual functions and templates are used, but not exceptions, since they bloat the stack and are too perf-costly. In fact, one major manufacturer's compiler doesn't even support exceptions at all.

In my previous company all embedded code was written in a small subset of C code due to security (SIL-2) and memory reasons. By introducing a richer language like C++ in that particular scenario would have maybe cause more trouble than benefits.
In all due respect to C++ (which is a language I really love) but I think C - in our particular scenario - was the better choice.
I bet in some cases C++ is just fine to use for embedded applications but it really depends on the application - there is a difference if your program is controlling a nuclear plant or administrating an address book on your cell phone.

I don't know about "general consensus", only the company I work for (which does a lot of development for mobile phones, car navigation systems, DPFs, etc.).
The main drawback I've encountered to using C++ on embedded platforms as opposed to C is that it isn't quite as portable - there are many more cases of compilers that don't adhere to the standard which can cause problems if you need to build your code with more than 1 compiler or outright have bugs in the implementation. Then there are environments where C++ code simply won't run - BREW's issues with relocatable code and its "native OOP" don't play so well with "regular" C++ classes and inheritance.
In the end, though, if you're only targeting 1 platform, I'd say use whatever you think is "better" (faster, less bugs, better design) for your development - in most cases the issues can be worked around quite easily.

Depends what kind of embedded development you are doing. I've done embedded development with both C++, C, and Assembly on various platforms, you can even use Java to write applications on smart phones.
For instance on a smart phone like device that's running Windows CE 5, almost all of the code is C++, including in the operating system. Only small bits are written in C or assembly.
On the other hand I've written code for an MSP430 microcontroller, which was in C, and I probably would have done that in C++ had the compiler been more reliable and standards compliant.
Also I seem to recall a university lecturer of mine talking about writing embedded code in Forth or something. So really any language can do.

Now a days it will all boil down to the C++ runtime support of the platform. You're likely to find a way to compile C++ code down to almost any embedded platform with GCC, but if you can't find a suitable C++ runtime for the platform your efforts will be futile, unless you write your own C++ runtime.

One of the few things I tend to agree with Linus is his opinion about C++ http://thread.gmane.org/gmane.comp.version-control.git/57643/focus=57918
Besides this, if you really really want to use C++ you might want to have a look at http://www.caravan.net/ec2plus/ which describes Embedded C++, or better to say you should not use in C++ for embedded systems.

The big thing keeping us with using C++ for a long time was the VxWorks support for it, which truly sucked. That supposidly has gotten better on VxWorks 6 (yes, it's been out a while... good 'ole vendor lock-in and lack of company vision has kept us stuck on VxWorks 5.5).
So for us it's mostly a question of the environment. After that, C++ can obviously be just as good as C... it's a matter of people understanding what their tool does and how to use it. C++ may make it easier to write incredibly inefficient code, but that doesn't mean we have to succomb to it.

I am currently fighting a problem with exceptions in an embedded Linux application. We are trying to port software written for a different platform that seemed to support exceptions well, but the new compiler tools (a port of gcc) reports errors when creating the eh_frame. I was against using exceptions for this tool, but the developer reassured me that modern compilers would support it well.
My opinion is that there are some advantages to C++, but I would stay away from exceptions and the standard template library. We haven't had problems using virtual functions.

C++ is suitable for microcontrollers and devices without an OS. You just have to know the architecture of the system and be conscious of time and space constrains, especially when doing mission critical programming.
With C++ you can do abstraction which often leads to an increased footprint in the code. You do not want this when programming for a resource-limited machine such as an 8-bit MCU.
Generally, avoid:
Dynamic memory allocation because it represents uncertainty in timing
Overloading
RTTI because the memory cost is large
Exceptions because of the execution speed lowering
Be cautious with virtual functions as they have a resource cost of a vtable per class and one pointer to the vtable per object. Also, use const in place of #define.
As you move up to 16 and 32-bit MCUs, with 10s or 100s of MB RAM, heavier features like the ones mentioned above may be used.
So to round up, C++ is useful for embedded systems. A main benefit is that OOP can be useful when you want to abstract aspects of the microcontroller, for example UART or state machines. But you may want to avoid certain features all of the time and some of the features some of the time, depending on the target you are programming for.

What is a good scripting language to integrate into high-performance applications?

I'm a game's developer and am currently in the processing of writing a cross-platform, multi-threaded engine for our company. Arguably, one of the most powerful tools in a game engine is its scripting system, hence I'm on the hunt for a new scripting language to integrate into our engine (currently using a relatively basic in-house engine).
Key features for the desired scripting system (in order of importance) are:
Performance - MUST be fast to call & update scripts
Cross platform - Needs to be relatively easy to port to multiple platforms (don't mind a bit of work, but should only take a few days to port to each platform)
Offline compilation - Being able to pre-parse the script code offline is almost essential (helps with file sizes and load times)
Ability to integrate well with c++ - Should be able to support OO code within the language, and integrate this functionality with c++
Multi-threaded - not required, but desired. Would be best to be able to run separate instances of it on multiple threads that don't interfere with each other (i.e. no globals within the underlying code that need to be altered while running). Critical Section and Mutex based solutions need not apply.
I've so far had experience integrating/using Lua, Squirrel (OO language, based on Lua) and have written an ActionScript 2 virtual machine.
So, what scripting system do you recommend that fits the above criteria? (And if possible, could you also post or link to any comparisons to other scripting languages that you may have)
Thanks,
Grant

Lua has the advantage of being time-tested by a number of big-name video game developers and a good base of knowledgeable developers thanks to Blizzard-Activision's adoption of it as the primary platform for developing World of Warcraft add-ins.

Lua is a very good match for your needs. I'll take them in the same order.
Lua is one of the fastest scripting languages. It's fast to compile and fast to run.
Lua compiles on any platform with an ANSI C compiler, which afaik includes all gaming platforms.
Lua can be pre-compiled, but as a very dynamic languages most errors are only detectable at runtime. Also precompiled code (as bytecode) is often larger in terms of size than source code.
There are many Lua/C++ binding tools.
It doesn't support multi-threading (you cannot access a single instance of the interpreter from multiple threads), but you can have several instances of the interpreter, one per thread, or even one per game object.

Lua have been used in video-game industry for years. Lightweight and efficient.
That being said, ChaiScript and Falcon are good candidates matching your needs and with higher level language than Lua but with less history and community support.

Lua
Boost Python
SWIG

We've had good luck with Squirrel so far. Lua is so popular it's on its way to becoming a standard.
I recommend you worry more about memory than speed. Most scripting languages are "fast enough" and if they get slow you can always push some of that functionality back down into C++. Many of them burn through lots of memory, though, and on a console memory is an even more scarce resource than CPU time. Unbounded memory consumption will crash you eventually, and if you have to allocate 4MB just for the interpreter, that's like having to throw 30 textures out the window to make room.

Lua, and then LuaJIT for extra blaziness!
just don't expect too much from automatic C++ binding libraries, most are slow and restrictive. better do your own binding for your own objects.
as for concurrency, either LuaLanes, or roll your own. if your C++ program is already multithreaded, just call separate LuaStates from each thread, and use your own C++ shared structures as communications channels if needed.
as you might already know, the most often repeated answer in Lua is 'roll your own', and it's often the best advice! except when it's about bindings to common C/C++ libraries, in that case it's quite probable there's already one.

If you haven't looked at it yet I would suggest you check out Angelscript.
I have successfully used it in a cross platform environment (Windows and Linux with only a recompile) and it is designed to integrate well with C++ (both objects and code).
It is lightweight and supports multi-threading (in the sense that the question was asked), performs well and compiles to byte code which could be done in advance.

Start with Python.
If you can prove that you need more speed, then look at Stackless Python. That's what EVE Online uses for their game.

JavaScript may be a reasonable option, because of the mountains of effort that have gone into optimizing the various implementations for use in web-browsers.

These come to mind:
Lua
Python with boost::python
MzScheme or Guile
Ruby with SWIG

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js