Tiny VM based on immutable data structures?

Is there any "tiny" VM (for any programming language) where the main data structures visible to the user (lists, arrays, maps, sets, etc.) are immutable as in Clojure or Haskell?
By "tiny" I mean a VM where implementation simplicity, brevity and portability are key points: think Lua or TinyScheme.

I'm not sure how it aligns with your "key points", but you might take a look at Pixie. Pixie implements a VM in RPython. One of its claims is a small footprint of just over 10MB for the compiled VM + standard libs. The language is a Lisp based (loosely) on Clojure. It appears to keep Clojure's immutable-by-default policy, and it definitely has implementations of Clojure's persistent data types.

Owl Lisp
A "purely functional Scheme". The VM is 1600 lines of C.
Owl Lisp is a purely functional dialect of Scheme. It is based on the
applicable subset of R7RS standard, extending it mainly with threads
and data structures necessary for purely functional operation. Owl can
be used on most UNIX-like systems, such as Linux, BSDs and OS X.
Programs are typically compiled via C to standalone binaries, so Owl
isn't needed to run programs written in it.
Owl project originally got started both as an attempt to extend R5RS
Scheme with some necessary features, such as threads and modules, and
as an experiment on how being purely functional influences the runtime
and use of an applicative order purely functional language. While
things have been added to Scheme, Owl tries to keep the core language
as simple as possible.
Implementationwise the goal was to get a small portable system which
could be used to ship programs easily. This is currently accomplished
by using a small register-based virtual machine, which can be extended
with program-specific instructions to reduce interpretive overhead.
https://haltp.org/n/owl
https://github.com/aoh/owl-lisp
ClojureC
Compiler for the Clojure programming language that targets C
as a backend. It is based on ClojureScript ... Before you can run
anything make sure you have GLib 2 and the Boehm-Demers-Weiser garbage
collector installed.
https://github.com/schani/clojurec
TinyClojure
TinyClojure is a project to build a small, easily embeddable version
of Clojure/ClojureScript in portable C++. In many ways it is my
attempt to create a Clojure equivalent of TinyScheme.
...
ClojureC is good, but the build process is complex, and
there are external library dependencies ... The focus with TinyClojure's development is to make it the easiest way
to embed Clojure within any application. Tiny Clojure consists of one
header file, one source file, no external dependencies, and the
extension and embedding interface is as simple as it is possible to
be.
https://github.com/WillDetlor/TinyClojure/

Related

Use C/C++ in Apache-Flink

My team and I are developing an application that makes use of Flink.
The data will be processed using a computationally-heavy numerical algorithm.
In order to optimize it as much as possible, I would like to write this algorithm in C/C++ rather than in Java.
The question is: is it possible to use C/C++ code within Flink? Perhaps by wrapping it into a Java library?

I've never tested this case in particular. In general, you can always use native code from Java using JNI, the Java Native Interface.
The idea would be to have a Java facade expose your native code and use these methods in your computation graph defined with Flink in Java (or other JVM languages, like Scala). You would have to make both the Java and the native libraries available on all involved nodes to make this work. If you have a Hadoop cluster you can leverage YARN to ship files along with your job (docs here, see --yarn-ship CLI option).
I would suggest you test this incrementally, starting with a very small native function exposed. Also, don't underestimate Java's capabilities in terms of performance: with well-thought-out programming that leverages the JIT and other runtime optimizations, long-running processes can perform even better than similar native code with unmanaged memory.
Keep in mind that resorting to native code will of course restrict the portability of your code to the platforms for which you compile your libraries.
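As a rough illustration of the "very small native function" step, the native side could start as a plain C-linkage function like the sketch below. The name `sum_of_squares` and its signature are made up for this example. The JNI entry point that a Java facade would actually call is left out, since it depends on `jni.h` and on the Java class and package names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the computationally heavy numerical kernel.
// extern "C" turns off C++ name mangling, so a thin JNI wrapper (not shown)
// can forward to this symbol by its plain name.
extern "C" double sum_of_squares(const double* xs, std::size_t n) {
    assert(xs != nullptr || n == 0);  // an empty input may pass a null pointer
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += xs[i] * xs[i];
    }
    return acc;
}
```

Keeping the kernel free of JNI types means it can be unit-tested from C++ before any Java or Flink code is involved.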

Building a simple DSL in Haskell with C++ interop

I'm designing a simple interpreted language for testing real-time embedded systems. The control flow is severely restricted to provide strong static guarantees on what the scripts will do + how long they will run. For example, you can only branch on constant conditionals or loop over fixed ranges.
There is a large existing codebase in C++ with relevant models and IO libs, so this language must be able to call into C++. The systems under test have hard timing requirements, so we can't tolerate much jitter in the test framework. Our past solution was a custom DSL embedded in the C++ runtime, but we ended up re-inventing too many wheels (parser, linter, interactive interpreter, etc.) to achieve the static guarantees we need.
Haskell's facilities for crafting embedded DSLs with these guarantees are extremely appealing to me, but I'm stuck in determining how to embed it into the soft real-time C++ runtime. Any ideas? Pointers to any libraries / existing projects would be greatly appreciated!

Sounds like the path of least resistance would be an EDSL that generates C++. This way, you don't have to worry about the potential mismatch between soft real-time and the GHC RTS.
You might look at how other EDSLs that generate PLs are implemented:
HJScript uses a free monad approach to embedding JS.
JMacro uses more of an external DSL approach but embedded via TH. Wouldn't be my choice.
Instead of generating strings of C++ code, it's nice to have a data structure. Unfortunately, there doesn't seem to be a package available for C++. However, you could take a look at language-c — perhaps extend that or build your own. You might even consider generating C and using the C to C++ interop provided by those languages.
I'd probably dissuade you from looking at the design of Cryptol or Cogent as these are fully-fledged programming languages (that you have indicated you are inclined to steer away from).
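To make the "data structure instead of strings" point concrete, here is a minimal sketch. It is written in C++ only to keep it next to the target language; in Haskell the same shape would be an algebraic data type. The `Stmt` type and the `call`, `repeat`, `emit` and `worst_case_calls` helpers are all invented for this illustration. The tree can only express calls and fixed-count loops, so an upper bound on the work is computable before any C++ is generated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical restricted script: only calls and fixed-count loops exist,
// so unbounded control flow cannot even be represented.
struct Stmt {
    enum class Kind { Call, Repeat };
    Kind kind;
    std::string function;    // used when kind == Call
    int count;               // used when kind == Repeat
    std::vector<Stmt> body;  // used when kind == Repeat
};

Stmt call(std::string function) {
    return Stmt{Stmt::Kind::Call, std::move(function), 0, {}};
}

Stmt repeat(int count, std::vector<Stmt> body) {
    assert(count >= 0);
    return Stmt{Stmt::Kind::Repeat, "", count, std::move(body)};
}

// Static bound: number of calls the generated code will perform.
long worst_case_calls(const std::vector<Stmt>& block) {
    long total = 0;
    for (const Stmt& s : block) {
        total += (s.kind == Stmt::Kind::Call) ? 1 : s.count * worst_case_calls(s.body);
    }
    return total;
}

// Render the tree as C++ source text, indenting nested loops.
std::string emit(const std::vector<Stmt>& block, int depth = 0) {
    std::string out;
    const std::string pad(static_cast<std::size_t>(depth) * 4, ' ');
    for (const Stmt& s : block) {
        if (s.kind == Stmt::Kind::Call) {
            out += pad + s.function + "();\n";
        } else {
            const std::string i = "i" + std::to_string(depth);
            out += pad + "for (int " + i + " = 0; " + i + " < " +
                   std::to_string(s.count) + "; ++" + i + ") {\n";
            out += emit(s.body, depth + 1);
            out += pad + "}\n";
        }
    }
    return out;
}
```

Because the generator walks a tree, checks such as the call bound run before any text is produced, which is harder to do when the program is assembled by string concatenation.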

Rules Engine: Best fit for an embedded system

I have been doing some research on which rules engines would be most appropriate to run on an embedded system. The system will collect information from sensors and, according to that information, make specific C/C++ calls. An example would be:
IF RainSensor.value > RainSensor.threshold
THEN call( GarageWindow::close())
Here GarageWindow is a C++ class living in the binary that links to the Rules Engine library.
I cannot make assumptions on the capabilities of said embedded system. The requirements for it are:
Minimum footprint.
Portable
Able to make function calls (make C/C++ calls from its RHS).
Should accept new rules at runtime.
I would need to give alternatives according to its capabilities (to be defined in the future). The assumptions are:
1) The embedded system supports nothing (no JVM, no Python, etc.):
CLIPS as C library or clipsmm as C++ library. Both usable in commercial applications (GPL for clipsmm).
Advantages: Open source. Very well tested and documented. Portable, with a low memory footprint. It is possible to call C or C++ functions from its RHS section, which matters because the Rules Engine will most likely need to interact with C or C++ software.
Disadvantages: It is not thread safe. It supports only forward chaining.
2) The system supports python:
PyClips:
Python interface to CLIPS. Functionality remains the same as in the previous case. Using this would only be a benefit if Python calls need to be made in the RHS section. Any advantage/disadvantage I am missing?
3) System supports JVM:
Jess:
Advantages: Nice integration with Java objects. CLIPS-like scripting language. Forward and backward chaining. Automatic listening to Java objects to modify slots.
Disadvantages: Licensed. Can only define new classes at compile time.
Drools:
Advantages: Open source. Documented. Java integration. Forward and backward chaining.
Disadvantages: It is more designed to work on the web (that's my impression).
What would be the advantage of Drools vs Jess in an embedded environment?
Both can only add new classes at compile time. CLIPS can at runtime. On the other hand, if the C++ code that updates CLIPS instances is old, any new class created directly in CLIPS will not be matched to any class in the calling C++ code. Therefore, having to recompile with the Java options is not really a disadvantage.
Is there any other appropriate engine for embedded systems that I am totally missing?
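For reference, the IF/THEN example above boils down to something like the following sketch. It is not the CLIPS or clipsmm API. `RainSensor`, `GarageWindow`, `Rule` and `RuleSet` are hypothetical stand-ins, and the rules are C++ closures registered at runtime rather than rule text parsed at runtime, which is the part a real engine adds.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the sensor and actuator named in the question.
struct RainSensor { double value; double threshold; };
struct GarageWindow {
    bool closed = false;
    void close() { closed = true; }
};

struct Rule {
    std::string name;
    std::function<bool()> condition;  // LHS: reads sensor state
    std::function<void()> action;     // RHS: any C++ call
};

class RuleSet {
public:
    // Rules can be appended while the program is running.
    void add(Rule rule) {
        assert(rule.condition && rule.action);
        rules_.push_back(std::move(rule));
    }

    // One forward pass: fire every rule whose condition holds.
    // Returns the number of rules that fired.
    int run_once() const {
        int fired = 0;
        for (const Rule& r : rules_) {
            if (r.condition()) {
                r.action();
                ++fired;
            }
        }
        return fired;
    }

private:
    std::vector<Rule> rules_;
};
```

A rule equivalent to the example would be registered as `rules.add(Rule{"close-on-rain", [&] { return sensor.value > sensor.threshold; }, [&] { window.close(); }});` and evaluated with `rules.run_once()` after each sensor update.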

Implementations of Clojure for other platforms?

Are there any implementations of Clojure being built for other virtual machines (such as .Net, Python, Ruby, Lua), or is it too closely tied to Java and the JVM? Does it make sense to build a Clojure for other platforms?

There are currently three implementations of Clojure that I know of:
ClojureCLR, an implementation of Clojure for the CLI,
ClojureScript, an implementation of (a subset of (a variant of)) Clojure for ECMAScript and
a Clojure implementation for the Java platform, confusingly also called Clojure.
In fact, Rich Hickey has said he chose the name Clojure because he wanted it to involve the letters C (C#), L (Lisp) and J (Java).
I've heard rumours of implementations for the Objective-C/Cocoa runtime, LLVM and the Rubinius VM, but I have no idea whether or not those actually exist.

"... or is it too closely tied to Java and the JVM? Does it make sense to build a Clojure for other platforms?"
One of the Clojure design philosophies is to embrace the host platform. Clojure on the JVM embraces the JVM and gives direct access to classes, numbers, etc. Interop works both ways without glue.
ClojureScript embraces JavaScript (ECMAScript) in exactly the same way, giving direct access to objects, numbers, etc. The same goes for the .NET target.
It is tempting, but not always successful, to make 'cross platform' languages that run the exact same source code on multiple platforms. Thus far Clojure has avoided this temptation and strives to remain close to the host.

There exists at least a ClojureCLR project by Rich Hickey himself.
This project is a native implementation of Clojure on the Common Language Runtime (CLR),
the execution engine of Microsoft's .Net Framework.
ClojureCLR is programmed in C# (and Clojure itself) and makes use of Microsoft's
Dynamic Language Runtime (DLR).

I'm not sure that Python and Ruby ports make sense; those are languages with multiple virtual machines / implementations. If you want to have native interop between Clojure and Python or Ruby you could use Jython or JRuby and stay on the JVM.

What is a good scripting language to integrate into high-performance applications?

I'm a games developer and am currently in the process of writing a cross-platform, multi-threaded engine for our company. Arguably, one of the most powerful tools in a game engine is its scripting system, hence I'm on the hunt for a new scripting language to integrate into our engine (currently using a relatively basic in-house engine).
Key features for the desired scripting system (in order of importance) are:
Performance - MUST be fast to call & update scripts
Cross platform - Needs to be relatively easy to port to multiple platforms (don't mind a bit of work, but should only take a few days to port to each platform)
Offline compilation - Being able to pre-parse the script code offline is almost essential (helps with file sizes and load times)
Ability to integrate well with C++ - Should be able to support OO code within the language, and integrate this functionality with C++
Multi-threaded - not required, but desired. Would be best to be able to run separate instances of it on multiple threads that don't interfere with each other (i.e. no globals within the underlying code that need to be altered while running). Critical Section and Mutex based solutions need not apply.
I've so far had experience integrating/using Lua, Squirrel (OO language, based on Lua) and have written an ActionScript 2 virtual machine.
So, what scripting system do you recommend that fits the above criteria? (And if possible, could you also post or link to any comparisons to other scripting languages that you may have)
Thanks,
Grant

Lua has the advantage of being time-tested by a number of big-name video game developers and a good base of knowledgeable developers thanks to Blizzard-Activision's adoption of it as the primary platform for developing World of Warcraft add-ins.

Lua is a very good match for your needs. I'll take them in the same order.
Lua is one of the fastest scripting languages. It's fast to compile and fast to run.
Lua compiles on any platform with an ANSI C compiler, which afaik includes all gaming platforms.
Lua can be pre-compiled, but as it is a very dynamic language, most errors are only detectable at runtime. Also, precompiled code (bytecode) is often larger than the source code.
There are many Lua/C++ binding tools.
It doesn't support multi-threading (you cannot access a single instance of the interpreter from multiple threads), but you can have several instances of the interpreter, one per thread, or even one per game object.

Lua has been used in the video game industry for years. It is lightweight and efficient.
That being said, ChaiScript and Falcon are good candidates that match your needs. They are higher-level languages than Lua, but with less history and community support.

Lua
Boost Python
SWIG

We've had good luck with Squirrel so far. Lua is so popular it's on its way to becoming a standard.
I recommend you worry more about memory than speed. Most scripting languages are "fast enough" and if they get slow you can always push some of that functionality back down into C++. Many of them burn through lots of memory, though, and on a console memory is an even more scarce resource than CPU time. Unbounded memory consumption will crash you eventually, and if you have to allocate 4MB just for the interpreter, that's like having to throw 30 textures out the window to make room.

Lua, and then LuaJIT for extra blaziness!
Just don't expect too much from automatic C++ binding libraries; most are slow and restrictive. Better to write your own bindings for your own objects.
As for concurrency, use either LuaLanes or roll your own. If your C++ program is already multithreaded, just call separate LuaStates from each thread, and use your own C++ shared structures as communication channels if needed.
As you might already know, the most often repeated answer in Lua is 'roll your own', and it's often the best advice! The exception is bindings to common C/C++ libraries, where it's quite probable that one already exists.

If you haven't looked at it yet I would suggest you check out Angelscript.
I have successfully used it in a cross platform environment (Windows and Linux with only a recompile) and it is designed to integrate well with C++ (both objects and code).
It is lightweight and supports multi-threading (in the sense that the question was asked), performs well and compiles to byte code which could be done in advance.

Start with Python.
If you can prove that you need more speed, then look at Stackless Python. That's what EVE Online uses for their game.

JavaScript may be a reasonable option, because of the mountains of effort that have gone into optimizing the various implementations for use in web-browsers.

These come to mind:
Lua
Python with boost::python
MzScheme or Guile
Ruby with SWIG
