Mixing Haskell and C++ - c++

If you had the possibility of having an application that would use both Haskell and C++.
What layers would you let Haskell-managed and what layers would you let C++-managed ?
Has any one ever done such an association, (surely) ?
(the Haskell site tells it's really easy because Haskell has a mode where it can be compiled in C by gcc)
At first I think I would keep all I/O operations in the C++ layers. As well as GUI management.
It is pretty vague a question, but as I am planning to learn Haskell, I was thinking about delegating some work to Haskell-code (I learn in actually coding), and I want to choose some part where I will see Haskell benefits.

The benefit of Haskell is the powerful abstractions it allows you to use. You're not thinking in terms of ones and zeros and addresses and registers but computations and type properties and continuations.
The benefit of C++ is how tightly you can optimize it when necessary. You aren't thinking about high-minded monads, arrows, partial application, and composing pure functions: with C++, you can get right down to the bare metal!
There's tension between these two statements. In his paper “Structured Programming with go to statements,” Donald Knuth wrote
I have felt for a long time that a talent for programming consists largely of the ability to switch readily from microscopic to macroscopic views of things, i.e., to change levels of abstraction fluently.
Knowing how to use Haskell and C++ but also how and when to combine them well will knock down all sorts of problems.
The last big project I wrote that used FFI involved using an in-house radar modeling library written in C. Reimplementing it would have been silly, and expressing the high-level logic of the rest of the application would have been a pain. I kept the “brains” of it in Haskell and called the C library when I needed it.
You're wanting to do this as an exercise, so I'd recommend the same approach: write the smarts in Haskell. Shackling Haskell as a slave to C++ will probably end up frustrating you or making you feel as though you wasted your time. Use each language where its strengths lie.

Here is how I see things:
Functional languages excel at transforming things. Whenever you write programs which take an input and map/filter/reduce it, use functional constructs. Wonderful real world examples where Haskell should excel are given by web applications (you basically transform things stored in a database to web pages).
Procedural languages (OOP languages are procedural) excel at side effects and communication between objects. They are cumbersome to use to transform data, but whenever you want to do system programming or (bidirectional) interaction with humans (user interfaces of any kind, including client-side web programming), they do the job cleanly.
However, some may argue that user interfaces should have a functional description, I answer that well established frameworks are easy enough to use with OOP languages that one should use them this way. After all, it is natural to think of UI components in terms of objects and communication between objects.
IO is only a tool: whenever you transform input to output, Haskell (or whatever FP language) should do the IO. Whenever you speak to a human, C++ (or whatever OOP language) should do the IO.
Don't care about speed. When you use Haskell for the right job, it's efficient. When you use C++ or Python for the right job, it's efficient.
Therefore, let's say I'm fluent in Haskell, C, C++ and Python, here's how I write applications:
If my application's main role is to transform data, I write it in Haskell, with possibly some low-level parts written in C (which may, in turn, call some high-tech low level parts written in C++, but I'd stick with C as an interface for portability reasons).
If my application's main role is to interact with a user, I write it in Python (PyQt for instance), and let Python call performance-critical routines written in C++ (boost::python is pretty good as a binding generator). I may also have to call subroutines which transform or fetch data, which will be written in Haskell.
If I have to write a part of an application in Haskell, I segregate it into a C-callable API.
Sometimes, a Haskell application which reads things on stdin and write back on stdout is useful as a submodule (that you call with fork/exec, or whatever on your platform). Sometimes, a shell script is the right wrapper for such applications.

This answer is more a story than a comprehensive answer, but I used a mix of Haskell, Python and C++ for my dissertation in computational linguistics, as well as several C and Java tools that I didn't write. I found it simplest to run everything as a separate process, using Python as glue code to start the Haskell, C++ and Java programs.
The C++ was a fairly simple, tight loop that counted feature occurrences. Basically all it did was math and simple I/O. I actually controlled options by having the Python glue code write out a header full of #defines and recompiling. Kind of hacky, but it worked.
The Haskell was all the intermediate processing: taking the complex output from the various C and Java parsers that I used, filtering extraneous data, and transforming it the simple format the C++ code expected. Then I took the C++ output and transformed it into LaTeX markup (among other formats).
This is an area that you would expect Python to be strong, but I found that Haskell makes manipulation of complex structures easier; Python is probably better for simple line-to-line transformations, but I was slicing and dicing parse trees and I found that I forgot the input and output types when I wrote code in Python.
Since I was using Haskell a lot like a more-structured scripting language, I ended up writing a few file I/O utilities, but beyond that, the built in libraries for tree and list manipulation sufficed.
In summary, if you have a problem like mine, I would suggest C++ for the memory-constrained, speed-critical part, Haskell for the high-level transformations, and Python to run it all.

I have never mixed both languages but your approach feels a little upside down to me.
Haskell is more apt at high-level operations while C++ can be optimized and can be most beneficial for tight loops and other performance critical code.
One of the largest benefits of Haskell is the encapsulation of IO into monads. As long as this IO isn't time critical I don't see any reason to do it in C++.
For the GUI part you are probably right. There is a plethora of Haskell GUI libraries but C++ has powerful tools such as QtCreator which simplify the tedious tasks a lot.

Related

Cleanest data structure to use when interpreting data from neatly-structured user commands (in C++) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I would like to write a simple in-house program that parses user commands written in a language of our team's own invention (but based closely on another program we are already familiar with). The command parser that I am working on now will simply be the UI through which the user can run the other algorithms I have already written. (Those other algorithms, by the way, are used to generate the input files for a molecular dynamic simulation package called LAMMPS.) The only thing I really have left to do is just write this UI, but as it turns out, writing your own scripting language is almost an intractable challenge for a non software engineer to tackle on his own.
According to the answers I received, what I am try to make would be considered a Domain Specific Language, and it is not advisable to try to make one's own DSL due to the enormous amount of work required to make it useful and bug-free.
The best option then would actually be to use an existing scripting language like Lua or Python, and embed it in the program.
To do this, I will most likely use Lua because it seems most fitting for our needs. So at this point, the rest of this question is no longer relevant since the answer would be: "Don't do it yourself." But I'm still going to keep part of it here for other users to be able read and learn from the wonderful answers below.
Thanks again to everyone who replied!
Old Question:
I would like to write a program that parses a user text input and then
runs a function corresponding to that input. To do this I would need
to parse the string for relevant keywords. I believe there will be
less than 15 keywords when I'm done, so ideally I'd like this code
to be simple and short.
The problem is that I am currently using if-statements to parse the
strings. This is an extremely inconvenient way to parse commands
because even for a short 3 word commands the code explodes into nested-ifs
3 layers deep. So longer 8+ word sentences will become nested-ifs more than
8 layers deep.
This kind of programing approach quickly becomes unmanageable, especially
when I need to make any significant changes to a command.
My question is whether or not there exists a data structure in C++ that
can help me better manage my giant nested-ifs, or if anyone could suggest
a better way to parse a string for lots of different data types (i.e.
substings, ints, and floats) and output an error message when the expected
type is not found?
Here is an example of a short user session to show the kinds of commands
I would like to interpret:
load "Basis.Silicon" as material 1
add material 1 to layer 1
rotate layer 1 about x-axis by 45 degrees
translate layer 1 in x-axis by 10 nm
generate crystal
These commands are based on an already-existing program that our team
uses, but unfortunately the source code for this program has never been
publicly released so I am left guessing as to how it was actually
implemented.
One final note, unlike natural language processors, I know exactly what
the format of each line will be. So my issue isn't so much how to interpret
the text, but rather how to code the logic in a concise and manageable way.
Thanks everyone!
Your question is not clear. And your goals are more difficult than what you believe.
Either you consider that you want to somehow process human language sentences (e.g. in English). Then you want to study natural language processing, and you can find some libraries related to that field.
Or you consider that you want to interpret some formal programming or scripting language. Then you want to study interpreters and compilers. BTW, in that case, you might just embed an existing interpreter (like Lua, Guile, Python, etc....) in your program.
You could also think in terms of expert systems with a knowledge base made of rules (this approach could be viewed as in the middle between NLP and scripting language) You'll then need some inference engine (perhaps CLIPS). See also J.Pitrat's blog.
Notice that even coding a simple interpreter is more difficult than you believe. You absolutely need to represent abstract syntax trees, which you construct from textual input with a parsing phase.
BTW, All of NLP, expert systems, and interpreter design and implementation are difficult fields. You could get a PhD in all 3 fields (but you have to choose which).
If you go the embedded interpreter way: study the interpreters I mentioned (Guile, Lua, Python, Neko, etc...) and choose which one you want, to embed.
If for whatever reason, you want to make an interpreter from scratch: Learn several programming languages first (including scripting languages like Ruby, Python, Ocaml, Scheme, Lua, Neko, ...). Read books on Programming Language Pragmatics (by M.Scott) and Lisp In Small Pieces (by Queinnec). Read also text books on compilation and parsing, and on Garbage Collection and formal (e.g. denotational) semantics. All this may need a dozen years of work.
Notice that by experience embedding a software in an interpreter is a very structuring design. If you did not thought of that at the beginning you probably need to redesign and refactor a lot your existing application. For instance, when embedding a software in an interpreter, you cannot afford that bad input crashes the program. So error handling and memory management (interfacing to the GC of the interpreter) is challenging and gives new constraints. Hence you'll need to re-think your application.
If all this is new (and even if you don't choose e.g. Guile as the embedding interpreter): learn and practice a bit of Scheme -e.g. with Guile or PltScheme- (e.g. reading SICP), read a little bit about λ-calculus and closures, then read Queinnec's Lisp In Small Pieces book. Remember the halting problem (which is partly why interpreters are difficult to code).
BTW the syntax you are proposing (e.g. rotate mat 1 by x 90) is not very readable and looks COBOL-like. If possible, have a language which looks familiar to existing ones. Make it easy to read !
Start by reading all the wikipages I am referencing here.
FWIW, I am the main author of MELT, a domain specific language (inspired a lot by Scheme) to extend the GCC compiler. Some of the papers / documentations I wrote might inspire you (and contain valuable references).
Addenda (after question was reformulated)
You seems to invent some formal syntax like
add material 1 to layer 1
rotate layer 1 about x-axis by 90 degrees
translate layer 1 in x-axis by 10 inches
I can't guess what kind of language is it? Are you implementing a 3D printer? If yes, you should stick to some existing standard formal language in that domain.
I believe that such a COBOL-like syntax is really wrong. The point is that it is too verbose, and that you are wishing to implement some domain specific language. I find your example very bad-looking.
Is that syntax your invention, or is there some document specifying (and many thousands already existing lines coded in) your domain specific language. If you are just inventing it, please reconsider the syntax and the semantics.
First, you need to specify on paper the full syntax and semantics of your DSL.
Is your DSL Turing complete? (I guess that yes, because Turing completeness is reached very quickly - e.g. with variables and loops....). If yes, you are inventing a scripting language. Please don't invent scripting language without knowing several programming & scripting languages (then read Programming Language Pragmatics...). The point is that, if your scripting language will become successful, advanced users will soon or later write important programs in it (e.g. many thousand lines). Then, these advanced users will be programmers. In that case, it is very important (for social & economic reasons) to have a DSL well founded and looking familiar (if possible, an extension of some existing scripting language).
If your DSL already exists, stick to its specification on paper. If that specification is not good enough, improve it with formalization (e.g. by writing some BNF syntax, and some formal (e.g. denotational) semantics for it). Publish and discuss that formalization with existing users.
Several industries got some ad-hoc DSLs which became widely used but was ill designed
(e.g., in the French nuclear industry, the Gibiane DSL designed in the 1970s by nuclear physicists, not computer scientists; the US Boeing corporation is also rumored to have made similar mistakes). Then, maintaining and improving the many hundred thousands lines of DSL scripts is becoming a nightmare (and may means losing millions of dollars or euros). So you better stick to some existing scripting language. The advantages are that there exist some culture on it (e.g. you can find dozens of books on Python or Lua, and many trained engineers familiar with them), that the interpreter is widely used and tested, that the community working on them is improving the interpreters, so it has quite few uncorrected bugs.
You should not attempt to design and implement your own DSL if you are not a trained computer scientist. Stick to some existing scripting language (of course their syntax is not like you want it to be), and leverage on existing implementations and experiment.
As a counter-example, J.Ousterhout has invented the widely used Tcl scripting language, with the claim that scripts are always small (e.g. hundreds of line only) and won't grow to big code base; unfortunately, some of them did, and Tcl is known as a bad language to code many dozens of thousands of lines (even if Tcl is an easy and convenient language for tiny scripts). The moral of the story is that if a (turing complete) scripting language is becoming successful, some "crazy" advanced user will code hundred of thousands of script code. So you need that scripting language to be well designed from the start. Hence, you should adopt and adapt a good existing scripting language (and avoid inventing an unfamiliar syntax without having a good knowledge of several existing scripting languages)
later additions
PS: my criticism of Tcl is not entirely subjective: the point is that Tcl was designed for small scripts in mind (read J.Ousterhout's first papers about Tcl), but my point is that when you offer a Turing-complete scripting language, some "crazy" user will eventually write huge scripts for it. Hence, you need to anticipate such "crazy" usage by offering a scripting language which "scales up" to big scripts, so is built according to software engineering practices for large software code base.
NB. Lua is probably a good choice as a language to embed. It is small, has a nice implementation, is well documented, and has good performance. But be careful about memory management issues (and this advice holds for any scripting language).
EDIT: To be more clear, I would like to have a short list of key words
(<15). The order/presence of which would determine which function will
be run.
You can build a small ruleset engine (e.g. something that processes lists of words). You write that engine/function once and just pass the data structures to it.
As an alternative, a solution using regular expressions would be probably the fastest to code (the engine is ready for you), assuming you're familiar with the regexp syntax (if not, it's still a good investment).
You could build a table of keywords and function pointers:
typedef void (*Function_Pointer)(void);
struct table_entry
{
const char * keyword;
Function_Pointer p_function;
};
table_entry function_table[] =
{
{"car", Process_Car},
{"bike", Process_Bike},
};
Search the table for a keyword. If the keyword is found, dereference the function pointer.
The following snippet will execute the function for processing the word "car":
(function_table[0].p_function)();
There is a famous program, called Eliza, which parses sentences for keywords.
Examples can be found at: Eliza C++ examples

Lisp dialect and comparison to Java/C#

Now I'm generally in Java/C# (love both of them, can't really say I'm dedicated to one).
And I've recently been discussing the differences between F# and C# with a friend, when he surprised me saying: "So.. F# sounds a lot like lisp, but with way less 'Swiss-army knife' feel to it."
Now, I was partly ashamed of saying this but I have no idea what lisp was.
After some searching, I saw that lisp is very interesting, but got stumped by the multiple dialects and running environments.
Here is what I know:
I know of 3 dialects:
Common Lisp (I have the Practical Common Lisp book in my bookmarks.
Scheme (a more "theoretical" version of CL)
Clojure. Seems to be a version of CL that runs on JVM.
The basic idea of lisp seems to be about using code as data.
What I want to know:
What is the running environment for different dialects? How do they work/get installed (by this I mean is it a runtime like Java Virtual Machine, or if it requires something else, or if it's supported generally by the OS (as in compiled)). And how to get them (if something is to be gotten)
What is the better dialect to learn (I want the dialect not to be a "learning language" but one you can fully use afterwards without regret of not learning some other one, for example one should first learn C++ before trying out Visual C++, if you know what I mean)
What are the main advantages of lisp in general (I've seen many pages about that saying it's faster in development and execution, but they were all pretty vague about the details)
Can it be generally used for general purpose, or is it concentrated on AI? (By this I mean if, for example, one could make a full console app with it, and then implement OpenGL just as easily and make a game. Learning a language specialized on something precise is worthwhile, but not at the moment for me)
I would also be very happy about any additional details you guys can give me! (Links are appreciated too! E-Books and whatnot.)
Edit: all of the answers here were very useful. As such, I gave them all a +1 to rep, but chose the more concrete one as best. Thank you all.
I also learnt Java and C# intensively before coming to Lisp so hopefully can share some useful perspectives.
Firstly, all Lisps are great and you should definitely consider learning one. There's a famous quote by Eric Raymond:
"Lisp is worth learning for the profound enlightenment experience you
will have when you finally get it; that experience will make you a
better programmer for the rest of your days, even if you never
actually use Lisp itself a lot."
Reasons that Lisps are particularly interesting and powerful are:
Homoiconicity - in Lisp "code is data" - the language itself is written in Lisp data structures. In itself this is interesting, but where it gets really powerful is when you start using this for code generation and advanced macros. Some believe that this features is a key reason why Lisp can help you be more productive than anyone else (short Paul Graham essay)
Interactice development at the REPL - a few other languages also have this, but it is particularly idiomatic and deep-rooted in Lisp culture. It's remarkably productive and liberating to develop while altering a live running program. Recent examples that caught my eye include music hacking with overtone and editing a live game simulation.
Dynamic typing - opinion is more divide on whether this is an advantage or not (I'm personally neutral) but many people thing that dynamically typed langauges give you a productivity advantage, at least in terms of building things quickly. YMMV.
My personal recommendation for a Lisp to learn nowadays would be Clojure. Clojure has a few distinct advantages that make it stand out:
Modern language design - Clojure "refines" Lisp in a number of ways. For example, Clojure adds some new syntax for vectors [] and hashmaps {} in addition to lists (). Purists may disapprove, but I personally believe these find of innovations make the language much nicer to use and read.
Functional first and foremost - all the Lisps are good as functional languages, however Clojure takes it much further. All the standard library is written in terms of pure functions. All data structures are immutable. Mutable state is strictly limited. Lazy sequences (including infinite sequences) are supported. In some senses it feels a bit more like Haskell than the other Lisps.
Concurrency - Clojure has a unique approach to managing concurrency, supported by a very good STM implementation. Worth watching this excellent video for a much deeper explanation.
Runs on the JVM - whatever you think of Java, the JVM is a great platform with extremely good GC, JIT compilation, cross platform portability etc. This can be a barrier to entry for some, but anyone used to Java or C# should quickly feel at home.
Library ecosystem - since Clojure runs on the JVM, it can use Java libraries extremely easily. Calling a Java API from Clojure is trivial - it's just like any other function call with a syntax of (.methodName someObject arg1 arg2). With the availability of the huge Java library ecosystem (mostly open source) Clojure basically leapfrogs all the "niche" languages in terms of practical usefulness
In terms of applications, Clojure is designed to be a fully general purpose langauge so can be used in any field - certainly not limited to AI. I know of people using it in startups, using it for big data processing, even writing games.
Finally on the performance point: you are basically always going to pay a slight performance penalty for using higher level language constructs. However Clojure in my experience is "close enough" to Java or C# that you won't notice the difference for general purpose development. It helps that Clojure is always compiled and that you can use optional type hints to get the performance benefits of static typing.
The flawed benchmarks (as of early 2012) put Clojure within a factor of 2-3 of the speed of statically typed languages like Java, Scala and C#, a little bit behind Common Lisp and a little bit ahead of Scheme (Racket).
Lisp, as you've discovered, is not one language; it's a family of languages that have certain features in common.
There are two primary dialects of Lisp: Common Lisp and Scheme. Each of those two dialects has many implementations, each with their own features. However, both Common Lisp and Scheme are standardized, and the standards define a certain baseline of features which you can expect any implementation to have.
Scheme is a minimalistic language with a very small standard library. It is used primarily by students and theoreticians. Common Lisp has many more language features and a much larger standard library, including a powerful object system, and has been used in large production systems.
Clojure is another minor, more recent dialect. If you want to understand Lisp, you're better off first learning either Common Lisp or Scheme.
My recommendation is to learn Scheme first; it's a purer expression of the ideas that Lisp is made of, and will help you understand the essence of the language. In many ways, Lisp is completely different from Java and other imperative languages; however, what you learn from it will make you a better programmer in those languages. You can easily learn Common Lisp after you know Scheme.
The advantage of Lisp is, simply put, that it's more powerful than other languages. All Lisp code is Lisp data and can be manipulated as such; this allows you to do really cool things with metaprogramming that simply can't be done in other languages, because they don't give you direct access to the data structures that comprise your code. (The reason Lisp can do this and they can't is intimately related to its strange-looking syntax. Every compiler or interpreter, after reading the source code, must translate it into abstract syntax trees. Unlike other languages, Lisp's syntax is a direct representation of the ASTs that Lisp code is translated into, so you know what those trees look like and can manipulate them directly.) The most commonly used metaprogramming feature is macros; Lisp macros can literally translate a bit of source code into anything you can program. You can't do that with, say, C macros.
The "faster in development and execution" thing may have been a reference to one specific feature which most Lisp implementations provide: the read-eval-print loop. You can type an expression into a prompt and the interpreter will evaluate it and print the result. This is wonderful both for learning the language and for debugging or otherwise investigating code.
Lisp is dynamically typed (though statically typed flavors do exist). Most implementations of Lisp run on their own virtual machine; however, many can also be compiled to machine code. Clojure was written specifically to target the JVM; it can also target .NET and JavaScript.
Though originally created for AI research, Lisp is by no means exclusively for AI. The main reason why it's not more popular in mainstream production environments (apart from the self-perpetuating dominance of Java and C#) is library support. Common Lisp has many good libraries out there (Scheme less so), but it pales in comparison to the vast amount of library support available for Java or Python.
If you want to get started, I recommend downloading Racket, a highly popular implementation of Scheme. It has everything you need, including a simple-but-very-powerful IDE with a read-eval-print loop, right out of the box. Though originally developed as a teaching language, it comes with a very large standard library more characteristic of Common Lisp than of Scheme. As a result, it's seeing use in real production environments.
Runtime Environments
Common Lisp and Scheme generally have their own unique runtime environments. There are some variants of Scheme (Chicken and Gambit) which can be translated to C and then linked with their environments so as to be able to be deployed as stand alone executable programs. Clojure runs in the JVM, and there is also a CLR port, but its not clear to me that the CLR port is current with the JVM. Clojure also has Clojurescript, which targets a Javascript runtime.
Which is Better to Learn First
I don't think that question has a good answer. Its up to you. Although if you have experience with the JVM, Clojure might be a bit smoother to start with.
What is Better about Lisp
That's a question liable to start a flame war. I don't have much lisp experience. I started learning Clojure a few months ago in earnest, have looked at Common Lisp and Scheme on and off over the years.
What I like is their dynamic natures. You need to change a function at runtime while your program is running? No problem! Like any power tool, you have to be careful not to chop your bits off when using this.
The power and expressiveness is addicting too. I am able to do some things with little effort that I know I could not achieve in Java, or I know would require a lot more work. Specifically, I was able to put together a description of a data structure - and though the use of macros, delay evaluation of parts of the data until the right time. If I had done that in Java, I would not have been able to nest the declarations like I did because they would have evaluated in the wrong order. Pain would have ensued.
I also like Clojure's view of functional programming, although I have to say it requires work to adjust.
Is Lisp General Purpose
Yes.
--
Mark Volkman has a really good article on Clojure. Many basics are there. One thing that I did in the beginning was to just fire up a repl and experiment when I needed to figure something out programmatically. e.g. explore an API or do some calculations. After a short period of time with that I started working on more building up levels of effort, and I have a project that I'm working on right now that involves Clojure.
There isn't a bad book about Clojure that has been written. The Stuart Sierra book is being updated; and the Oreilly book is about to come out soon, so you might want to wait. The Joy of Clojure is good, but I don't think its a good starter book.
For Common Lisp, I highly recommend the Land of Lisp.
For Scheme, there are several classics including The Little Schemer and SICP.
Oh, and this: http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey (maybe one of the most important talks you'll ever watch), and this http://www.infoq.com/presentations/hickey-clojure (IIRC, really good intro to Clojure).
common lisp
Common Lisp is both compiled and interpreted. Deployments (in Windows) can be done by an exe with DLLs. Or by a precompiled bytecode. Or by installing a Lisp system on the target device and executing the source against it.
Common Lisp is a fully usable industrial language with an active community and libraries for many different tasks.
Lisps are generally faster for development and due to the abstraction capabilities, better at developing higher level concepts. It's hard to explain. Ruby vs. C is an example of this sort of thing. All Lisps carry this capacity IMO.
Common Lisp is a general purpose language. I don't know offhand if modern Common Lisp implementations directly support executing assembly, so it may be difficult to write drivers or use compiler-unsupported CPU instructions.
I like Common Lisp, but Clojure and Racket are not to be sneezed at either. Clojure in particular represents a very interesting track, in my opinion.
For e-books, you can get On Lisp by Graham and Gentle Introduction to Symbolic Computation. Possibly others but those are the ones I can recall.

Lua vs Embedded Lisp and potential other candidates. for set based data processing

Current Choice: lua-jit. Impressive benchmarks, I am getting used to the syntax. Writing a high performance ABI will require careful consideration on how I will structure my C++.
Other Questions of interest
Gambit-C and Guile as embeddable languages
Lua Performance Tips (have the option of running with a disabled collector, and calling the collector at the end of a processing run(s) is always an option).
Background
I'm working on a realtime high volume (complex) event processing system. I have a DSL that represents the schema of the event structure at source, the storage format, certain domain specific constructs, firing internal events (to structure and drive general purpose processing), and encoding certain processing steps that always happen.
The DSL looks pretty similar to SQL, infact I am using berkeley db (via sqlite3 interface) for long-term storage of events. The important part here is that the processing of events is done set-based, like SQL. I have come to conclusion however that I should not add general-purpose processing logic to the DSL, and rather embed lua or lisp to take care of this.
The processing core is built arround boost::asio, it is multithreaded, rpc is done via protocol buffers, events are encoded using the protocol buffer IO library --i.e., the events are not structured using protocol buffer object they just use the same encoding/decoding library. I will create a dataset object that contains rows, pretty similar to how a database engine stores in memory sets. processing steps in the DSL will be taken care of first and then presented to the general purpose processing logic.
Regardless of what embeddable scripting environment I use, each thread in my processing core will probably needs it's own embedded-language-environment (that is how lua requires it to be at least if you are doing multi-threaded work).
The Question(s)
The choice at the moment is between lisp ECL and lua. Keeping in mind that performance and throughput is a strong requirement, this means minimising memory allocations is highly desired:
If you were in my position which language would you chose ?
are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation). Javascript v8 perhaps ?
Does lisp fit the domain better ? I don't think lua and lisp are that different in terms of what they provide. Call me out :D
Are there any other properties (like the ones below) I should be thinking about ?
I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the scripting language call on orders of magnitude, and that picking either will not add much overhead to the overall throughput. Am I on the right track ? :D
Desired Properties
I would like to map my dataset onto a lisp list or lua table and I would like to minimise redundant data copies. For example adding a row from one dataset to another should try to use reference semantics if both tables have the same shape.
I can guarantee that the dataset that is passed as input will not change whilst I have made the lua/lisp call. I want lua and lisp to enforce not altering the dataset as well if possible.
After the embedded call end's the datasets should be destroyed, any references created would need to be replaced with copies (I guess).
DSL Example
I attach a DSL for your viewing pleasure so you can get an idea of what I am trying to achieve. Note: The DSL does not show general purpose processing.
// Derived Events : NewSession EndSession
NAMESPACE WebEvents
{
SYMBOLTABLE DomainName(TEXT) AS INT4;
SYMBOLTABLE STPageHitId(GUID) AS INT8;
SYMBOLTABLE UrlPair(TEXT hostname ,TEXT scriptname) AS INT4;
SYMBOLTABLE UserAgent(TEXT UserAgent) AS INT4;
EVENT 3:PageInput
{
//------------------------------------------------------------//
REQUIRED 1:PagehitId GUID
REQUIRED 2:Attribute TEXT;
REQUIRED 3:Value TEXT;
FABRRICATED 4:PagehitIdSymbol INT8;
//------------------------------------------------------------//
PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
OR Symbolise(PagehitId) USING STPagehitId;
}
// Derived Event : Pagehit
EVENT 2:PageHit
{
//------------------------------------------------------------//
REQUIRED 1:PageHitId GUID;
REQUIRED 2:SessionId GUID;
REQUIRED 3:DateHit DATETIME;
REQUIRED 4:Hostname TEXT;
REQUIRED 5:ScriptName TEXT;
REQUIRED 6:HttpRefererDomain TEXT;
REQUIRED 7:HttpRefererPath TEXT;
REQUIRED 8:HttpRefererQuery TEXT;
REQUIRED 9:RequestMethod TEXT; // or int4
REQUIRED 10:Https BOOL;
REQUIRED 11:Ipv4Client IPV4;
OPTIONAL 12:PageInput EVENT(PageInput)[];
FABRRICATED 13:PagehitIdSymbol INT8;
//------------------------------------------------------------//
PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
OR Symbolise(PagehitId) USING STPagehitId;
FIRE INTERNAL EVENT PageInput PROVIDE(PageHitIdSymbol);
}
EVENT 1:SessionGeneration
{
//------------------------------------------------------------//
REQUIRED 1:BinarySessionId GUID;
REQUIRED 2:Domain STRING;
REQUIRED 3:MachineId GUID;
REQUIRED 4:DateCreated DATETIME;
REQUIRED 5:Ipv4Client IPV4;
REQUIRED 6:UserAgent STRING;
REQUIRED 7:Pagehit EVENT(pagehit);
FABRICATED 8:DomainId INT4;
FABRICATED 9:PagehitId INT8;
//-------------------------------------------------------------//
DomainId AS SYMBOLISE(domain) USING DomainName;
PagehitId AS SYMBOLISE(pagehit:PagehitId) USING STPagehitId;
FIRE INTERNAL EVENT pagehit PROVIDE (PagehitId);
}
}
This project is a component of a Ph.D research project and is/will be free software. If your interested in working with me (or contributing) on this project, please leave a comment :D
I strongly agree with #jpjacobs's points. Lua is an excellent choice for embedding, unless there's something very specific about lisp that you need (for instance, if your data maps particularly well to cons-cells).
I've used lisp for many many years, BTW, and I quite like lisp syntax, but these days I'd generally pick Lua. While I like the lisp language, I've yet to find a lisp implementation that captures the wonderful balance of features/smallness/usability for embedded use the way Lua does.
Lua:
Is very small, both source and binary, an order of magnitude or more smaller than many more popular languages (Python etc). Because the Lua source code is so small and simple, it's perfectly reasonable to just include the entire Lua implementation in your source tree, if you want to avoid adding an external dependency.
Is very fast. The Lua interpreter is much faster than most scripting languages (again, an order of magnitude is not uncommon), and LuaJIT2 is a very good JIT compiler for some popular CPU architectures (x86, arm, mips, ppc). Using LuaJIT can often speed things up by another order of magnitude, and in many cases, the result approaches the speed of C. LuaJIT is also a "drop-in" replacement for standard Lua 5.1: no application or user code changes are required to use it.
Has LPEG. LPEG is a "Parsing Expression Grammar" library for Lua, which allows very easy, powerful, and fast parsing, suitable for both large and small tasks; it's a great replacement for yacc/lex/hairy-regexps. [I wrote a parser using LPEG and LuaJIT, which is much faster than the yacc/lex parser I was trying emulate, and was very easy and straight-forward to create.] LPEG is an add-on package for Lua, but is well-worth getting (it's one source file).
Has a great C-interface, which makes it a pleasure to call Lua from C, or call C from Lua. For interfacing large/complex C++ libraries, one can use SWIG, or any one of a number of interface generators (one can also just use Lua's simple C interface with C++ of course).
Has liberal licensing ("BSD-like"), which means Lua can be embedded in proprietary projects if you wish, and is GPL-compatible for FOSS projects.
Is very, very elegant. It's not lisp, in that it's not based around cons-cells, but it shows clear influences from languages like scheme, with a straight-forward and attractive syntax. Like scheme (at least in it's earlier incarnations), it tends towards "minimal" but does a good job of balancing that with usability. For somebody with a lisp background (like me!), a lot about Lua will seem familiar, and "make sense", despite the differences.
Is very flexible, and such features as metatables allow easily integrating domain-specific types and operations.
Has a simple, attractive, and approachable syntax. This might not be such an advantage over lisp for existing lisp users, but might be relevant if you intend to have end-users write scripts.
Is designed for embedding, and besides its small size and fast speed, has various features such as an incremental GC that make using a scripting language more viable in such contexts.
Has a long history, and responsible and professional developers, who have shown good judgment in how they've evolved the language over the last 2 decades.
Has a vibrant and friendly user-community.
You don't state what platform you are using, but if it would be capable of using LuaJIT 2 I'd certainly go for that, since execution speeds approach that of compiled code, and interfacing with C code just got a whole lot easier with the FFI library.
I don't quit know other embeddable scripting languages so I can't really compare what they can do, and how they work with tables.
Lua mostly works with references: all functions, userdata, tables are used by reference, and are collected on the next gc run when no references to the data are left.
Strings are internalised, so a certain string is in the memory only once.
The thing to take into account is that you should avoid creating and subsequently discarding loads of tables, since this can slow down the GC cycle (as explained in the Lua gem you cited)
For parsing your code sample, I'd take a look at the LPEG library
There is a number of options for implementing high performance embedded compilers. One is Mono VM, it naturally comes with dozens of already made high quality languages implemented on top of it, and it is quite embeddable (see how Second Life is using it). It is also possible to use LLVM - looks like your DSL is not complicated, so implementing an ad hoc compiler would not be a big deal.
I happened to work on a project which have some parts that is similar to your project, It's a cross-platform system running on Win-CE,Android,iOS, I need maximize cross-platform-able code, C/C++ combine with a embeddable language is a good choice. here is my solution related to your questions.
If you were in my position which language would you chose ?
The DSL in my project is similar to yours. for performance, I wrote a compiler with Yacc/Lex to compile the DSL to binary for runtime and a bunch of API to get information from binary, but it's annoying when there is something modified in DSL syntax, I need to modify both compiler and APIs, so I abondoned the DSL, turned into XML(don't write XML directly, a well defined schema is worthy), I wrote a general compiler converting XML to lua table, reimplement APIs with lua. by doing this I got two benefits: Readability and flexibility, without perceivable performance degradation.
Are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation). Javascript v8 perhaps ?
Before choosing lua, I considerd Embedded Ch(mostly used in industrial control system) , embedded lisp and lua, at last lua stand out, because lua is well integrated with C, lua have a prosperous community, and lua is easy to learn for another team member. regarding Javascript v8, it's like using a steam-hammer to crack nuts, if used in a embedded realtime system.
Does lisp fit the domain better ? I don't think lua and lisp are that different in terms of what they provide. Call me out :D
For my domain, lisp and lua have the same ability in semantic, they both can handle XML-based DSL easily, or you might even wrote a simple compiler converting XML to lisp list or lua table. they both can handle domain logic easily. but lua is better integrated with C/C++, this is what lua aim for.
Are there any other properties (like the ones below) I should be thinking about ?
Working alone or with team members is also a weighting factor of solution selection. nowadays not so many programmers are familiar with lisp-like language.
I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the scripting language call on orders of magnitude, and that picking either will not add much overhead to the overall throughput. Am I on the right track ? :D
here is a list of programming languages performance, here is a list of access time of computer components. if your system is IO-bound, the overhead of script is not key point. my system is a O&M(Operation & Maintenance) system, script performance is insignificant.

What's a good and safe language for drawing intensive particle systems over a long period of time?

Another one of my rather ambiguous question today, sorry.
Currently I have written some half decent software that has a 'roll your own' RESTful client, which pulls data from twitter. This data is then visualized with a number of particle systems using Open FrameWorks (a framework that works with c++).
My plans for this were to run the software indefinitely on my VPS, and build some kind of front end GUI allowing users to explore the pretty particles and so on. Between the JSON library I am using, C/C++, OpenFrameworks, and freaking Xcode4 I have produced way too many SIGBIRT and GDB errors to care for. I have go to the ends of the virtual world to fix them, and re wrote everything over and over. I even managed to SIGBIRT the openframeworks draw circle method, HAH!
(TL;DR starts here) Ok so anyway I am starting from scratch, looking for a powerful language that can crunch maths and blast through a good set of particles, and run quite well over the longest periods of time. Right now I am thinking about haskell, any ideas?
Thanks in advance all!
Haskell's (or more specifically GHC's) number crunching speed is approaching that of C++ but it's a little way behind. However, it's certainly not terrible, and Haskell's advantages in parallelism may become important. That is, if you write it in straight Haskell first, there's a good chance that it'll be easy to refactor it to run in parallel now or in the future. That isn't so true of C++.
The 'vector' package (on Hackage) would be a good choice for arrays suitable for number crunching. It supports mutable arrays in case that sort of approach is needed. However, if you're prepared to go more on the bleeding edge and your algorithm can be parallelized, you might want to look at the 'repa' package, and for extreme performance on a GPU, take a look at 'Accelerate' (which works but is still categorized as experimental).
The crashes you mention sound like they could be an indication of a bit of complexity in your problem. Where Haskell does well is in managing the complexity of... well, anything. So, if the problem is complex, then Haskell will serve you very well.
The foreign function interface in Haskell is well designed, though you will need to write C glue between Haskell and C++. So, that's another option for your number crunching.
For the web interface, take a look at 'yesod' which is seeing very active development and advertises itself as doing RESTful.
AFAIK, number crunching speed is not Haskell's strongest point - it's a highly abstract language, far from the 'metal'; its strength in a numeric processing context lies in the "mathiness" of its semantics - Haskell code often reads much like a Mathematical proof, and many of its concepts are borrowed from various fields of Mathematics.
For plain old number crunching, C++ is probably still your best choice, as it allows you to stay close to the hardware and optimize tight loops at the machine level, while offering higher-level programming constructs to manage complexity.
OTOH, if you have a library in place for the heavy lifting, and you merely need to write the glue to make the various parts work together, then go with whatever you're most comfortable with - python, C#, java, haskell, C++, ... - as long as they have bindings for all your libraries, you're good. If you don't have a library, then you might also consider writing the performance critical parts in C, and then pull them into your favorite high-level language - this is trivial in C++, slightly harder in python or haskell, and pretty damn inconvenient in java.

References Needed for Implementing an Interpreter in C/C++

I find myself attached to a project to integerate an interpreter into an existing application. The language to be interpreted is a derivative of Lisp, with application-specific builtins. Individual 'programs' will be run batch-style in the application.
I'm surprised that over the years I've written a couple of compilers, and several data-language translators/parsers, but I've never actually written an interpreter before. The prototype is pretty far along, implemented as a syntax tree walker, in C++. I can probably influence the architecture beyond the prototype, but not the implementation language (C++). So, constraints:
implementation will be in C++
parsing will probably be handled with a yacc/bison grammar (it is now)
suggestions of full VM/Interpreter ecologies like NekoVM and LLVM are probably not practical for this project. Self-contained is better, even if this sounds like NIH.
What I'm really looking for is reading material on the fundamentals of implementing interpreters. I did some browsing of SO, and another site known as Lambda the Ultimate, though they are more oriented toward programming language theory.
Some of the tidbits I've gathered so far:
Lisp in Small Pieces, by Christian Queinnec. The person recommending it said it "goes from the trivial interpreter to more advanced techniques and finishes presenting bytecode and 'Scheme to C' compilers."
NekoVM. As I've mentioned above, I doubt that we'd be allowed to incorporate an entire VM framework to support this project.
Structure and Interpretation of Computer Programs. Originally I suggested that this might be overkill, but having worked through a healthy chunk, I agree with #JBF. Very informative, and mind-expanding.
On Lisp by Paul Graham. I've read this, and while it is an informative introduction to Lisp principles, is not enough to jump-start constructing an interpreter.
Parrot Implementation. This seems like a fun read. Not sure it will provide me with the fundamentals.
Scheme from Scratch. Peter Michaux is attacking various implementations of Scheme, from a quick-and-dirty Scheme interpreter written in C (for use as a bootstrap in later projects) to compiled Scheme code. Very interesting so far.
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, recommended in the comment thread for Books On Creating Interpreted Languages. The book contains two chapters devoted to the practice of building interpreters, so I'm adding it to my reading queue.
New (and yet Old, i.e. 1979): Writing Interactive Compilers and Interpreters by P. J. Brown. This is long out of print, but is interesting in providing an outline of the various tasks associated with the implementation of a Basic interpreter. I've seen mixed reviews for this one but as it is cheap (I have it on order used for around $3.50) I'll give it a spin.
So how about it? Is there a good book that takes the neophyte by the hand and shows how to build an interpreter in C/C++ for a Lisp-like language? Do you have a preference for syntax-tree walkers or bytecode interpreters?
To answer #JBF:
the current prototype is an interpreter, and it makes sense to me as we're accepting a path to an arbitrary code file and executing it in our application environment. The builtins are used to affect our in-memory data representation.
it should not be hideously slow. The current tree walker seems acceptable.
The language is based on Lisp, but is not Lisp, so no standards compliance required.
As mentioned above, it's unlikely that we'll be allowed to add a full external VM/interpreter project to solve this problem.
To the other posters, I'll be checking out your citations as well. Thanks, all!
Short answer:
The fundamental reading list for a lisp interpreter is SICP. I would not at all call it overkill, if you feel you are overqualified for the first parts of the book jump to chapter 4 and start interpreting away (although I feel this would be a loss since chapters 1-3 really are that good!).
Add LISP in Small Pieces (LISP from now on), chapters 1-3. Especially chapter 3 if you need to implement any non-trivial control forms.
See this post by Jens Axel Søgaard on a minimal self-hosting Scheme: http://www.scheme.dk/blog/2006/12/self-evaluating-evaluator.html .
A slightly longer answer:
It is hard to give advice without knowing what you require from your interpreter.
does it really really need to be an interpreter, or do you actually need to be able to execute lisp code?
does it need to be fast?
does it need standards compliance? Common Lisp? R5RS? R6RS? Any SFRIs you need?
If you need anything more fancy than a simple syntax tree walker I would strongly recommend embedding a fast scheme subsystem. Gambit scheme comes to mind: http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page .
If that is not an option chapter 5 in SICP and chapters 5-- in LISP target compilation for faster execution.
For faster interpretation I would take a look at the most recent JavaScript interpreters/compilers. There seem to be a lot of thought going into fast JavaScript execution, and you can probably learn from them. V8 cites two important papers: http://code.google.com/apis/v8/design.html and squirrelfish cites a couple: http://webkit.org/blog/189/announcing-squirrelfish/ .
There is also the canonical scheme papers: http://library.readscheme.org/page1.html for the RABBIT compiler.
If I engage in a bit of premature speculation, memory management might be the tough nut to crack. Nils M Holm has published a book "Scheme 9 from empty space" http://www.t3x.org/s9fes/ which includes a simple stop-the-world mark and sweep garbage collector. Source included.
John Rose (of newer JVM fame) has written a paper on integrating Scheme to C: http://library.readscheme.org/servlets/cite.ss?pattern=AcmDL-Ros-92 .
Yes on SICP.
I've done this task several times and here's what I'd do if I were you:
Design your memory model first. You'll want a GC system of some kind. It's WAAAAY easier to do this first than to bolt it on later.
Design your data structures. In my implementations, I've had a basic cons box with a number of base types: atom, string, number, list, bool, primitive-function.
Design your VM and be sure to keep the API clean. My last implementation had this as a top-level API (forgive the formatting - SO is pooching my preview)
ConsBoxFactory &GetConsBoxFactory() { return mConsFactory; }
AtomFactory &GetAtomFactory() { return mAtomFactory; }
Environment &GetEnvironment() { return mEnvironment; }
t_ConsBox *Read(iostream &stm);
t_ConsBox *Eval(t_ConsBox *box);
void Print(basic_ostream<char> &stm, t_ConsBox *box);
void RunProgram(char *program);
void RunProgram(iostream &stm);
RunProgram isn't needed - it's implemented in terms of Read, Eval, and Print. REPL is a common pattern for interpreters, especially LISP.
A ConsBoxFactory is available to make new cons boxes and to operate on them. An AtomFactory is used so that equivalent symbolic atoms map to exactly one object. An Environment is used to maintain the binding of symbols to cons boxes.
Most of your work should go into these three steps. Then you will find that your client code and support code starts to look very much like LISP too:
t_ConsBox *ConsBoxFactory::Cadr(t_ConsBox *list)
{
return Car(Cdr(list));
}
You can write the parser in yacc/lex, but why bother? Lisp is an incredibly simple grammar and scanner/recursive-descent parser pair for it is about two hours of work. The worst part is writing predicates to identify the tokens (ie, IsString, IsNumber, IsQuotedExpr, etc) and then writing routines to convert the tokens into cons boxes.
Make it easy to write glue into and out of C code and make it easy to debug issues when things go wrong.
The Kamin Interpreters from Samuel Kamin's book Programming Languages, An Interpreter-Based Approach, translated to C++ by Timothy Budd. I'm not sure how useful the bare source code will be, as it was meant to go with the book, but it's a fine book that covers the basics of implementing Lisp in a lower-level language, including garbage collection, etc. (That's not the focus of the book, which is programming languages in general, but it is covered.)
Lisp in Small Pieces goes into more depth, but that's both good and bad for your case. There's a lot of material on compiling and such that won't be relevant to you, and its simpler interpreters are in Scheme, not C++.
SICP is good, definitely. Not overkill, but of course writing interpreters is only a small fraction of the book.
The JScheme suggestion is a good one, too (and it incorporates some code by me), but won't help you with things like GC.
I might flesh this out with more suggestions later.
Edit: A few people have said they learned from my awklisp. This is admittedly kind of a weird suggestion, but it's very small, readable, actually usable, and unlike other tiny-yet-readable toy Lisps it implements its own garbage collector and data representation instead of relying on an underlying high-level implementation language to provide them.
Check out JScheme from Peter Norvig. I found this amazingly simple to understand and port to C++. Uh, dunno about using scheme as a scripting language though - teaching it to jnrs is cumbersome and feels dated (helloooo 1980's).
I would like to extend my recommendation for Programming Languages: Application and Interpretation. If you want to write an interpreter, that book takes you there in a very short path. If you read through writing the code you read and doing the exercise you end up with a bunch of similar interpreters but different (one is eager, the other is lazy, one is dynamic, the other has some typing, one has dynamic scope, the other has lexical scope, etc).