I have been trying to think more about what abstraction actually means in functional programming. The best post I have found that speaks in the kind of language I can understand is the following: 4 abstractions. However, since I'm a wannabe Clojure programmer, I'm wondering what sort of abstractions macros provide. It seems that they fit in stage 2 together with HOFs, but at the same time they are more than a HOF. I find stage 3 to be related to the Expression Problem, which would correspond to protocols and multimethods in Clojure. So my questions are:
When implementing a macro in a Lisp language what would you say you are abstracting over?
What would stage 3 and 4 be in a Lisp language?
I don't really view macros as an abstraction, but more as a compiler hook.
Most languages implement what is known as an Abstract Syntax Tree (AST): a representation of a program's code as a data structure. Lisp macros expose parts of this AST as data that can be transformed by a macro function. But since Lisp programs are themselves data structures, macros tend to be a bit cleaner in Lisp than they would be in Rust or Scala.
So one could say that macros are simply abstractions of language semantics...but I don't know that I agree with that. One could say that macros are extensions of the lisp compiler, but that's not exactly true either.
As it turns out, macros are quite limited. They can only see a small subsection of the code being compiled. In other words, a macro can't see up the tree, only down. In addition, while macros that perform deep inspection of their children in the AST are possible (known as deep-walking macros), these macros tend to be complex and error-prone (just look at the guts of core.async's go or the contents of midje to see how complex these can get). So I hesitate to call them abstractions; perhaps they are, or perhaps they are just very limited abstractions.
So I see macros as a weird mix between the more powerful Fexprs (http://en.wikipedia.org/wiki/Fexpr) and the more complete compiler code transforms found in projects like LLVM. They provide a very limited, controlled way to transform code at compile time, and that's about it.
And in the end it all comes down to the lisp mantra that "code is data is code". If your code is data it makes sense to provide ways to transform it at compile time.
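To make the "macro function transforms a piece of the tree" idea concrete, here is a rough sketch. It is written in C++ rather than a Lisp, so it only models the mechanism: the Sexp type, the unless form, and the helper names are all made up for illustration, and a real Lisp gives you this machinery for free. Note how expand_unless only receives the form it was called on, which is the "can see down but not up" limitation described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A minimal s-expression: either an atom (a symbol) or a list of children.
struct Sexp {
    bool is_list = false;
    std::string atom;          // used when is_list is false
    std::vector<Sexp> items;   // used when is_list is true
};

Sexp make_atom(const std::string& s) { Sexp e; e.atom = s; return e; }
Sexp make_list(std::vector<Sexp> xs) { Sexp e; e.is_list = true; e.items = std::move(xs); return e; }

// Hypothetical macro: rewrites (unless c body...) into (if (not c) (do body...)).
// It is handed only this one form, so it cannot inspect the enclosing code.
Sexp expand_unless(const Sexp& form) {
    std::vector<Sexp> body{make_atom("do")};
    body.insert(body.end(), form.items.begin() + 2, form.items.end());
    return make_list({make_atom("if"),
                      make_list({make_atom("not"), form.items[1]}),
                      make_list(body)});
}

// The "compiler hook": walk the tree and expand every (unless ...) form found.
Sexp macroexpand_all(const Sexp& e) {
    if (!e.is_list) return e;
    Sexp cur = e;
    if (cur.items.size() >= 2 && !cur.items[0].is_list && cur.items[0].atom == "unless")
        cur = expand_unless(cur);
    for (Sexp& child : cur.items) child = macroexpand_all(child);
    return cur;
}

// Prints the tree back as Lisp-style text.
std::string show(const Sexp& e) {
    if (!e.is_list) return e.atom;
    std::string out = "(";
    for (std::size_t i = 0; i < e.items.size(); ++i) {
        if (i > 0) out += " ";
        out += show(e.items[i]);
    }
    return out + ")";
}
```

Feeding it the tree for (unless ready (wait)) and printing the result with show gives (if (not ready) (do (wait))).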
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I would like to write a simple in-house program that parses user commands written in a language of our team's own invention (but based closely on another program we are already familiar with). The command parser that I am working on now will simply be the UI through which the user can run the other algorithms I have already written. (Those other algorithms, by the way, are used to generate the input files for a molecular dynamics simulation package called LAMMPS.) The only thing I really have left to do is write this UI, but as it turns out, writing your own scripting language is an almost intractable challenge for a non-software engineer to tackle on his own.
According to the answers I received, what I am trying to make would be considered a Domain Specific Language, and it is not advisable to try to make one's own DSL due to the enormous amount of work required to make it useful and bug-free.
The best option then would actually be to use an existing scripting language like Lua or Python, and embed it in the program.
To do this, I will most likely use Lua because it seems the best fit for our needs. So at this point, the rest of this question is no longer relevant, since the answer would be: "Don't do it yourself." But I'm still going to keep part of it here so that other users can read and learn from the wonderful answers below.
Thanks again to everyone who replied!
Old Question:
I would like to write a program that parses a user text input and then
runs a function corresponding to that input. To do this I would need
to parse the string for relevant keywords. I believe there will be
less than 15 keywords when I'm done, so ideally I'd like this code
to be simple and short.
The problem is that I am currently using if-statements to parse the
strings. This is an extremely inconvenient way to parse commands
because even for short 3-word commands the code explodes into nested-ifs
3 layers deep. So longer 8+ word sentences will become nested-ifs more than
8 layers deep.
This kind of programming approach quickly becomes unmanageable, especially
when I need to make any significant changes to a command.
My question is whether or not there exists a data structure in C++ that
can help me better manage my giant nested-ifs, or if anyone could suggest
a better way to parse a string for lots of different data types (i.e.
substrings, ints, and floats) and output an error message when the expected
type is not found?
Here is an example of a short user session to show the kinds of commands
I would like to interpret:
load "Basis.Silicon" as material 1
add material 1 to layer 1
rotate layer 1 about x-axis by 45 degrees
translate layer 1 in x-axis by 10 nm
generate crystal
These commands are based on an already-existing program that our team
uses, but unfortunately the source code for this program has never been
publicly released so I am left guessing as to how it was actually
implemented.
One final note, unlike natural language processors, I know exactly what
the format of each line will be. So my issue isn't so much how to interpret
the text, but rather how to code the logic in a concise and manageable way.
Thanks everyone!
Your question is not clear. And your goals are more difficult than what you believe.
Either you consider that you want to somehow process human language sentences (e.g. in English). Then you want to study natural language processing, and you can find some libraries related to that field.
Or you consider that you want to interpret some formal programming or scripting language. Then you want to study interpreters and compilers. BTW, in that case, you might just embed an existing interpreter (like Lua, Guile, Python, etc....) in your program.
You could also think in terms of expert systems with a knowledge base made of rules (this approach could be viewed as sitting in the middle between NLP and a scripting language). You'll then need some inference engine (perhaps CLIPS). See also J.Pitrat's blog.
Notice that even coding a simple interpreter is more difficult than you believe. You absolutely need to represent abstract syntax trees, which you construct from textual input with a parsing phase.
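As a rough illustration of what "construct an abstract syntax tree from textual input with a parsing phase" means, here is a minimal sketch in C++. The grammar (integer arithmetic with parentheses) and every name in it are assumptions chosen for the example; it is not the asker's command language, and it has none of the error recovery a real parser needs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// AST node: an integer literal, or a binary operator with two children.
struct Node {
    char op = 0;                      // 0 for a literal, else one of + - * /
    long value = 0;                   // only meaningful when op == 0
    std::unique_ptr<Node> lhs, rhs;   // only set when op != 0
};

// Recursive-descent parser for the grammar:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := integer | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string text) : src_(std::move(text)) {}

    std::unique_ptr<Node> parse() {
        auto tree = expr();
        if (peek() != '\0') throw std::runtime_error("unexpected trailing input");
        return tree;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    // Skips whitespace and returns the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }

    std::unique_ptr<Node> expr() {
        auto left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src_[pos_++];
            auto right = term();
            left = binary(op, std::move(left), std::move(right));
        }
        return left;
    }

    std::unique_ptr<Node> term() {
        auto left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src_[pos_++];
            auto right = factor();
            left = binary(op, std::move(left), std::move(right));
        }
        return left;
    }

    std::unique_ptr<Node> factor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            auto inner = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) throw std::runtime_error("expected a number");
        std::size_t used = 0;
        long v = std::stol(src_.substr(pos_), &used);
        pos_ += used;
        auto n = std::make_unique<Node>();
        n->value = v;
        return n;
    }
};

// Prints the tree in prefix form, which makes the structure visible.
std::string show(const Node& n) {
    if (n.op == 0) return std::to_string(n.value);
    return std::string("(") + n.op + " " + show(*n.lhs) + " " + show(*n.rhs) + ")";
}
```

Parsing "1 + 2 * 3" and printing the tree with show yields (+ 1 (* 2 3)), so operator precedence ends up encoded in the shape of the tree rather than in the text.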
BTW, NLP, expert systems, and interpreter design and implementation are all difficult fields. You could get a PhD in any of the 3 (but you have to choose which).
If you go the embedded interpreter way: study the interpreters I mentioned (Guile, Lua, Python, Neko, etc...) and choose which one you want, to embed.
If, for whatever reason, you want to make an interpreter from scratch: learn several programming languages first (including scripting languages like Ruby, Python, OCaml, Scheme, Lua, Neko, ...). Read the books Programming Language Pragmatics (by M.Scott) and Lisp In Small Pieces (by Queinnec). Read also textbooks on compilation and parsing, and on garbage collection and formal (e.g. denotational) semantics. All this may need a dozen years of work.
Notice that, from experience, embedding software in an interpreter is a very structuring design decision. If you did not think of that at the beginning, you will probably need to redesign and refactor much of your existing application. For instance, when embedding software in an interpreter, you cannot afford to let bad input crash the program. So error handling and memory management (interfacing with the interpreter's GC) are challenging and impose new constraints. Hence you'll need to rethink your application.
If all this is new (and even if you don't choose e.g. Guile as the embedded interpreter): learn and practice a bit of Scheme, e.g. with Guile or PLT Scheme (e.g. by reading SICP), read a little bit about λ-calculus and closures, then read Queinnec's Lisp In Small Pieces book. Remember the halting problem (which is partly why interpreters are difficult to code).
BTW the syntax you are proposing (e.g. rotate mat 1 by x 90) is not very readable and looks COBOL-like. If possible, use a language that looks similar to existing ones. Make it easy to read!
Start by reading all the wikipages I am referencing here.
FWIW, I am the main author of MELT, a domain specific language (inspired a lot by Scheme) to extend the GCC compiler. Some of the papers / documentations I wrote might inspire you (and contain valuable references).
Addenda (after question was reformulated)
You seem to be inventing some formal syntax like
add material 1 to layer 1
rotate layer 1 about x-axis by 90 degrees
translate layer 1 in x-axis by 10 inches
I can't guess what kind of language it is. Are you implementing a 3D printer? If yes, you should stick to some existing standard formal language in that domain.
I believe that such a COBOL-like syntax is really wrong. The point is that it is too verbose, and that you are wishing to implement some domain specific language. I find your example very bad-looking.
Is that syntax your invention, or is there some document specifying your domain specific language (and many thousands of existing lines already coded in it)? If you are just inventing it, please reconsider the syntax and the semantics.
First, you need to specify on paper the full syntax and semantics of your DSL.
Is your DSL Turing complete? (I guess so, because Turing completeness is reached very quickly, e.g. with variables and loops.) If yes, you are inventing a scripting language. Please don't invent a scripting language without knowing several programming and scripting languages (then read Programming Language Pragmatics...). The point is that, if your scripting language becomes successful, advanced users will sooner or later write important programs in it (e.g. many thousands of lines). Then, these advanced users will be programmers. In that case, it is very important (for social and economic reasons) to have a DSL that is well founded and looks familiar (if possible, an extension of some existing scripting language).
If your DSL already exists, stick to its specification on paper. If that specification is not good enough, improve it with formalization (e.g. by writing some BNF syntax, and some formal (e.g. denotational) semantics for it). Publish and discuss that formalization with existing users.
Several industries ended up with ad-hoc DSLs which became widely used but were ill designed (e.g., in the French nuclear industry, the Gibiane DSL designed in the 1970s by nuclear physicists, not computer scientists; the US Boeing corporation is also rumored to have made similar mistakes). Then, maintaining and improving the many hundreds of thousands of lines of DSL scripts becomes a nightmare (and may mean losing millions of dollars or euros). So you had better stick to some existing scripting language. The advantages are that there is an existing culture around it (e.g. you can find dozens of books on Python or Lua, and many trained engineers familiar with them), that the interpreter is widely used and tested, and that the community working on it keeps improving the interpreter, so it has quite few uncorrected bugs.
You should not attempt to design and implement your own DSL if you are not a trained computer scientist. Stick to some existing scripting language (of course its syntax will not be exactly what you want), leverage existing implementations, and experiment.
As a counter-example, J.Ousterhout invented the widely used Tcl scripting language, with the claim that scripts are always small (e.g. hundreds of lines only) and won't grow into big code bases; unfortunately, some of them did, and Tcl is known as a bad language in which to code many dozens of thousands of lines (even if Tcl is an easy and convenient language for tiny scripts). The moral of the story is that if a (Turing-complete) scripting language becomes successful, some "crazy" advanced user will write hundreds of thousands of lines of script code in it. So you need that scripting language to be well designed from the start. Hence, you should adopt and adapt a good existing scripting language (and avoid inventing an unfamiliar syntax without having a good knowledge of several existing scripting languages).
later additions
PS: my criticism of Tcl is not entirely subjective: Tcl was designed with small scripts in mind (read J.Ousterhout's first papers about Tcl), but when you offer a Turing-complete scripting language, some "crazy" user will eventually write huge scripts in it. Hence, you need to anticipate such "crazy" usage by offering a scripting language which "scales up" to big scripts, i.e. one built according to software engineering practices for large code bases.
NB. Lua is probably a good choice as a language to embed. It is small, has a nice implementation, is well documented, and has good performance. But be careful about memory management issues (and this advice holds for any scripting language).
EDIT: To be more clear, I would like to have a short list of key words
(<15). The order/presence of which would determine which function will
be run.
You can build a small ruleset engine (e.g. something that processes lists of words). You write that engine/function once and just pass the data structures to it.
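One possible shape for such an engine, sketched in C++ under a few assumptions: each rule is a list of words in which "<int>", "<float>" and "<word>" are typed slots (a made-up convention), and each action just returns a string so the example stays small. All names here are hypothetical. The matching function is written once, and adding a command means adding one table row rather than another layer of nested ifs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <exception>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

struct Rule {
    std::vector<std::string> pattern;                 // literal words or typed slots
    std::function<std::string(const Args&)> action;   // receives the slot values
};

std::vector<std::string> split_words(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

bool is_int(const std::string& s) {
    std::size_t i = (!s.empty() && (s[0] == '-' || s[0] == '+')) ? 1 : 0;
    if (i == s.size()) return false;
    for (; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

bool is_float(const std::string& s) {
    try {
        std::size_t used = 0;
        std::stod(s, &used);
        return used == s.size();
    } catch (const std::exception&) {
        return false;
    }
}

// Returns "" on success (filling args), else a description of the first mismatch.
std::string match(const Rule& rule, const std::vector<std::string>& words, Args& args) {
    if (words.size() != rule.pattern.size())
        return "expected " + std::to_string(rule.pattern.size()) + " words, got " +
               std::to_string(words.size());
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::string& p = rule.pattern[i];
        bool slot = (p == "<int>" || p == "<float>" || p == "<word>");
        bool ok = p == "<int>"   ? is_int(words[i])
                : p == "<float>" ? is_float(words[i])
                : p == "<word>"  ? true
                                 : p == words[i];
        if (!ok)
            return "word " + std::to_string(i + 1) + ": expected " + p + ", got '" + words[i] + "'";
        if (slot) args.push_back(words[i]);
    }
    return "";
}

// Tries every rule whose first keyword matches the line's first word.
std::string run(const std::vector<Rule>& rules, const std::string& line) {
    std::vector<std::string> words = split_words(line);
    if (words.empty()) return "error: empty command";
    std::string error = "unknown command '" + words[0] + "'";
    for (const Rule& rule : rules) {
        if (rule.pattern.empty() || rule.pattern[0] != words[0]) continue;
        Args args;
        std::string mismatch = match(rule, words, args);
        if (mismatch.empty()) return rule.action(args);
        error = mismatch;   // remember why the closest rule failed
    }
    return "error: " + error;
}
```

With a rule whose pattern is add material <int> to layer <int>, the line "add material 1 to layer 2" reaches the action with the two slot values, while "add material x to layer 2" is rejected with a message pointing at word 3.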
As an alternative, a solution using regular expressions would be probably the fastest to code (the engine is ready for you), assuming you're familiar with the regexp syntax (if not, it's still a good investment).
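For instance, one of the commands from the question could be handled with std::regex along these lines. This is only a sketch: the Rotate struct, the parse_rotate name and the exact pattern are my own assumptions about what the command should accept (an integer layer, an axis of x, y or z, and a plain decimal angle).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the values extracted from a rotate command.
struct Rotate {
    int layer = 0;
    char axis = 'x';
    double degrees = 0.0;
};

// Returns true and fills `out` when the whole line is a well-formed rotate command,
// e.g. "rotate layer 1 about x-axis by 45 degrees".
bool parse_rotate(const std::string& line, Rotate& out) {
    static const std::regex pattern(
        R"(\s*rotate\s+layer\s+(\d+)\s+about\s+([xyz])-axis\s+by\s+(-?\d+(?:\.\d+)?)\s+degrees\s*)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    out.layer = std::stoi(m[1].str());    // may throw std::out_of_range on absurdly long digit runs
    out.axis = m[2].str()[0];
    out.degrees = std::stod(m[3].str());
    return true;
}
```

The capture groups do the type checking: a line such as "rotate layer one about x-axis by 45 degrees" simply fails to match, at which point you can report an error. One regex per command keeps each command's format in a single place.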
You could build a table of keywords and function pointers:
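// Handlers, assumed to be defined elsewhere.
void Process_Car(void);
void Process_Bike(void);

typedef void (*Function_Pointer)(void);

// Associates a keyword with the function that handles it.
struct table_entry
{
    const char * keyword;
    Function_Pointer p_function;
};

table_entry function_table[] =
{
    {"car", Process_Car},
    {"bike", Process_Bike},
};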
Search the table for a keyword. If the keyword is found, dereference the function pointer.
The following snippet will execute the function for processing the word "car":
(function_table[0].p_function)();
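The search step could look something like the sketch below. The dispatch function and the last_called variable are hypothetical additions, and the handlers are stand-ins that only record which one ran; a linear scan is plenty for a table of fewer than 15 keywords.

```cpp
#include <cassert>
#include <cstring>
#include <string>

typedef void (*Function_Pointer)(void);

struct table_entry
{
    const char * keyword;
    Function_Pointer p_function;
};

// Stand-in handlers: they only record which one ran.
static std::string last_called;
void Process_Car()  { last_called = "car"; }
void Process_Bike() { last_called = "bike"; }

static const table_entry function_table[] =
{
    {"car", Process_Car},
    {"bike", Process_Bike},
};

// Linear search of the table; calls the matching handler.
// Returns false when the keyword is not in the table.
bool dispatch(const char * keyword)
{
    for (const table_entry & entry : function_table)
    {
        if (std::strcmp(entry.keyword, keyword) == 0)
        {
            entry.p_function();
            return true;
        }
    }
    return false;
}
```

Calling dispatch("bike") runs Process_Bike, and an unknown word such as "boat" returns false so the caller can print an error message.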
There is a famous program, called Eliza, which parses sentences for keywords.
Examples can be found at: Eliza C++ examples
Now I'm generally in Java/C# (love both of them, can't really say I'm dedicated to one).
And I've recently been discussing the differences between F# and C# with a friend, when he surprised me saying: "So.. F# sounds a lot like lisp, but with way less 'Swiss-army knife' feel to it."
Now, I was partly ashamed to say this, but I had no idea what lisp was.
After some searching, I saw that lisp is very interesting, but got stumped by the multiple dialects and running environments.
Here is what I know:
I know of 3 dialects:
Common Lisp (I have the Practical Common Lisp book in my bookmarks).
Scheme (a more "theoretical" version of CL)
Clojure. Seems to be a version of CL that runs on JVM.
The basic idea of lisp seems to be about using code as data.
What I want to know:
What is the running environment for different dialects? How do they work/get installed (by this I mean is it a runtime like Java Virtual Machine, or if it requires something else, or if it's supported generally by the OS (as in compiled)). And how to get them (if something is to be gotten)
What is the better dialect to learn (I want the dialect not to be a "learning language" but one you can fully use afterwards without regret of not learning some other one, for example one should first learn C++ before trying out Visual C++, if you know what I mean)
What are the main advantages of lisp in general (I've seen many pages about that saying it's faster in development and execution, but they were all pretty vague about the details)
Can it be generally used for general purpose, or is it concentrated on AI? (By this I mean if, for example, one could make a full console app with it, and then implement OpenGL just as easily and make a game. Learning a language specialized on something precise is worthwhile, but not at the moment for me)
I would also be very happy about any additional details you guys can give me! (Links are appreciated too! E-Books and whatnot.)
Edit: all of the answers here were very useful. As such, I gave them all a +1 to rep, but chose the more concrete one as best. Thank you all.
I also learnt Java and C# intensively before coming to Lisp so hopefully can share some useful perspectives.
Firstly, all Lisps are great and you should definitely consider learning one. There's a famous quote by Eric Raymond:
"Lisp is worth learning for the profound enlightenment experience you
will have when you finally get it; that experience will make you a
better programmer for the rest of your days, even if you never
actually use Lisp itself a lot."
Reasons that Lisps are particularly interesting and powerful are:
Homoiconicity - in Lisp "code is data" - the language itself is written in Lisp data structures. In itself this is interesting, but where it gets really powerful is when you start using this for code generation and advanced macros. Some believe that this feature is a key reason why Lisp can help you be more productive than anyone else (short Paul Graham essay)
Interactive development at the REPL - a few other languages also have this, but it is particularly idiomatic and deep-rooted in Lisp culture. It's remarkably productive and liberating to develop while altering a live running program. Recent examples that caught my eye include music hacking with overtone and editing a live game simulation.
Dynamic typing - opinion is more divided on whether this is an advantage or not (I'm personally neutral) but many people think that dynamically typed languages give you a productivity advantage, at least in terms of building things quickly. YMMV.
My personal recommendation for a Lisp to learn nowadays would be Clojure. Clojure has a few distinct advantages that make it stand out:
Modern language design - Clojure "refines" Lisp in a number of ways. For example, Clojure adds some new syntax for vectors [] and hashmaps {} in addition to lists (). Purists may disapprove, but I personally believe these kinds of innovations make the language much nicer to use and read.
Functional first and foremost - all the Lisps are good as functional languages, however Clojure takes it much further. All the standard library is written in terms of pure functions. All data structures are immutable. Mutable state is strictly limited. Lazy sequences (including infinite sequences) are supported. In some senses it feels a bit more like Haskell than the other Lisps.
Concurrency - Clojure has a unique approach to managing concurrency, supported by a very good STM implementation. Worth watching this excellent video for a much deeper explanation.
Runs on the JVM - whatever you think of Java, the JVM is a great platform with extremely good GC, JIT compilation, cross platform portability etc. This can be a barrier to entry for some, but anyone used to Java or C# should quickly feel at home.
Library ecosystem - since Clojure runs on the JVM, it can use Java libraries extremely easily. Calling a Java API from Clojure is trivial - it's just like any other function call with a syntax of (.methodName someObject arg1 arg2). With the availability of the huge Java library ecosystem (mostly open source) Clojure basically leapfrogs all the "niche" languages in terms of practical usefulness.
In terms of applications, Clojure is designed to be a fully general purpose language so can be used in any field - certainly not limited to AI. I know of people using it in startups, using it for big data processing, even writing games.
Finally on the performance point: you are basically always going to pay a slight performance penalty for using higher level language constructs. However Clojure in my experience is "close enough" to Java or C# that you won't notice the difference for general purpose development. It helps that Clojure is always compiled and that you can use optional type hints to get the performance benefits of static typing.
The flawed benchmarks (as of early 2012) put Clojure within a factor of 2-3 of the speed of statically typed languages like Java, Scala and C#, a little bit behind Common Lisp and a little bit ahead of Scheme (Racket).
Lisp, as you've discovered, is not one language; it's a family of languages that have certain features in common.
There are two primary dialects of Lisp: Common Lisp and Scheme. Each of those two dialects has many implementations, each with their own features. However, both Common Lisp and Scheme are standardized, and the standards define a certain baseline of features which you can expect any implementation to have.
Scheme is a minimalistic language with a very small standard library. It is used primarily by students and theoreticians. Common Lisp has many more language features and a much larger standard library, including a powerful object system, and has been used in large production systems.
Clojure is another minor, more recent dialect. If you want to understand Lisp, you're better off first learning either Common Lisp or Scheme.
My recommendation is to learn Scheme first; it's a purer expression of the ideas that Lisp is made of, and will help you understand the essence of the language. In many ways, Lisp is completely different from Java and other imperative languages; however, what you learn from it will make you a better programmer in those languages. You can easily learn Common Lisp after you know Scheme.
The advantage of Lisp is, simply put, that it's more powerful than other languages. All Lisp code is Lisp data and can be manipulated as such; this allows you to do really cool things with metaprogramming that simply can't be done in other languages, because they don't give you direct access to the data structures that comprise your code. (The reason Lisp can do this and they can't is intimately related to its strange-looking syntax. Every compiler or interpreter, after reading the source code, must translate it into abstract syntax trees. Unlike other languages, Lisp's syntax is a direct representation of the ASTs that Lisp code is translated into, so you know what those trees look like and can manipulate them directly.) The most commonly used metaprogramming feature is macros; Lisp macros can literally translate a bit of source code into anything you can program. You can't do that with, say, C macros.
The "faster in development and execution" thing may have been a reference to one specific feature which most Lisp implementations provide: the read-eval-print loop. You can type an expression into a prompt and the interpreter will evaluate it and print the result. This is wonderful both for learning the language and for debugging or otherwise investigating code.
Lisp is dynamically typed (though statically typed flavors do exist). Most implementations of Lisp run on their own virtual machine; however, many can also be compiled to machine code. Clojure was written specifically to target the JVM; it can also target .NET and JavaScript.
Though originally created for AI research, Lisp is by no means exclusively for AI. The main reason why it's not more popular in mainstream production environments (apart from the self-perpetuating dominance of Java and C#) is library support. Common Lisp has many good libraries out there (Scheme less so), but it pales in comparison to the vast amount of library support available for Java or Python.
If you want to get started, I recommend downloading Racket, a highly popular implementation of Scheme. It has everything you need, including a simple-but-very-powerful IDE with a read-eval-print loop, right out of the box. Though originally developed as a teaching language, it comes with a very large standard library more characteristic of Common Lisp than of Scheme. As a result, it's seeing use in real production environments.
Runtime Environments
Common Lisp and Scheme generally have their own unique runtime environments. There are some variants of Scheme (Chicken and Gambit) which can be translated to C and then linked with their runtime environments so that they can be deployed as stand-alone executable programs. Clojure runs in the JVM, and there is also a CLR port, but it's not clear to me that the CLR port is current with the JVM version. Clojure also has ClojureScript, which targets a JavaScript runtime.
Which is Better to Learn First
I don't think that question has a good answer. It's up to you. Although if you have experience with the JVM, Clojure might be a bit smoother to start with.
What is Better about Lisp
That's a question liable to start a flame war. I don't have much lisp experience. I started learning Clojure a few months ago in earnest, have looked at Common Lisp and Scheme on and off over the years.
What I like is their dynamic natures. You need to change a function at runtime while your program is running? No problem! Like any power tool, you have to be careful not to chop your bits off when using this.
The power and expressiveness are addictive too. I am able to do some things with little effort that I know I could not achieve in Java, or that I know would require a lot more work. Specifically, I was able to put together a description of a data structure and, through the use of macros, delay evaluation of parts of the data until the right time. If I had done that in Java, I would not have been able to nest the declarations like I did because they would have evaluated in the wrong order. Pain would have ensued.
I also like Clojure's view of functional programming, although I have to say it requires work to adjust.
Is Lisp General Purpose
Yes.
--
Mark Volkman has a really good article on Clojure. Many basics are there. One thing that I did in the beginning was to just fire up a REPL and experiment when I needed to figure something out programmatically, e.g. to explore an API or do some calculations. After a short period of that I started building up to larger efforts, and I now have a project in progress that involves Clojure.
There isn't a bad book about Clojure that has been written. The Stuart Sierra book is being updated, and the O'Reilly book is about to come out, so you might want to wait. The Joy of Clojure is good, but I don't think it's a good starter book.
For Common Lisp, I highly recommend the Land of Lisp.
For Scheme, there are several classics including The Little Schemer and SICP.
Oh, and this: http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey (maybe one of the most important talks you'll ever watch), and this http://www.infoq.com/presentations/hickey-clojure (IIRC, really good intro to Clojure).
Common Lisp
Common Lisp is both compiled and interpreted. Deployments (in Windows) can be done by an exe with DLLs. Or by a precompiled bytecode. Or by installing a Lisp system on the target device and executing the source against it.
Common Lisp is a fully usable industrial language with an active community and libraries for many different tasks.
Lisps are generally faster for development and due to the abstraction capabilities, better at developing higher level concepts. It's hard to explain. Ruby vs. C is an example of this sort of thing. All Lisps carry this capacity IMO.
Common Lisp is a general purpose language. I don't know offhand if modern Common Lisp implementations directly support executing assembly, so it may be difficult to write drivers or use compiler-unsupported CPU instructions.
I like Common Lisp, but Clojure and Racket are not to be sneezed at either. Clojure in particular represents a very interesting track, in my opinion.
For e-books, you can get On Lisp by Graham and Gentle Introduction to Symbolic Computation. Possibly others but those are the ones I can recall.
I want to ask what sort of type-safety language constructs there are in Clojure.
I've read 'Practical Clojure' by Luke VanderHart and Stuart Sierra several times now, but I still have the distinct impression that Clojure (like other Lisps) doesn't take compile-time validation very seriously. Type safety is just one (very popular) strategy for checking correct semantics at compile time.
I'm asking this question because I'm aching to be proven wrong: what sort of design patterns are available in Clojure to validate (at compile time, not at run time) that a function that expects a string doesn't get called with, say, a list of integers?
Also, I've read very smart people like Paul Graham openly advocate that Lisp allows you to implement everything from lower-level languages on top of it (most would say that the languages themselves are being reimplemented on top of it), so if that assertion is true, then stuff like type checking should be a piece of cake. So do you feel that there exist type systems (or the ability to implement such type systems) in Clojure or other Lisps that give the programmer the ability to shift validation from run time to compile time, or even better, design time?
Compilation units in Clojure are very small: a single function. Lispers tend to change small portions of running programs while they develop. Introducing static type checking into this style of development is problematic; for a deeper discussion of why, I recommend the post Types are Anti-Modular by Gilad Bracha. Thus Clojure prefers pre/post-conditions, which jibe better with Lisp's highly REPL-oriented development.
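To show the trade-off being discussed, here is a loose C++ analogy (not Clojure, and not how Clojure implements its conditions). The Value type stands in for a dynamically typed value, and shout, shout_static and the error messages are invented for the example. The first function validates its argument when it is called, in the spirit of a precondition; the second pushes the same requirement into the signature so the compiler rejects a bad call.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Stand-in for a dynamically typed value: either a string or a list of integers.
using Value = std::variant<std::string, std::vector<int>>;

// Run-time checking: the function accepts any Value and validates it when called.
std::string shout(const Value& v) {
    // Precondition: the argument must hold a string.
    if (!std::holds_alternative<std::string>(v))
        throw std::invalid_argument("shout: expected a string");
    const std::string& in = std::get<std::string>(v);
    std::string out = in;
    for (char& c : out)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    // Postcondition: upper-casing must not change the length.
    if (out.size() != in.size())
        throw std::logic_error("shout: postcondition violated");
    return out;
}

// Compile-time checking: the requirement lives in the signature, so passing a
// std::vector<int> here is rejected by the compiler before the program runs.
std::string shout_static(const std::string& s) {
    return shout(Value(s));
}
```

A call like shout(Value(std::vector<int>{1, 2})) compiles and only fails once it executes, which is the behaviour the question is worried about; the benefit is that shout can be redefined and retried in isolation without the rest of the program needing to type-check again.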
That said, it's certainly desirable and possible to build an a la carte type system for Clojure. This trail has been blazed by Qi/Shen, and Typed Racket. This functionality could be easily provided as a library. I'm hoping to build something like that in the future with core.logic - https://github.com/clojure/core.logic.
Since Clojure is a dynamic language the whole idea is not to check the types (or much of anything) at compile time.
Even when you add type hints to your function they do not get checked at compile-time.
Since Clojure is a Lisp, you can do whatever you want at compile time with macros, and macros are powerful enough that you can write your own type systems. Some people have made type systems for Lisps, such as Typed Racket and Qi. These type systems can be just as powerful as any type system in a "normal" language.
OK, we now know that it is possible, but does Clojure have such an optional type system? The answer is currently no, but there is a logic engine (core.logic) that could be used to implement a type system; its author has not (yet) worked in that direction.
There is a library that adds an optional type system to Clojure,
http://typedclojure.org/
Rationale
Static typing has well known benefits. For example, statically typed languages catch many common programming errors at the earliest time possible: compile time. Types also serve as an excellent form of (machine checkable) documentation that almost always augment existing hand-written documentation.
Languages without static type checking (dynamically typed) bring other benefits. Without the strict rigidity of mandatory static typing, they can provide more flexible and forgiving idioms that can help in rapid prototyping. Often the benefits of static type checking are desired as the program grows.
This work adds static type checking (and some of its benefits) to Clojure, a dynamically typed language, while still preserving idioms that characterise the language. It allows static and dynamically typed code to be mixed so the programmer can use whichever is more appropriate.
I find myself attached to a project to integrate an interpreter into an existing application. The language to be interpreted is a derivative of Lisp, with application-specific builtins. Individual 'programs' will be run batch-style in the application.
I'm surprised that over the years I've written a couple of compilers, and several data-language translators/parsers, but I've never actually written an interpreter before. The prototype is pretty far along, implemented as a syntax tree walker, in C++. I can probably influence the architecture beyond the prototype, but not the implementation language (C++). So, constraints:
implementation will be in C++
parsing will probably be handled with a yacc/bison grammar (it is now)
suggestions of full VM/Interpreter ecologies like NekoVM and LLVM are probably not practical for this project. Self-contained is better, even if this sounds like NIH.
What I'm really looking for is reading material on the fundamentals of implementing interpreters. I did some browsing of SO, and another site known as Lambda the Ultimate, though they are more oriented toward programming language theory.
Some of the tidbits I've gathered so far:
Lisp in Small Pieces, by Christian Queinnec. The person recommending it said it "goes from the trivial interpreter to more advanced techniques and finishes presenting bytecode and 'Scheme to C' compilers."
NekoVM. As I've mentioned above, I doubt that we'd be allowed to incorporate an entire VM framework to support this project.
Structure and Interpretation of Computer Programs. Originally I suggested that this might be overkill, but having worked through a healthy chunk, I agree with @JBF. Very informative, and mind-expanding.
On Lisp by Paul Graham. I've read this, and while it is an informative introduction to Lisp principles, it is not enough to jump-start constructing an interpreter.
Parrot Implementation. This seems like a fun read. Not sure it will provide me with the fundamentals.
Scheme from Scratch. Peter Michaux is attacking various implementations of Scheme, from a quick-and-dirty Scheme interpreter written in C (for use as a bootstrap in later projects) to compiled Scheme code. Very interesting so far.
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, recommended in the comment thread for Books On Creating Interpreted Languages. The book contains two chapters devoted to the practice of building interpreters, so I'm adding it to my reading queue.
New (and yet Old, i.e. 1979): Writing Interactive Compilers and Interpreters by P. J. Brown. This is long out of print, but is interesting in providing an outline of the various tasks associated with the implementation of a Basic interpreter. I've seen mixed reviews for this one but as it is cheap (I have it on order used for around $3.50) I'll give it a spin.
So how about it? Is there a good book that takes the neophyte by the hand and shows how to build an interpreter in C/C++ for a Lisp-like language? Do you have a preference for syntax-tree walkers or bytecode interpreters?
To answer @JBF:
the current prototype is an interpreter, and it makes sense to me as we're accepting a path to an arbitrary code file and executing it in our application environment. The builtins are used to affect our in-memory data representation.
it should not be hideously slow. The current tree walker seems acceptable.
The language is based on Lisp, but is not Lisp, so no standards compliance required.
As mentioned above, it's unlikely that we'll be allowed to add a full external VM/interpreter project to solve this problem.
To the other posters, I'll be checking out your citations as well. Thanks, all!
Short answer:
The fundamental reading list for a Lisp interpreter is SICP. I would not call it overkill at all. If you feel you are overqualified for the first parts of the book, jump to chapter 4 and start interpreting away (although I feel this would be a loss, since chapters 1-3 really are that good!).
Add LISP in Small Pieces (LISP from now on), chapters 1-3. Especially chapter 3 if you need to implement any non-trivial control forms.
See this post by Jens Axel Søgaard on a minimal self-hosting Scheme: http://www.scheme.dk/blog/2006/12/self-evaluating-evaluator.html .
A slightly longer answer:
It is hard to give advice without knowing what you require from your interpreter.
does it really really need to be an interpreter, or do you actually need to be able to execute lisp code?
does it need to be fast?
does it need standards compliance? Common Lisp? R5RS? R6RS? Any SRFIs you need?
If you need anything more fancy than a simple syntax tree walker I would strongly recommend embedding a fast scheme subsystem. Gambit scheme comes to mind: http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page .
If that is not an option, chapter 5 in SICP and chapters 5 onward in LISP target compilation for faster execution.
For faster interpretation I would take a look at the most recent JavaScript interpreters/compilers. There seems to be a lot of thought going into fast JavaScript execution, and you can probably learn from them. V8 cites two important papers: http://code.google.com/apis/v8/design.html and SquirrelFish cites a couple: http://webkit.org/blog/189/announcing-squirrelfish/ .
There are also the canonical Scheme papers: http://library.readscheme.org/page1.html for the RABBIT compiler.
If I engage in a bit of premature speculation, memory management might be the tough nut to crack. Nils M Holm has published a book "Scheme 9 from empty space" http://www.t3x.org/s9fes/ which includes a simple stop-the-world mark and sweep garbage collector. Source included.
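To make the "stop-the-world mark and sweep" idea concrete, here is a minimal sketch in C++. It is not the collector from Scheme 9 from Empty Space; the `Cell` and `Heap` names and the root-registration scheme are assumptions made up for illustration. The interpreter pauses, marks everything reachable from its roots, then frees whatever was not marked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical heap cell: a pair of pointers plus a mark bit.
struct Cell {
    Cell *car = nullptr;
    Cell *cdr = nullptr;
    bool marked = false;
};

// Hypothetical heap that owns every cell it hands out.
class Heap {
public:
    Heap() = default;
    Heap(const Heap &) = delete;
    Heap &operator=(const Heap &) = delete;
    ~Heap() { for (Cell *c : cells_) delete c; }

    Cell *Alloc(Cell *car = nullptr, Cell *cdr = nullptr) {
        Cell *c = new Cell;
        c->car = car;
        c->cdr = cdr;
        cells_.push_back(c);
        return c;
    }

    // Roots are cells the interpreter can reach directly
    // (global environment, evaluation stack, and so on).
    void AddRoot(Cell *c) { roots_.push_back(c); }

    // Stop-the-world collection: mark from the roots, then sweep.
    // Returns the number of cells freed.
    std::size_t Collect() {
        for (Cell *r : roots_) Mark(r);
        std::size_t freed = 0;
        std::vector<Cell *> live;
        for (Cell *c : cells_) {
            if (c->marked) {
                c->marked = false;  // reset for the next collection
                live.push_back(c);
            } else {
                delete c;
                ++freed;
            }
        }
        cells_.swap(live);
        return freed;
    }

    std::size_t LiveCount() const { return cells_.size(); }

private:
    // Iterative marking with an explicit stack, so long lists
    // do not overflow the C++ call stack.
    void Mark(Cell *start) {
        std::vector<Cell *> stack{start};
        while (!stack.empty()) {
            Cell *c = stack.back();
            stack.pop_back();
            if (c == nullptr || c->marked) continue;
            c->marked = true;
            stack.push_back(c->car);
            stack.push_back(c->cdr);
        }
    }

    std::vector<Cell *> cells_;
    std::vector<Cell *> roots_;
};
```

Because marking starts only from the roots, an unreachable cycle (a cell whose `car` points to itself, say) is collected like any other garbage, which is the main thing this approach buys you over naive reference counting.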
John Rose (of newer JVM fame) has written a paper on integrating Scheme to C: http://library.readscheme.org/servlets/cite.ss?pattern=AcmDL-Ros-92 .
Yes on SICP.
I've done this task several times and here's what I'd do if I were you:
Design your memory model first. You'll want a GC system of some kind. It's WAAAAY easier to do this first than to bolt it on later.
Design your data structures. In my implementations, I've had a basic cons box with a number of base types: atom, string, number, list, bool, primitive-function.
Design your VM and be sure to keep the API clean. My last implementation had this as a top-level API (forgive the formatting - SO is pooching my preview)
ConsBoxFactory &GetConsBoxFactory() { return mConsFactory; }
AtomFactory &GetAtomFactory() { return mAtomFactory; }
Environment &GetEnvironment() { return mEnvironment; }
t_ConsBox *Read(iostream &stm);
t_ConsBox *Eval(t_ConsBox *box);
void Print(basic_ostream<char> &stm, t_ConsBox *box);
void RunProgram(char *program);
void RunProgram(iostream &stm);
RunProgram isn't needed - it's implemented in terms of Read, Eval, and Print. REPL is a common pattern for interpreters, especially LISP.
A ConsBoxFactory is available to make new cons boxes and to operate on them. An AtomFactory is used so that equivalent symbolic atoms map to exactly one object. An Environment is used to maintain the binding of symbols to cons boxes.
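The "equivalent symbolic atoms map to exactly one object" part is usually called interning. A minimal sketch of the idea follows; the real AtomFactory is not shown in this answer, so `Atom`, `AtomTable` and `Intern` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical atom type; a real one would likely carry more state.
struct Atom {
    std::string name;
};

// Interning table: equal names always yield the same Atom object,
// so comparing two symbols reduces to comparing two pointers.
class AtomTable {
public:
    const Atom *Intern(const std::string &name) {
        auto it = atoms_.find(name);
        if (it == atoms_.end()) {
            // First time this name is seen: create and remember it.
            it = atoms_.emplace(name, std::make_unique<Atom>(Atom{name})).first;
        }
        return it->second.get();
    }

    std::size_t Size() const { return atoms_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Atom>> atoms_;
};
```

With something like this in place, an environment can key its bindings on the atom pointer instead of comparing strings on every lookup.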
Most of your work should go into these three steps. Then you will find that your client code and support code start to look very much like LISP too:
t_ConsBox *ConsBoxFactory::Cadr(t_ConsBox *list)
{
    return Car(Cdr(list));
}
You can write the parser in yacc/lex, but why bother? Lisp has an incredibly simple grammar, and a scanner/recursive-descent parser pair for it is about two hours of work. The worst part is writing predicates to identify the tokens (i.e., IsString, IsNumber, IsQuotedExpr, etc.) and then writing routines to convert the tokens into cons boxes.
Make it easy to write glue into and out of C code and make it easy to debug issues when things go wrong.
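To give a feel for how small the scanner/recursive-descent pair mentioned above can be, here is a rough sketch. It builds a plain tree rather than cons boxes, handles only parentheses and bare atoms (no strings, quotes or comments), and every name in it (`Node`, `Tokenize`, `ParseExpr`, `ToString`) is an assumption for illustration, except `IsNumber`, which borrows its name from the predicates listed above.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parse-tree node: either an atom (its token text) or a list.
struct Node {
    bool is_list = false;
    std::string text;                          // used when !is_list
    std::vector<std::unique_ptr<Node>> items;  // used when is_list
};

// Scanner: split the source into "(", ")" and atom tokens.
std::vector<std::string> Tokenize(const std::string &src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        char ch = src[i];
        if (std::isspace(static_cast<unsigned char>(ch))) {
            ++i;
        } else if (ch == '(' || ch == ')') {
            tokens.push_back(std::string(1, ch));
            ++i;
        } else {
            std::size_t start = i;
            while (i < src.size() && src[i] != '(' && src[i] != ')' &&
                   !std::isspace(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        }
    }
    return tokens;
}

// Token predicate: an optional sign followed by one or more digits.
bool IsNumber(const std::string &tok) {
    if (tok.empty()) return false;
    std::size_t i = (tok[0] == '-' || tok[0] == '+') ? 1 : 0;
    if (i == tok.size()) return false;
    for (; i < tok.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return false;
    }
    return true;
}

// Recursive descent over the grammar: expr := atom | "(" expr* ")"
std::unique_ptr<Node> ParseExpr(const std::vector<std::string> &toks,
                                std::size_t &pos) {
    if (pos >= toks.size()) throw std::runtime_error("unexpected end of input");
    const std::string &tok = toks[pos++];
    if (tok == ")") throw std::runtime_error("unexpected ')'");
    auto node = std::make_unique<Node>();
    if (tok == "(") {
        node->is_list = true;
        while (pos < toks.size() && toks[pos] != ")") {
            node->items.push_back(ParseExpr(toks, pos));
        }
        if (pos >= toks.size()) throw std::runtime_error("missing ')'");
        ++pos;  // consume the closing ")"
    } else {
        node->text = tok;
    }
    return node;
}

// Print a tree back out as text, which is handy when debugging the reader.
std::string ToString(const Node &n) {
    if (!n.is_list) return n.text;
    std::string out = "(";
    for (std::size_t i = 0; i < n.items.size(); ++i) {
        if (i != 0) out += ' ';
        out += ToString(*n.items[i]);
    }
    return out + ")";
}
```

Calling `Tokenize` on the source and then `ParseExpr` with a position starting at 0 yields one expression; a real reader would loop until the tokens run out and would produce cons boxes and interned atoms instead of `Node` objects.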
The Kamin Interpreters from Samuel Kamin's book Programming Languages, An Interpreter-Based Approach, translated to C++ by Timothy Budd. I'm not sure how useful the bare source code will be, as it was meant to go with the book, but it's a fine book that covers the basics of implementing Lisp in a lower-level language, including garbage collection, etc. (That's not the focus of the book, which is programming languages in general, but it is covered.)
Lisp in Small Pieces goes into more depth, but that's both good and bad for your case. There's a lot of material on compiling and such that won't be relevant to you, and its simpler interpreters are in Scheme, not C++.
SICP is good, definitely. Not overkill, but of course writing interpreters is only a small fraction of the book.
The JScheme suggestion is a good one, too (and it incorporates some code by me), but won't help you with things like GC.
I might flesh this out with more suggestions later.
Edit: A few people have said they learned from my awklisp. This is admittedly kind of a weird suggestion, but it's very small, readable, actually usable, and unlike other tiny-yet-readable toy Lisps it implements its own garbage collector and data representation instead of relying on an underlying high-level implementation language to provide them.
Check out JScheme from Peter Norvig. I found this amazingly simple to understand and port to C++. Uh, dunno about using scheme as a scripting language though - teaching it to jnrs is cumbersome and feels dated (helloooo 1980's).
I would like to extend my recommendation for Programming Languages: Application and Interpretation. If you want to write an interpreter, that book takes you there by a very short path. If you read through it, writing the code as you go and doing the exercises, you end up with a bunch of similar but different interpreters (one is eager, the other is lazy; one is dynamic, the other has some typing; one has dynamic scope, the other has lexical scope; etc.).