What are the tasks of the "reader" during Lisp interpretation? - clojure

I'm wondering about the purpose, or perhaps more correctly, the tasks of the "reader" during interpretation/compilation of Lisp programs.
From the pre-question research I've just done, it seems to me that a reader (Clojure's in particular, in this case) can be thought of as a "syntactic preprocessor". Its main duties are the expansion of reader macros and primitive forms. So, two examples:
'cheese --> (quote cheese)
{"a" 1 "b" 2} --> (array-map "a" 1 "b" 2)
So the reader takes in the text of a program (consisting of S-Expressions) and then builds and returns an in-memory data-structure that can be evaluated directly.
How far from the truth is this (and have I over-simplified the whole process)? What other tasks does the reader perform? Considering a virtue of Lisps is their homoiconicity (code as data), why is there a need for lexical analysis (if such is indeed comparable to the job of the reader)?
Thanks!

Generally the reader in Lisp reads s-expressions and returns data structures. READ is an I/O operation: Input is a stream of characters and output is Lisp data.
The printer does the opposite: it takes Lisp data and outputs those as a stream of characters. Thus it can also print Lisp data to external s-expressions.
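The printer's direction (data in, characters out) can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not how any Lisp printer is implemented: nested Python lists stand in for Lisp lists, and the `Symbol` class and `print_sexpr` function are hypothetical names introduced here.

```python
class Symbol(str):
    """Hypothetical stand-in for a Lisp symbol, kept distinct from strings."""


def print_sexpr(data):
    """Render nested in-memory data as s-expression text."""
    if isinstance(data, list):
        # A list prints as its elements, space-separated, inside parentheses.
        return "(" + " ".join(print_sexpr(item) for item in data) + ")"
    if isinstance(data, Symbol):
        # Symbols print as their bare name.
        return str(data)
    if isinstance(data, str):
        # Strings print with surrounding double quotes (no escaping in this sketch).
        return '"' + data + '"'
    # Numbers and anything else fall back to Python's own text form.
    return str(data)


text = print_sexpr([Symbol("if"), [Symbol("foo")], 1, 2])
```

Here `text` is a plain string of characters; the list it was produced from is the data.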
Note that interpretation means something specific: executing code with an interpreter. But many Lisp systems (including Clojure) use a compiler. The task of computing a value for a Lisp form is usually called evaluation. Evaluation can be implemented by interpretation, by compilation, or by a mix of both.
S-Expression: symbolic expressions. External, textual representation of data. External means that s-expressions are what you see in text files, strings, etc. So s-expressions are made of characters on some, typically external, medium.
Lisp data structures: symbols, lists, strings, numbers, characters, ...
Reader: reads s-expressions and returns Lisp data structures.
Note that s-expressions also are used to encode Lisp source code.
In some Lisp dialects the reader is programmable and table driven (via the so-called read table). This read table contains reader functions for characters. For example the quote ' character is bound to a function that reads an expression and returns the value of (list 'quote expression). The number characters 0..9 are bound to functions that read a number (in reality this might be more complex, since some Lisps allow numbers to be read in different bases).
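The table-driven idea can be sketched in Python. This is a deliberately simplified illustration, not the Common Lisp read-table API: `READ_TABLE`, `Symbol`, and the `read_*` functions are hypothetical names, numbers are handled by the default token reader rather than by per-digit entries, and unbalanced input simply raises an exception.

```python
class Symbol(str):
    """Hypothetical stand-in for a Lisp symbol, kept distinct from strings."""


def skip_whitespace(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def read_quote(text, pos):
    # ' reads the next expression and wraps it: 'x becomes (quote x).
    form, pos = read_form(text, pos + 1)
    return [Symbol("quote"), form], pos


def read_list(text, pos):
    # ( reads forms until the matching ) is reached.
    items = []
    pos = skip_whitespace(text, pos + 1)
    while text[pos] != ")":
        form, pos = read_form(text, pos)
        items.append(form)
        pos = skip_whitespace(text, pos)
    return items, pos + 1


def read_string(text, pos):
    # " reads up to the closing quote (no escape handling in this sketch).
    end = text.index('"', pos + 1)
    return text[pos + 1:end], end + 1


def read_token(text, pos):
    # Default reader: a run of ordinary characters is a number or a symbol.
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in "()'\"":
        end += 1
    token = text[pos:end]
    try:
        return int(token), end
    except ValueError:
        return Symbol(token), end


# The "read table": the character that starts a form selects the reader function.
READ_TABLE = {"'": read_quote, "(": read_list, '"': read_string}


def read_form(text, pos=0):
    pos = skip_whitespace(text, pos)
    reader = READ_TABLE.get(text[pos], read_token)
    return reader(text, pos)


def read(text):
    """Characters in, data out: return the first form found in text."""
    form, _ = read_form(text)
    return form
```

With these definitions, `read("'cheese")` returns a two-element list whose first element is the symbol `quote`. Adding an entry to `READ_TABLE` is what "programming the reader" amounts to in this toy model.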
S-expressions provide the external syntax for data structures.
Lisp programs are written in external form using s-expressions. But not all s-expressions are valid Lisp programs:
(if a b c d e) is usually not a valid Lisp program
the syntax of Lisp usually is defined on top of Lisp data.
IF has for example the following syntax (in Common Lisp http://www.lispworks.com/documentation/HyperSpec/Body/s_if.htm ):
if test-form then-form [else-form]
So it expects a test-form, a then-form and an optional else-form.
As s-expressions the following are valid IF expressions:
(if (foo) 1 2)
(if (bar) (foo))
But since Lisp programs are forms, we can also construct these forms using Lisp programs:
(list 'if '(foo) 1 2) is a Lisp program that returns a valid IF form.
CL-USER 24 > (describe (list 'if '(foo) 1 2))
(IF (FOO) 1 2) is a LIST
0 IF
1 (FOO)
2 1
3 2
This list can for example be executed with EVAL. EVAL expects list forms - not s-expressions. Remember s-expressions are only an external representation. To create a Lisp form, we need to READ it.
This is why it is said code is data. Lisp forms are expressed as internal Lisp data structures: lists, symbols, numbers, strings, .... In most other programming languages code is raw text. In Lisp s-expressions are raw text. When read with the function READ, s-expressions are turned into data.
Thus the basic interaction top-level in Lisp is called REPL, Read Eval Print Loop.
It is a LOOP that repeatedly reads an s-expression, evaluates the resulting Lisp form, and prints the result:
READ : s-expression -> lisp data
EVAL : lisp form -> resulting lisp data
PRINT: lisp data -> s-expression
So the most primitive REPL is:
(loop (print (eval (read))))
Thus from a conceptual point of view, to answer your question, during evaluation the reader does nothing. It is not involved in evaluation. Evaluation is done by the function EVAL. The reader is invoked by a call to READ. Since EVAL uses Lisp data structures as input (and not s-expressions) the reader is run before the Lisp form gets evaluated (for example by interpretation or by compiling and executing it).

Related

Clojure let vs Common Lisp let

In Common Lisp, let uses a list for its bindings, e.g.:
(let ((var1 1)
(var2 2))
...)
While Clojure uses a vector instead:
(let [a 1
b 2]
...)
Is there any specific reason, other than readability, for Clojure to use a vector?
You can find Rich Hickey's argument at Simple Made Easy - slide 14, about 26 minutes in:
Rich's line on this was as follows
"Since we were talking about syntax, let’s look at
classic Lisp. It seems to be the simplest of syntax, everything is a
parenthesized list of symbols, numbers, and a few other things. What
could be simpler? But in reality, it is not the simplest, since to
achieve that uniformity, there has to be substantial overloading of
the meaning of lists. They might be function calls, grouping
constructs, or data literals, etc. And determining which requires
using context, increasing the cognitive load when scanning code to
assess its meaning. Clojure adds a couple more composite data literals
to lists, and uses them for syntax. In doing so, it means that lists
are almost always call-like things, and vectors are used for grouping,
and maps have their own literals. Moving from one data structure to
three reduces the cognitive load substantially."
One of the things he believes was overloaded in the standard syntax was access time. So the vector syntax for arguments is related to the constant access time you get when you use them. He said:
Seems odd though as it only is valid for that one form...as soon as it is stored in a variable or passed in any way the information is 'lost'. For example...
(defn test [a]
  (nth a 0)) ;; <- what is the access time of an element of a?
I personally prefer harsh syntax changes like brackets to be reserved for when the programmer has to switch mental models e.g. for embedded languages.
;; Example showing a possible syntax for an embedded prolog.
{size [],0}
{size([H|T],N) :- size(T,N1), N is N1+1}
(size '(1 2 3 4) 'n) ;; here we are back to lisp code
Such a concept is syntactically constant. You don't 'pass around' structure at runtime. Before runtime (read/macro/compile time) is another matter though so where possible it is often better to keep things as lists.
The original source seems to have gone, but here is another record of the interview: https://gist.github.com/rduplain/c474a80d173e6ae78980b91bc92f43d1#file-code-quarterly-rich-hickey-2011-md

Are Lisp forms and Lisp expressions same thing?

Some literature says "the first subform of the following form..." or "to evaluate a form...", while other literature says "To evaluate an expression...", and most literature seems to use both terms. Are the two terms interchangeable? Is there a difference in meaning?
Summary
A form is Lisp code as data. An expression is data as text.
See the Glossary entries in the Common Lisp standard:
form
expression
Explanation
In Common Lisp form and expression have two different meanings and it is useful to understand the difference.
A form is an actual data object inside a running Lisp system. The form is valid input for the Lisp evaluator.
EVAL takes a form as an argument.
The syntax is:
eval form => result*
EVAL does not get textual input in the form of Lisp expressions. It gets forms, which are Lisp data: numbers, strings, symbols, programs as lists, ...
CL-USER 103 > (list '+ 1 2)
(+ 1 2)
Above constructs a Lisp form: here a list with the symbol + as the first element and the numbers 1 and 2 as the next elements. + names a function and the two numbers are the arguments. So it is a valid function call.
CL-USER 104 > (eval (list '+ 1 2))
3
Above gives the form (+ 1 2) as data objects to EVAL and computes a result. We can not see forms directly - we can let the Lisp system create printed representations for us.
The form is really a Lisp expression as a data object.
This is slightly unusual, since most programming languages are defined by describing textual input. Common Lisp describes data input to EVAL. Forms as data structures.
The following creates a Lisp form when evaluated:
"foo" ; strings evaluate to themselves
'foo ; that evaluates to a symbol, which then denotes a variable
123
(list '+ 1 2) ; evaluates to a list, which describes a function call
'(+ 1 2) ; evaluates to a list, which describes a function call
Example use:
CL-USER 105 > (defparameter foo 42)
FOO
CL-USER 106 > (eval 'foo)
42
The following are not creating valid forms:
'(1 + 2) ; Lisp expects prefix form
(list 1 '+ 2) ; Lisp expects prefix form
'(defun foo 1 2) ; Lisp expects a parameter list as third element
Example:
CL-USER 107 > (eval '(1 + 2))
Error: Illegal argument in functor position: 1 in (1 + 2).
The term expression is then usually used for the textual version of a Lisp data object, which is not necessarily code. Expressions are read by the Lisp reader and created by the Lisp printer.
If you see Lisp data on your screen or a piece of paper, then it is an expression.
(1 + 2) ; is a valid expression in a text, `READ` can read it.
The definitions and uses of these terms vary by Lisp dialect and community, so there is no clear answer to your question for Lisps in general.
For their use in Common Lisp, see Rainer's detailed answer. To give a short summary:
The HyperSpec entry for form:
form n. 1. any object meant to be evaluated. 2. a symbol, a compound
form, or a self-evaluating object. 3. (for an operator, as in
"<<operator>> form") a compound form having that operator as its
first element. "A quote form is a constant form."
The HyperSpec entry for expression:
expression n. 1. an object, often used to emphasize the use of the
object to encode or represent information in a specialized format,
such as program text. "The second expression in a let form is a list
of bindings." 2. the textual notation used to notate an object in a
source file. "The expression 'sample is equivalent to (quote
sample)."
So, according to the HyperSpec, expression is used for the (textual) representation, while form is used for Lisp objects to be evaluated. But, as I said above, this is only the definition of those terms in the context of the HyperSpec (and thus Common Lisp).
In Scheme, however, the R5RS doesn't mention form at all, and talks about expressions only. The R6RS even gives a definition that almost sounds like the exact opposite of the above:
At the purely syntactical level, both are forms, and form is the
general name for a syntactic part of a Scheme program.
(Talking about the difference between (define …) and (* …).)
This is by no means a scientific or standards-based answer, but the distinction that I have built up in my own head based on things I've heard is more along the lines of: an expression is a form which will be (or can be) evaluated in the final program.
So for instance, consider the form (lambda (x) (+ x 1)). It is a list of three elements: the symbol lambda, the list (x), and the list (+ x 1). All of those elements are forms, but only the last is an expression, because it is "intended" for evaluation; the first two forms are shuffled around by the macroexpander but never evaluated. The outermost form (lambda (x) (+ x 1)) is itself an expression as well.
This seems to me to be an interesting distinction, but it does mean it is context-sensitive: (x) is always a form, and may or may not be an expression depending on context.

Why does Clojure allow (eval 3) although 3 is not quoted?

I'm learning Clojure and trying to understand reader, quoting, eval and homoiconicity by drawing parallels to Python's similar features.
In Python, one way to avoid (or postpone) evaluation is to wrap the expression between quotes, eg. '3 + 4'. You can evaluate this later using eval, eg. eval('3 + 4') yielding 7. (If you need to quote only Python values, you can use repr function instead of adding quotes manually.)
In Lisp you use quote or ' for quoting and eval for evaluating, eg. (eval '(+ 3 4)) yielding 7.
So in Python the "quoted" stuff is represented by a string, whereas in Lisp it's represented by a list which has quote as its first item.
My question, finally: why does Clojure allow (eval 3) although 3 is not quoted? Is it just the matter of Lisp style (trying to give an answer instead of error wherever possible) or are there some other reasons behind it? Is this behavior essential to Lisp or not?
The short answer is that numbers (and keywords and strings, for example) evaluate to themselves. Quoting instructs Lisp to pass along whatever follows the quote unevaluated. eval then gets that list as you wrote it, but without the quote, and evaluates it (in the case of (eval '(+ 3 4)), eval evaluates a call of the function + with two arguments).
What happens with that last expression is the following:
When you hit enter, the expression is evaluated. It contains a normal function call (eval) and some arguments.
The arguments are evaluated. The first argument is a quoted form, which evaluates to whatever follows the quote (the actual (+ 3 4) list).
There are no more arguments, and the actual function call is evaluated. This means calling the eval function with the list (+ 3 4) as argument.
The eval function does the same steps again, finding the normal function + and the arguments, and applies it, obtaining the result.
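The steps above can be traced with a toy evaluator written in Python. It is a sketch under stated assumptions rather than Clojure's actual eval: nested Python lists stand in for Lisp lists, `Symbol`, `ENV`, and `evaluate` are hypothetical names, and only two-argument arithmetic is supported. Note that it takes data as input, never program text.

```python
import operator


class Symbol(str):
    """Hypothetical stand-in for a Lisp symbol, kept distinct from strings."""


# A tiny global environment mapping symbol names to values (illustrative only).
ENV = {"+": operator.add, "*": operator.mul}


def evaluate(form):
    """Evaluate a form that is already data: a list, a symbol, or an atom."""
    if isinstance(form, Symbol):
        return ENV[form]          # symbols are looked up in the environment
    if not isinstance(form, list):
        return form               # numbers and strings evaluate to themselves
    head, *args = form
    if head == "quote":
        return args[0]            # (quote x) returns x without evaluating it
    function = evaluate(head)     # otherwise the list is a function call
    return function(*[evaluate(arg) for arg in args])


# Make the evaluator callable from evaluated code, like Lisp's eval.
ENV["eval"] = evaluate

# (eval '(+ 3 4)) as data; the reader has already turned ' into (quote ...).
form = [Symbol("eval"), [Symbol("quote"), [Symbol("+"), 3, 4]]]
result = evaluate(form)
```

Evaluating `form` first evaluates the quoted argument to the list `(+ 3 4)`, then calls `evaluate` on that list, which applies `+` to 3 and 4. Calling `evaluate(3)` directly just returns 3, which is the toy-model counterpart of `(eval 3)`.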
Other answers have explained the mechanics, but I think the philosophical point is in the different ways Lisp and Python look at "code". In Python, code handed to eval is normally a string, so of course attempting to evaluate a bare number will fail. Lisp has richer data structures for code: lists, numbers, symbols, and so forth. So the expression (+ 1 2) is a list, containing a symbol and two numbers. When evaluating a list, you must first evaluate each of its elements.
So, it's perfectly natural to need to evaluate a number in the ordinary course of running lisp code. To that end, numbers are defined to "evaluate to themselves", meaning they are the same after evaluation as they were before: just a number. The eval function applies the same rules to the bare "code snippet" 3 that the compiler would apply when compiling, say, the third element of a larger expression like (+ 5 3). For numbers, that means leaving it alone.
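The Python side of the comparison can be checked directly. The sketch below uses only the standard library. It shows that Python's eval accepts text and rejects a bare number. It also shows, as a rough analogue of building a Lisp form with list, that Python can construct code as a data structure through its ast module, although that structure is not an everyday data type the way lists are in Lisp.

```python
import ast

# Text must be parsed before it can run, much like an s-expression must be read.
text_result = eval("3 + 4")

# eval() accepts only strings, bytes, or code objects, so a bare number is rejected.
try:
    eval(3)
    rejected = False
except TypeError:
    rejected = True

# Building the expression 3 + 4 as a data structure instead of as text.
tree = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=3),
        op=ast.Add(),
        right=ast.Constant(value=4),
    )
)
ast.fix_missing_locations(tree)  # fill in line/column info that compile() requires
code = compile(tree, filename="<ast>", mode="eval")
built_result = eval(code)
```

Both `text_result` and `built_result` hold the same value. The difference is only in how the code reached eval: as characters in one case, as a constructed tree in the other.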
What should 3 evaluate to? It makes the most sense that Lisp evaluates a number to itself. Would we want to require numbers to be quoted in code? That would be inconvenient and extremely error-prone:
Instead of
(defun add-fourtytwo (n)
(+ n 42))
we would have to write
(defun add-fourtytwo (n)
(+ n '42))
Every number in code would need to be quoted. A missing quote would trigger an error. That's not something one would want to use.
As a side note, imagine what happens when you want to use eval in your code.
(defun example ()
(eval 3))
Above would be wrong. Numbers would need to be quoted.
(defun example ()
(eval '3))
The above would be accepted, but it would signal an error at runtime. Lisp evaluates '3 to the number 3. But then calling eval on that number would be an error, since numbers would need to be quoted.
So we would need to write:
(defun example ()
(eval ''3))
That's not very useful...
Numbers have always been self-evaluating in Lisp history. But in earlier Lisp implementations some other data objects, like arrays, were not self-evaluating. Again, since this is a huge source of errors, Lisp dialects like Common Lisp have defined that all data types (other than lists and symbols) are self-evaluating.
To answer this question we need to look at the definition of eval in Lisp. For example, the CLHS gives this definition:
Syntax: eval form => result*
Arguments and Values:
form - a form.
results - the values yielded by the evaluation of form.
Where form is
any object meant to be evaluated.
a symbol, a compound form, or a self-evaluating object.
(for an operator, as in "<<operator>> form") a compound form having that operator as its first element. "A quote form is a constant form."
In your case the number 3 is a self-evaluating object: a form that is neither a symbol nor a cons is defined to be a self-evaluating object. I believe that for Clojure we can just replace cons with list in this definition.
In clojure only lists are interpreted by eval as function calls. Other data structures and objects are evaluated as self-evaluating objects.
'(+ 3 4) evaluates to the same list as (list '+ 3 4). The ' (which the reader transforms into the quote special form) just prevents evaluation of the given form. So in the expression (eval '(+ 3 4)), eval receives the list data structure (+ 3 4) as its argument.

Lisp: list vs S-expression

I'm new to Lisp. I encountered 2 terms "list" and "S-expression". I just can't distinguish between them. Are they just synonyms in Lisp?
First, not all S-expressions represent lists; an expression such as foobar, representing a bare atom, is also considered an S-expression. So is the "cons cell" syntax, (car . cdr), used when the "cdr" part is not itself another list (or nil). The more familiar list expression, such as (a b c d), is just syntactic sugar for a chain of nested cons cells; that example expands to (a . (b . (c . (d . nil)))).
Second, the term "S-expression" refers to the syntax - (items like this (possibly nested)). Such an S-expression is the representation in Lisp source code of a list, but it's not technically a list itself. This distinction is the same as that between a sequence of decimal digits and their numeric value, or between a sequence of characters within quotation marks and the resulting string.
That is perhaps an overly technical distinction; programmers routinely refer to literal representations of values as though they were the values themselves. But with Lisp and lists, things get a little trickier because everything in a Lisp program is technically a list.
For example, consider this expression:
(+ 1 2)
The above is a straightforward S-expression which represents a flat list, consisting of the atoms +, 1, and 2.
However, within a Lisp program, such a list will be interpreted as a call to the + function with 1 and 2 as arguments. (Do note that it is the list, not the S-expression, that is so interpreted; the evaluator is handed lists that have been pre-parsed by the reader, not source code text.)
So while the above S-expression represents a list, it would only rarely be referred to as a "list" in the context of a Lisp program. Unless discussing macros, or the inner workings of the reader, or engaged in a metasyntactic discussion because of some other code-generation or parsing context, a typical Lisp programmer would instead treat the above as a numeric expression.
On the other hand, any of the following S-expressions likely would be referred to as "lists", because evaluating them as Lisp code would produce the list represented by the above literal S-expression as a runtime value:
'(+ 1 2)
(quote (+ 1 2))
(list '+ 1 2)
Of course, the equivalence of code and data is one of the cool things about Lisp, so the distinction is fluid. But my point is that while all of the above are S-expressions and lists, only some would be referred to as "lists" in casual Lisp-speak.
S-expressions are a notation for data.
Historically an s-expression (short for symbolic expression) is described as:
symbols like FOO and BAR
cons cells with s-expressions as its first and second element : ( expression-1 . expression-2 )
the list termination symbol NIL
and a convention for writing lists: ( A . ( B . NIL ) ) is more simply written as the list (A B)
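The relationship between cons cells and the list notation can be modelled in a few lines of Python. This is an illustrative sketch only: the `Cons` class, the use of Python's `None` for NIL, and the helper names `make_list`, `dotted`, and `sugared` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any

NIL = None  # stand-in for the list termination symbol NIL


@dataclass
class Cons:
    """A cons cell: a pair with a first (car) and second (cdr) element."""
    car: Any
    cdr: Any


def make_list(*items):
    """Build the chain of cells that the notation (a b c) denotes."""
    result = NIL
    for item in reversed(items):
        result = Cons(item, result)
    return result


def dotted(obj):
    """Write every cell with an explicit dot."""
    if isinstance(obj, Cons):
        return f"({dotted(obj.car)} . {dotted(obj.cdr)})"
    return "NIL" if obj is NIL else str(obj)


def sugared(obj):
    """Write cells using the list convention wherever the cdr chain allows it."""
    if not isinstance(obj, Cons):
        return "NIL" if obj is NIL else str(obj)
    parts = []
    while isinstance(obj, Cons):
        parts.append(sugared(obj.car))
        obj = obj.cdr
    if obj is not NIL:
        # The chain did not end in NIL, so the last cell needs a dot.
        parts += [".", sugared(obj)]
    return "(" + " ".join(parts) + ")"


two = make_list("A", "B")
```

For the same object `two`, `dotted` produces the fully explicit form and `sugared` the shorter list form; a single cell such as `Cons("A", "B")` keeps its dot in both.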
Note also that historically program text was written differently. An example for the function ASSOC.
assoc[x;y] =
eq[caar[y];x] -> cadar[y];
T -> assoc[x;cdr[y]]
Historically there existed also a mapping from these m-expressions (short for meta expressions) to s-expressions. Today most Lisp program code is written using s-expressions.
This is described here: McCarthy, Recursive Functions of Symbolic Expressions
In a Lisp programming language like Common Lisp nowadays s-expressions have more syntax and can encode more data types:
Symbols: symbol123, |This is a symbol with spaces|
Numbers: 123, 1.0, 1/3, ...
Strings: "This is a string"
Characters: #\a, #\space
Vectors: #(a b c)
Conses and lists: ( a . b ), (a b c)
Comments: ; this is a comment, #| this is a comment |#
and more.
Lists
A list is a data structure. It consists of cons cells and a list end marker. Lists have in Lisp a notation as lists in s-expressions. You could use some other notations for lists, but in Lisp one has settled on the s-expression syntax to write them.
Side note: programs and forms
In a programming language like Common Lisp, the expressions of the programming language are not text, but data! This is different from many other programming languages. Expressions in the programming language Common Lisp are called Lisp forms.
For example a function call is Lisp data, where the call is a list with a function symbol as its first element and the next elements are its arguments.
We can write that as (sin 3.0). But it really is data. Data we can also construct.
The function to evaluate Lisp forms is called EVAL and it takes Lisp data, not program text or strings of program text. Thus you can construct programs using Lisp functions which return Lisp data: (EVAL (LIST 'SIN 3.0)) evaluates to 0.14112.
Since Lisp forms have a data representation, they are usually written using the external data representation of Lisp: s-expressions. Lisp forms, as Lisp data, are written externally as s-expressions.
You should first understand the main Lisp feature: a program can be manipulated as data. Unlike other languages (like C or Java), where you write a program using special syntax ({, }, class, define, etc.), in Lisp you write code as (nested) lists (by the way, this lets you express abstract syntax trees directly). Once again: you write programs that look just like the language's data structures.
When you talk about it as data, you call it a "list", but when you talk about program code, it is better to use the term "s-expression". Thus, technically they are similar, but used in different contexts. The only real place where these terms are mixed is meta-programming (normally with macros).
Also note that an s-expression may also consist of a single atom (like a number, a string, etc.).
A simple definition for an S-expression is
(define S-expression?
  (λ (object)
    (or (atom? object) (list? object))))
;; Where atom? is:
(define atom?
  (λ (object)
    (and (not (pair? object)) (not (null? object)))))
;; And list? is:
(define list?
  (λ (object)
    (let loop ((l1 object) (l2 object))
      (if (pair? l1)
          (let ((l1 (cdr l1)))
            (cond ((eq? l1 l2) #f)
                  ((pair? l1) (loop (cdr l1) (cdr l2)))
                  (else (null? l1))))
          (null? l1)))))
Both are written in a similar way: (blah blah blah), possibly nested, with one difference: lists are prefixed with an apostrophe.
On evaluation:
S-expression returns some result (may be an atom or list or nil or whatever)
Lists return Lists
If we need, we can convert lists to s-exp and vice versa.
(eval '(blah blah blah)) => list is treated as an s-exp and a result is returned.
(quote (blah blah blah)) => sexp is converted to list and the list is returned without evaluating
IAS:
If a List is treated as data it is called List, if it is treated as code it is called s-exp.

Using quote in Clojure

Quoting in Clojure results in non-evaluation. ':a and :a return the same result. What is the difference between ':a and :a? One is not evaluated and the other evaluates to itself... but is this the same as non-evaluation?
':a is shorthand for (quote :a).
(eval '(quote form)) returns form by definition. That is to say, if the Clojure function eval receives as its argument a list structure whose first element is the symbol quote, it returns the second element of said list structure without transforming it in any way (thus it is said that the quoted form is not evaluated). In other words, the behaviour eval dispatches to when its argument is a list structure of the form (quote foo) is that of returning foo unchanged, regardless of what it is.
When you write down the literal :a in your programme, it gets read in as the keyword :a; that is, the concrete piece of text :a gets converted to an in-memory data structure which happens to be called the :a keyword (Lisp being homoiconic means that occasionally it is hard to distinguish between the textual representation of Lisp data and the data itself, even when this would be useful for explanatory purposes...).
The in-memory data structure corresponding to the literal :a is a Java object which exposes a number of methods etc. and which has the interesting property that the function eval, when it receives this data object as an argument, returns it unchanged. In other words, the keyword's "evaluation to itself" which you ask about is just the behaviour eval dispatches to when passed in a keyword as an argument.
Thus when eval sees ':a, it treats it as a quoted form and returns the second part thereof, which happens to be :a. When, on the other hand, eval sees :a, it treats it as a keyword and returns it unchanged. The return value is the same in both cases (it's just the keyword :a); the evaluation process is slightly different.
Clojure semantics -- indeed Lisp semantics, for any dialect of Lisp -- are specified in terms of the values returned by and side-effects caused by the function eval when it receives various Lisp data structures as arguments. Thus the above explains what's actually meant to happen when you write down ':a or :a in your programme (code like (println :a) may get compiled into efficient bytecode which doesn't actually call the function eval, of course; but the semantics are always preserved, so that it still acts as if it was eval receiving a list structure containing the symbol println and the keyword :a).
The key idea here is that regardless of whether the form being evaluated is ':a or :a, the keyword data structure is constructed at read time; then when one of these forms is evaluated, that data structure is returned unchanged -- although for different reasons.