Lisp: list vs S-expression - list

I'm new to Lisp. I encountered 2 terms "list" and "S-expression". I just can't distinguish between them. Are they just synonyms in Lisp?

First, not all S-expressions represent lists; an expression such as foobar, representing a bare atom, is also considered an S-expression. As is the "cons cell" syntax, (car . cons), used when the "cons" part is not itself another list (or nil). The more familiar list expression, such as (a b c d), is just syntactic sugar for a chain of nested cons cells; that example expands to (a . (b . (c . (d . nil)))).
Second, the term "S-expression" refers to the syntax - (items like this (possibly nested)). Such an S-expression is the representation in Lisp source code of a list, but it's not technically a list itself. This distinction is the same as that between a sequence of decimal digits and their numeric value, or between a sequence of characters within quotation marks and the resulting string.
That is perhaps an overly technical distinction; programmers routinely refer to literal representations of values as though they were the values themselves. But with Lisp and lists, things get a little trickier because everything in a Lisp program is technically a list.
For example, consider this expression:
(+ 1 2)
The above is a straightforward S-expression which represents a flat list, consisting of the atoms +, 1, and 2.
However, within a Lisp program, such a list will be interpreted as a call to the + function with 1 and 2 as arguments. (Do note that is the list, not the S-expression, that is so interpreted; the evaluator is handed lists that have been pre-parsed by the reader, not source code text.)
So while the above S-expression represents a list, it would only rarely be referred to as a "list" in the context of a Lisp program. Unless discussing macros, or the inner workings of the reader, or engaged in a metasyntactic discussion because of some other code-generation or parsing context, a typical Lisp programmer would instead treat the above as a numeric expression.
On the other hand, any of the following S-expressions likely would be referred to as "lists", because evaluating them as Lisp code would produce the list represented by the above literal S-expression as a runtime value:
'(+ 1 2)
(quote (+ 1 2))
(list '+ 1 2)
Of course, the equivalence of code and data is one of the cool things about Lisp, so the distinction is fluid. But my point is that while all of the above are S-expressions and lists, only some would be referred to as "lists" in casual Lisp-speak.

S-expressions are a notation for data.
Historically an s-expression (short for symbolic expression) is described as:
symbols like FOO and BAR
cons cells with s-expressions as its first and second element : ( expression-1 . expression-2 )
the list termination symbol NIL
and a convention to write lists: ( A . ( B . NIL ) ) is simpler written as the list (A B)
Note also that historically program text was written differently. An example for the function ASSOC.
assoc[x;y] =
eq[caar[y];x] -> cadar[y];
T -> assoc[x;cdr[y]]
Historically there existed also a mapping from these m-expressions (short for meta expressions) to s-expressions. Today most Lisp program code is written using s-expressions.
This is described here: McCarthy, Recursive Functions of Symbolic Expressions
In a Lisp programming language like Common Lisp nowadays s-expressions have more syntax and can encode more data types:
Symbols: symbol123, |This is a symbol with spaces|
Numbers: 123, 1.0, 1/3, ...
Strings: "This is a string"
Characters: #\a, #\space
Vectors: #(a b c)
Conses and lists: ( a . b ), (a b c)
Comments: ; this is a comment, #| this is a comment |#
and more.
Lists
A list is a data structure. It consists of cons cells and a list end marker. Lists have in Lisp a notation as lists in s-expressions. You could use some other notations for lists, but in Lisp one has settled on the s-expression syntax to write them.
Side note: programs and forms
In a programming language like Common Lisp, the expressions of the programming language are not text, but data! This is different from many other programming languages. Expressions in the programming language Common Lisp are called Lisp forms.
For example a function call is Lisp data, where the call is a list with a function symbol as its first element and the next elements are its arguments.
We can write that as (sin 3.0). But it really is data. Data we can also construct.
The function to evaluate Lisp forms is called EVAL and it takes Lisp data, not program text or strings of program text. Thus you can construct programs using Lisp functions which return Lisp data: (EVAL (LIST 'SIN 3.0)) evaluates to 0.14112.
Since Lisp forms have a data representation, they are usually written using the external data representation of Lisp - which is what? - s-expressions!
It is s-expressions. Lisp forms as Lisp data are written externally as s-expression.

You should first understand main Lisp feature - program can be manipulated as data. Unlike other languages (like C or Java), where you write program by using special syntax ({, }, class, define, etc.), in Lisp you write code as (nested) lists (btw, this allows to express abstract syntactic trees directly). Once again: you write programs that look just like language's data structures.
When you talk about it as data, you call it "list", but when you talk about program code, you should better use term "s-expression". Thus, technically they are similar, but used in different contexts. The only real place where these terms are mixed is meta-programming (normally with macros).
Also note that s-expression may also consist of the only atom (like numbers, strings, etc.).

A simple definition for an S-expression is
(define S-expression?
(λ (object)
(or (atom? object) (list? object))))
;; Where atom? is:
(define atom?
(λ (object)
(and (not (pair? object)) (not (null? object)))))
;; And list? is:
(define list? (λ (object)
(let loop ((l1 object) (l2 object))
(if (pair? l1)
(let ((l1 (cdr l1)))
(cond ((eq? l1 l2) #f)
((pair? l1) (loop (cdr l1) (cdr l2)))
(else (null? l1))))
(null? l1)))))

Both are written in similar way: (blah blah blah), may be nested. with one difference - lists are prefixed with apostrophe.
On evaluation:
S-expression returns some result (may be an atom or list or nil or whatever)
Lists return Lists
If we need, we can convert lists to s-exp and vice versa.
(eval '(blah blah blah)) => list is treated as an s-exp and a result is returned.
(quote (blah blah blah)) => sexp is converted to list and the list is returned without evaluating
IAS:
If a List is treated as data it is called List, if it is treated as code it is called s-exp.

Related

Are Lisp forms and Lisp expressions same thing?

Some literature say "the first subform of the following form..." or "to evaluate a form..." while some other literature say "To evaluate an expression...", and most literature seem to use both terms. Are the two terms interchangeable? Is there a difference in meaning?
Summary
A form is Lisp code as data. An expression is data as text.
See the Glossary entries in the Common Lisp standard:
form
expression
Explanation
In Common Lisp form and expression have two different meanings and it is useful to understand the difference.
A form is an actual data object inside a running Lisp system. The form is valid input for the Lisp evaluator.
EVAL takes a form as an argument.
The syntax is:
eval form => result*
EVAL does not get textual input in the form of Lisp expressions. It gets forms. Which is Lisp data: numbers, strings, symbols, programs as lists, ...
CL-USER 103 > (list '+ 1 2)
(+ 1 2)
Above constructs a Lisp form: here a list with the symbol + as the first element and the numbers 1 and 2 as the next elements. + names a function and the two numbers are the arguments. So it is a valid function call.
CL-USER 104 > (eval (list '+ 1 2))
3
Above gives the form (+ 1 2) as data objects to EVAL and computes a result. We can not see forms directly - we can let the Lisp system create printed representations for us.
The form is really a Lisp expression as a data object.
This is slightly unusual, since most programming languages are defined by describing textual input. Common Lisp describes data input to EVAL. Forms as data structures.
The following creates a Lisp form when evaluated:
"foo" ; strings evaluate to themselves
'foo ; that evaluates to a symbol, which then denotes a variable
123
(list '+ 1 2) ; evaluates to a list, which describes a function call
'(+ 1 2) ; evaluates to a list, which describes a function call
Example use:
CL-USER 105 > (defparameter foo 42)
FOO
CL-USER 106 > (eval 'foo)
42
The following are not creating valid forms:
'(1 + 2) ; Lisp expects prefix form
(list 1 '+ 2) ; Lisp expects prefix form
'(defun foo 1 2)' ; Lisp expects a parameter list as third element
Example:
CL-USER 107 > (eval '(1 + 2))
Error: Illegal argument in functor position: 1 in (1 + 2).
The expression is then usually used for a textual version of Lisp data object - which is not necessarily code. Expressions are read by the Lisp reader and created by the Lisp printer.
If you see Lisp data on your screen or a piece of paper, then it is an expression.
(1 + 2) ; is a valid expression in a text, `READ` can read it.
The definitions and uses of these terms vary by Lisp dialect and community, so there is no clear answer to your question for Lisps in general.
For their use in Common Lisp, see Rainers detailed answer. To give a short summary:
The HyperSpec entry for form:
form n. 1. any object meant to be evaluated. 2. a symbol, a compound
form, or a self-evaluating object. 3. (for an operator, as in
<<operator>> form'') a compound form having that operator as its
first element.A quote form is a constant form.''
The HyperSpec entry for expression:
expression n. 1. an object, often used to emphasize the use of the
object to encode or represent information in a specialized format,
such as program text. The second expression in a let form is a list
of bindings.'' 2. the textual notation used to notate an object in a
source file.The expression 'sample is equivalent to (quote
sample).''
So, according to the HyperSpec, expression is used for the (textual) representation, while form is used for Lisp objects to be evaluated. But, as I said above, this is only the definition of those terms in the context of the HyperSpec (and thus Common Lisp).
In Scheme, however, the R5RS doesn't mention form at all, and talks about expressions only. The R6RS even gives a definition that almost sounds like the exact opposite of the above:
At the purely syntactical level, both are forms, and form is the
general name for a syntactic part of a Scheme program.
(Talking about the difference between (define …) and (* …).)
This is by no means a scientific or standards-based answer, but the distinction that I have built up in my own head based on things I've heard is more along the lines of: an expression is a form which will be (or can be) evaluated in the final program.
So for instance, consider the form (lambda (x) (+ x 1)). It is a list of three elements: the symbol lambda, the list (x), and the list (+ x 1). All of those elements are forms, but only the last is an expression, because it is "intended" for evaluation; the first two forms are shuffled around by the macroexpander but never evaluated. The outermost form (lambda (x) (+ x 1)) is itself an expression as well.
This seems to me to be an interesting distinction, but it does mean it is context-sensitive: (x) is always a form, and may or may not be an expression depending on context.

Why does Clojure allow (eval 3) although 3 is not quoted?

I'm learning Clojure and trying to understand reader, quoting, eval and homoiconicity by drawing parallels to Python's similar features.
In Python, one way to avoid (or postpone) evaluation is to wrap the expression between quotes, eg. '3 + 4'. You can evaluate this later using eval, eg. eval('3 + 4') yielding 7. (If you need to quote only Python values, you can use repr function instead of adding quotes manually.)
In Lisp you use quote or ' for quoting and eval for evaluating, eg. (eval '(+ 3 4)) yielding 7.
So in Python the "quoted" stuff is represented by a string, whereas in Lisp it's represented by a list which has quoteas first item.
My question, finally: why does Clojure allow (eval 3) although 3 is not quoted? Is it just the matter of Lisp style (trying to give an answer instead of error wherever possible) or are there some other reasons behind it? Is this behavior essential to Lisp or not?
The short answer would be that numbers (and symbols, and strings, for example) evaluate to themselves. Quoting instruct lisp (the reader) to pass unevaluated whatever follows the quote. eval then gets that list as you wrote it, but without the quote, and then evaluates it (in the case of (eval '(+ 3 4)), eval will evaluate a function call (+) over two arguments).
What happens with that last expression is the following:
When you hit enter, the expression is evaluated. It contain a normal function call (eval) and some arguments.
The arguments are evaluated. The first argument contains a quote, which tells the reader to produce what is after the quote (the actual (+ 3 4) list).
There are no more arguments, and the actual function call is evaluated. This means calling the eval function with the list (+ 3 4) as argument.
The eval function does the same steps again, finding the normal function + and the arguments, and applies it, obtaining the result.
Other answers have explained the mechanics, but I think the philosophical point is in the different ways lisp and python look at "code". In python, the only way to represent code is as a string, so of course attempting to evaluate a non-string will fail. Lisp has richer data structures for code: lists, numbers, symbols, and so forth. So the expression (+ 1 2) is a list, containing a symbol and two numbers. When evaluating a list, you must first evaluate each of its elements.
So, it's perfectly natural to need to evaluate a number in the ordinary course of running lisp code. To that end, numbers are defined to "evaluate to themselves", meaning they are the same after evaluation as they were before: just a number. The eval function applies the same rules to the bare "code snippet" 3 that the compiler would apply when compiling, say, the third element of a larger expression like (+ 5 3). For numbers, that means leaving it alone.
What should 3 evaluate to? It makes the most sense that Lisp evaluates a number to itself. Would we want to require numbers to be quoted in code? That would not be very convenient and extremely problematic:
Instead of
(defun add-fourtytwo (n)
(+ n 42))
we would have to write
(defun add-fourtytwo (n)
(+ n '42))
Every number in code would need to be quoted. A missing quote would trigger an error. That's not something one would want to use.
As a side note, imagine what happens when you want to use eval in your code.
(defun example ()
(eval 3))
Above would be wrong. Numbers would need to be quoted.
(defun example ()
(eval '3))
Above would be okay, but generating an error at runtime. Lisp evaluates '3 to the number 3. But then calling eval on the number would be an error, since they need to be quoted.
So we would need to write:
(defun example ()
(eval ''3))
That's not very useful...
Numbers have be always self-evaluating in Lisp history. But in earlier Lisp implementations some other data objects, like arrays, were not self-evaluating. Again, since this is a huge source of errors, Lisp dialects like Common Lisp have defined that all data types (other than lists and symbols) are self-evaluating.
To answer this question we need to look at eval definition in lisp. E.g. in CLHS there is definition:
Syntax: eval form => result*
Arguments and Values:
form - a form.
results - the values yielded by the evaluation of form.
Where form is
any object meant to be evaluated.
a symbol, a compound form, or a self-evaluating object.
(for an operator, as in <<operator>> form'') a compound form having that operator as its first element.A quote form is a
constant form.''
In your case number "3" is self-evaluating object. Self-evaluating object is a form that is neither a symbol nor a cons is defined to be a self-evaluating object. I believe that for clojure we can just replace cons by list in this definition.
In clojure only lists are interpreted by eval as function calls. Other data structures and objects are evaluated as self-evaluating objects.
'(+ 3 4) is equal to (list '+ 3 4). ' (transformed by reader to quote function) just avoid evaluation of given form. So in expression (eval '(+ 3 4)) eval takes list data structure ('+ 3 4) as argument.

What are the tasks of the "reader" during Lisp interpretation?

I'm wondering about the purpose, or perhaps more correctly, the tasks of the "reader" during interpretation/compilation of Lisp programs.
From the pre-question-research I've just done, it seems to me that a reader (particular to Clojure in this case) can be thought of as a "syntactic preprocessor". It's main duties are the expansion of reader macros and primitive forms. So, two examples:
'cheese --> (quote cheese)
{"a" 1 "b" 2} --> (array-map "a" 1 "b" 2)
So the reader takes in the text of a program (consisting of S-Expressions) and then builds and returns an in-memory data-structure that can be evaluated directly.
How far from the truth is this (and have I over-simplified the whole process)? What other tasks does the reader perform? Considering a virtue of Lisps is their homoiconicity (code as data), why is there a need for lexical analysis (if such is indeed comparable to the job of the reader)?
Thanks!
Generally the reader in Lisp reads s-expressions and returns data structures. READ is an I/O operation: Input is a stream of characters and output is Lisp data.
The printer does the opposite: it takes Lisp data and outputs those as a stream of characters. Thus it can also print Lisp data to external s-expressions.
Note that interpretation means something specific: executing code by an Interpreter. But many Lisp systems (including Clojure) are using a compiler. The tasks of computing a value for a Lisp form is usually called evaluation. Evaluation can be implemented by interpretation, by compilation or by a mix of both.
S-Expression: symbolic expressions. External, textual representation of data. External means that s-expressions are what you see in text files, strings, etc. So s-expressions are made of characters on some, typically external, medium.
Lisp data structures: symbols, lists, strings, numbers, characters, ...
Reader: reads s-expressions and returns Lisp data structures.
Note that s-expressions also are used to encode Lisp source code.
In some Lisp dialects the reader is programmable and table driven (via the so-called read table). This read table contains reader functions for characters. For example the quote ' character is bound to a function that reads an expression and returns the value of (list 'quote expression). The number characters 0..9 are bound to functions that read a number (in reality this might be more complex, since some Lisps allow numbers to be read in different bases).
S-expressions provide the external syntax for data structures.
Lisp programs are written in external form using s-expressions. But not all s-expressions are valid Lisp programs:
(if a b c d e) is usually not a valid Lisp program
the syntax of Lisp usually is defined on top of Lisp data.
IF has for example the following syntax (in Common Lisp http://www.lispworks.com/documentation/HyperSpec/Body/s_if.htm ):
if test-form then-form [else-form]
So it expects a test-form, a then-form and an optional else-form.
As s-expressions the following are valid IF expressions:
(if (foo) 1 2)
(if (bar) (foo))
But since Lisp programs are forms, we can also construct these forms using Lisp programs:
(list 'if '(foo) 1 2) is a Lisp program that returns a valid IF form.
CL-USER 24 > (describe (list 'if '(foo) 1 2))
(IF (FOO) 1 2) is a LIST
0 IF
1 (FOO)
2 1
3 2
This list can for example be executed with EVAL. EVAL expects list forms - not s-expressions. Remember s-expressions are only an external representation. To create a Lisp form, we need to READ it.
This is why it is said code is data. Lisp forms are expressed as internal Lisp data structures: lists, symbols, numbers, strings, .... In most other programming languages code is raw text. In Lisp s-expressions are raw text. When read with the function READ, s-expressions are turned into data.
Thus the basic interaction top-level in Lisp is called REPL, Read Eval Print Loop.
It is a LOOP that repeatedly reads an s-expression, evaluates the lisp form and prints it:
READ : s-expression -> lisp data
EVAL : lisp form -> resulting lisp data
PRINT: lisp data -> s-expression
So the most primitive REPL is:
(loop (print (eval (read))))
Thus from a conceptual point of view, to answer your question, during evaluation the reader does nothing. It is not involved in evaluation. Evaluation is done by the function EVAL. The reader is invoked by a call to READ. Since EVAL uses Lisp data structures as input (and not s-expressions) the reader is run before the Lisp form gets evaluated (for example by interpretation or by compiling and executing it).

Scheme and Clojure don't have the atom type predicate - is this by design?

Common LISP and Emacs LISP have the atom type predicate. Scheme and Clojure don't have it. http://hyperpolyglot.wikidot.com/lisp
Is there a design reason for this - or is it just not an essential function to include in the API?
In Clojure, the atom predicate isn't so important because Clojure emphasizes various other types of (immutable) data structures rather than focusing on cons cells / lists.
It could also cause confusion. How would you expect this function to behave when given a hashmap, a set or a vector for example? Or a Java object that represents some complex mutable data structure?
Also the name "atom" is used for something completely different - it's one of Clojure's core concurrency mechanisms to manage shared, synchronous, independent state.
Clojure has the coll? (collection?) function, which is (sort of) the inverse of atom?.
In the book The Little Schemer, atom? is defined as follows:
(define (atom? x)
(and (not (pair? x))
(not (null? x))))
Noting that null is not considered an atom, as other answers have suggested. In the mentioned book atom? is used heavily, in particular when writing procedures that deal with lists of lists.
In the entire IronScheme standard libraries which implement R6RS, I never needed such a function.
In summary:
It is useless
It is easy enough to write if you need it
Which pretty much follows Scheme's minimalistic approach.
In Scheme anything that is not a pair is an atom. As Scheme already defines the predicate pair?, the atom? predicate is not needed, as it is so trivial to define:
(define (atom? s)
(not (pair? s)))
It's a trivial function:
(defun atom (x)
(not (consp x)))
It is used in list processing, when the Lisp dialect uses conses to build lists. There are some 'Lisps' for which this is not the case or not central.
Atom is either a symbol, a character, a number, or null.
(define (atom? a)
(or (symbol? a)
(char? a)
(number? a)
(null? a)))
I think those are all the atoms that exist, if you find more add to the conditional expression. For example, if you think a string is an atom, add (string? a), :-). The absence of a definition for atom, allows you to define it the way you want. After all, Scheme does not know what an atom is.
In Lisp nil is an atom, so I've made null an atom. nil is also a list by simplification nil = (nil . nil), the same way the integral numbers are rational numbers by simplification, 2 = 2/1, 2 is an integral number, 2/1 is a rational number, as both are equals by simplification of the rational one; one says the integral number 2 is also a rational number. But the list predicate is already defined in Scheme, nothing to worry about.
About the question. As long as I am concerned Scheme has predicates only for class types, atom is not a class type, atom is an abstraction that incorporates several class types. Maybe that is the reason. But pair is not a class type either, but it does not incorporate several class types, and yet some may consider pair as a class type.
Atom means that a certain thing is not a compound thing. One reason not to include such a predicate is when the language allows you to define atomic types, so the pletora of atoms can grow wider and wider, and such a predicate would make no sense. I don't know if Scheme allows for this. I can only say that Scheme predicates (the built-in ones) are all specific. You can ask, is this an apple?, is this an orange?; but you cannot ask is this a fruit?. :-). Well, you can, if you do it yourself. Despite what a said, Scheme has a general predicate number?, and number specific predicates, integer?, rational?, real?; notwithstanding, number can be thought of as a class type (the other predicates refer to sub-types of number), whereas atom is not (at least in Scheme).
Note:
class types: types that belong to a certain class of things. Example:
number, integer, real, rational, character, procedure, list, vector, string, etc.

common lisp cons creates a list from two symbols, clojure cons requires a seq to cons onto?

(Disclaimer - I'm aware of the significance of Seqs in Clojure)
In common lisp the cons function can be used to combine two symbols into a list:
(def s 'x)
(def l 'y)
(cons s l)
In clojure - you can only cons onto a sequence - cons hasn't been extended to work with two symbols. So you have to write:
(def s 'x)
(def l 'y)
(cons s '(l))
Is there a higher level pattern in Clojure that explains this difference between Common LISP and Clojure?
In Clojure, unlike traditional Lisps, lists are not the primary data structures. The data structures can implement the ISeq interface - which is another view of the data structure it's given - allowing the same functions to access elements in each. (Lists already implement this. seq? checks whether something implements ISeq.(seq? '(1 2)), (seq? [1 2])) Clojure simply acts differently (with good reason), in that when cons is used, a sequence (it's actually of type clojure.lang.Cons) constructed of a and (seq b) is returned. (a being arg 1 and b arg 2) Obviously, symbols don't and can't implement ISeq.
Clojure.org/sequences
Sequences screencast/talk by Rich Hickey However, note that rest has changed, and it's previous behaviour is now in next, and that lazy-cons has been replaced by lazy-seq and cons.
clojure.lang.RT
In Common Lisp CONS creates a so-called CONS cell, which is similar to a record with two slots: the 'car' and the 'cdr'.
You can put ANYTHING into those two slots of a cons cell.
Cons cells are used to build lists. But one can create all kinds of data structures with cons cells: trees, graphs, various types of specialized lists, ...
The implementations of Lisp are highly optimized to provide very efficient cons cells.
A Lisp list is just a common way of using cons cells (see Rainer's description). Clojure is best seen as not having cons cells (although something similar might hide under the hood). The Clojure cons is a misnomer, it should actually just be named prepend.
In Clojure the use of a two-element vector is preferred: [:a :b]. Under the hood such small vectors are implemented as Java arrays and are extremely simple and fast.
A short hand for (cons :a '(:b)) (or (cons :a (cons :b nil))) is list: (list :a :b).
When you say
> (cons 'a 'b)
in common lisp you dont get a list but a dotted pair: (a . b), whereas the result of
> (cons 'a (cons 'b nil))
is the dotted pair (a . ( b . nil)).
In the first list the cdr() of that is not a list, since it is here b and not nil, making it an improper list. Proper lists must be terminated by nil. Therefore higher order functions like mapcar() and friends won't work, but we save a cons-cell. I guess the designers of Clojure removed this feature because of the confusion it could cause.