Is everything a list in Scheme?

Along with the book "Simply Scheme" (Second Edition), I'm watching the "Computer Science 61A - Lectures" on YouTube. In the lectures, the instructor uses the STk interpreter, but I'm using the Chicken Scheme interpreter.
In the first lecture he uses the "first" procedure, which, when called like this:
(first 'hello)
it returns "h".
The book "Simply Scheme" gives an example of how first can be implemented:
(define (first sent)
  (car sent))
As far as my testing and understanding go, this works if sent is a list.
I'm trying to understand whether it's proper to say that "everything is a list" in Scheme.
To be more specific: where is the list in 'hello, and if there is one, why doesn't it work with the first procedure as written in the book?
Also, if every implementation is written with "everything is a list" in mind, why doesn't the same code work in all Scheme implementations?

No, this is a common misconception because lists are so pervasive in Scheme programming (and often functional programming in general). Most Scheme implementations come with many data types like strings, symbols, vectors, maps/tables, records, sets, bytevectors, and so on.
This code snippet (first 'hello) is unlikely to work in most Schemes because it is not valid according to the standard. The expression 'hello denotes a symbol, which is an opaque value that can't be deconstructed as a list (the main thing you do with symbols is compare them with eq?). This is probably a quirk of Stk that is unfortunately taught by your book.
See The Scheme Programming Language for a more canonical description of the language. If you just want to learn programming, I recommend HtDP.

Not everything is a list in Scheme. I'm a bit surprised that the example you're showing actually works; in other Scheme interpreters it will fail, because first is usually an alias for car, and car is defined only for cons pairs. For example, in Racket:
(first 'hello)
> first: expected argument of type <non-empty list>; given 'hello
(car 'hello)
> car: expects argument of type <pair>; given 'hello
Scheme's basic data structure is the cons pair. With it, it's possible to build arbitrary linked data structures, in particular singly-linked lists. Other data structures are supported too, like vectors and hash tables. And of course there are primitive types: booleans, symbols, numbers, strings, chars, etc. So it's erroneous to state that "everything is a list" in Scheme.
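As a rough illustration of the point above, here is a minimal sketch in Python (not Scheme) of how a list is built out of pairs, and why car rejects anything that is not a pair. The names Pair, NIL, make_list and car are hypothetical and chosen only for this example; a Python string stands in for the symbol 'hello.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """A cons pair: two slots, nothing more."""
    car: Any
    cdr: Any


NIL = None  # stand-in for the empty list (an assumption of this sketch)


def make_list(*items):
    """Build a singly-linked list by chaining pairs from right to left."""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def car(x):
    """Return the first slot of a pair; anything else is an error."""
    if not isinstance(x, Pair):
        raise TypeError(f"car: expected a pair, given {x!r}")
    return x.car


numbers = make_list(1, 2, 3)  # Pair(1, Pair(2, Pair(3, None)))
print(car(numbers))           # the first element of the list

try:
    car("hello")              # a symbol-like atom is not a pair
except TypeError as err:
    print(err)
```

A list is just one shape that pairs can take; atoms such as symbols, numbers and strings are separate kinds of values, which is why car has nothing to take apart in 'hello.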

With respect to Simply Scheme: the functions first and rest are not the standard ones from the Scheme standard, nor ones that come built into DrRacket. The Simply Scheme API is designed as part of the Simply Scheme curriculum to make it easy to work uniformly on a variety of data. We can't make too many assumptions about how the underlying, low-level implementation works from just the experience of using the Simply Scheme teaching language! There's a runtime cost involved in making things that simple: it does not come for free.

Related

Abstract structure of Clojure

I've been learning Clojure and was a good way through a book on it when I realized how much I'm still struggling to interpret the code. What I'm looking for is the abstract structure, interface, or rules that Clojure uses to parse code. I think it looks something like:
(some-operation optional-args)
optional-args can be nearly anything and that's where I start getting confused.
(operation optional-name-string [vector of optional args]) would equal (defn newfn [argA, argB])
I think this pattern holds for all lists () but with so much flexibility and variation in Clojure, I'm not sure. It would be really helpful to see the rules the interpreter follows.
You are not crazy. Sure, it's easy to point out how "easy" ("simple"? but that's another discussion) Clojure syntax is, but there are two things a new learner should be aware of that beginning tutorials do not point out very clearly, and that greatly complicate understanding what you are seeing:
Destructuring. Spend some quality time with guides on destructuring in Clojure. I will say that this adds complexity to the language and is not dissimilar to the "*args" and "**kwargs" arguments in Python or the "..." spread operator in JavaScript. They are all complicated enough to require some dedicated reading time. This relates to the optional-args you reference above.
Macros and metaprogramming. For the some-operation you reference above, you wish to "see the rules the interpreter follows". In the majority of cases it is a function, but Clojure gives you no indication of whether you are looking at a function or a macro. In the standard library, you will just need to know some standard macros and how they affect the syntax they headline (e.g. if, defn, etc.). For included libraries, there will typically be a small set of macros that are core to understanding that library. Any macro will modify, dare I say complicate, the syntax in the parens you are looking at, so be on your toes.
Clojure is fantastic and easy to learn but those two points are not to be glossed over IMHO.
Before you start coding with Clojure, I highly recommend studying functional programming and Lisp. In Clojure, everything is in prefix form: when you want to run a specific function, you call it and then feed it some arguments. For example, 1+2+3 is written (+ 1 2 3) in Clojure. In other words, every function you call goes at the start of a parenthesized form, and all of its arguments follow the function name.
To define a function, you can do the following:
(defn newfunc [a1 a2]
  (+ 100 a1 a2))
Here newfunc adds 100, a1 and a2. To call it, write:
(newfunc 1 2)
and the result will be 103.
In the first example, + is a function, so we put it at the beginning of the parentheses.
Clojure is a beautiful world full of simplicity. Please learn it deeply.

What is the name of the data structure for and-or-lists (or and-or-trees) and where can I read about it?

I recently needed to make a data structure which was a nested list of and/or questions. Since almost every interesting thing has been discovered by someone else previously, I’m looking for the name of this data structure. It looks something like this:
'((a b c) (b d e) (c (a b) (f a)))
The interpretation is that I want to find abc or bde or caf or caa or cbf or cba, and the list encapsulates that. At the top level each item is or’ed together, sub-lists of the top level are and’ed together, sub-lists of sub-lists are or’ed again, sub-lists of those are and’ed, sub-lists of those are or’ed, ad infinitum. Note that in my example all the lists are the same length; in my real application the lists vary in length.
The code to walk such a “tree” is relatively simple, but I’m assuming that there is a name for that type of tree and there is stuff I can read about it.
These lists are equivalent to fixed-length regular expressions (which I've seen referred to as "network expressions"), but I am particularly interested in this data structure and its representation.
In general (at a very high level of abstraction) it is a:
Context-free grammar (Wikipedia)
If you allow it to be infinitely nested, then it is not a regular expression, because of the presence of parentheses (left and right must match).
If you consider the expressions inside parentheses to be ordered, that is, a and b and c is equivalent to (a and b) and c, then you get a: Binary expression tree (Wikipedia)
But for your particular case, it is probably: Disjunctive normal form (Wikipedia)
I am not sure, but my intuition says that it is a regular expression again, because you have only 2 levels of nesting (the 1st for the 'or-ed' parts and the 2nd for the 'and-ed' parts).
The trees are also a subset of DAWGS - directed acyclic word graphs and one could construct them the same way.
In my case, I have a very small set that I have built by hand and I don't worry about getting the minimal set, but instead just want something that I can easily write down but deals with the types of simple variations I see. Basically, I have different ways of finding where I keep my .el files based upon the different directory structures of various OSes I use. (E.g. when I was working at Google, the /usr/local/emacs/site-lisp directory was actually more like /usr/local/Google/emacs/site-lisp.)
I don't need a full regex, but there are about a dozen variations, some having quite long lists of nested sub-directories (c:\users\cfclark\appData\roaming\emacs.emacs.d or some other awful thing) that I wanted to write down (and then have emacs make an automated search to find the one that is appropriate to this particular installation). And every time I go to a new job, I can simply add to the list a description of where they are in that setup.
Anyway, as that code has evolved, I found that I was writing nested or's and and's, and realized that the structure generalized to the alternating or/and/or/and/... case. So, my assumption is that someone must have discovered this before. I had hints of it myself several years ago, but didn't sit down to implement it. The Disjunctive Normal Form link mpasko256 gave is also particularly relevant. I don't normalize to that level; I still keep nested and's and or's rather than flattening to 2 levels, but I do have a distinct structure: or's at the top, then and's, then or's....
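For readers who want to see the walk spelled out, here is a minimal sketch in Python of expanding the alternating or/and structure described in this thread into every sequence it stands for. It is not the poster's Emacs Lisp code; the function names expand_or and expand_and are hypothetical, and nested Python lists stand in for the quoted Lisp list.

```python
from itertools import product


def expand_or(alternatives):
    """Each item is one alternative: an atom, or an and-list to expand."""
    results = []
    for alt in alternatives:
        if isinstance(alt, list):
            results.extend(expand_and(alt))
        else:
            results.append([alt])
    return results


def expand_and(parts):
    """Each item is one required part: an atom, or an or-list of choices."""
    choices = []
    for part in parts:
        if isinstance(part, list):
            choices.append(expand_or(part))
        else:
            choices.append([[part]])
    # Pick one choice per part and concatenate the picks in order.
    return [sum(combo, []) for combo in product(*choices)]


tree = [["a", "b", "c"], ["b", "d", "e"], ["c", ["a", "b"], ["f", "a"]]]
print(["".join(seq) for seq in expand_or(tree)])
# prints ['abc', 'bde', 'caf', 'caa', 'cbf', 'cba']
```

The two mutually recursive functions mirror the alternation directly: each level of nesting flips between "choose one" and "take all in order", so the lists may be of any length and any depth.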

What would Clojure lose by switching away from leading parenthesis like Dylan, Julia and Seph?

Three lispy homoiconic languages, Dylan, Julia and Seph, all moved away from leading parentheses, so a hypothetical function call in Common Lisp that would look like:
(print hello world)
Would look like the following hypothetical function call
print(hello world)
in the three languages mentioned above.
Were Clojure to go down this path - what would it have to sacrifice to get there?
Reasoning:
Apart from the amazing lazy functional data structures in Clojure, and the improved syntax for maps and seqs, the language support for concurrency, the JVM platform, the tooling and the awesome community - the distinctive thing about it being 'a LISP' is leading parenthesis giving homoiconicity which gives macros providing syntax abstraction.
But if you don't need leading parentheses - why have them? The only arguments I can think of for keeping them are
(1) reusing tool support in emacs
(2) prompting people to 'think in LISP' and not try to treat it as another procedural language
(Credit to andrew cooke's answer, who provided the link to Wheeler's and Gloria's "Readable Lisp S-expressions Project")
The link above is a project intended to provide a readable syntax for all languages based on s-expressions, including Scheme and Clojure. The conclusion is that it can be done: there's a way to have readable Lisp without the parentheses.
Basically what David Wheeler's project does is add syntactic sugar to Lisp-like languages to provide more modern syntax, in a way that doesn't break Lisp's support for domain-specific languages. The enhancements are optional and backwards-compatible, so you can include as much or as little of it as you want and mix them with existing code.
This project defines three new expression types:
Curly-infix-expressions. (+ 1 2 3) becomes {1 + 2 + 3} at every place you want to use infix operators of any arity. (There is a special case that needs to be handled with care if the inline expression uses several operators, like {1 + 2 * 3} - although {1 + {2 * 3} } works as expected).
Neoteric-expressions. (f x y) becomes f(x y) (requires that no space is placed between the function name and its parameters)
Sweet-expressions. Opening and closing parens can be replaced with (optional) python-like semantic indentation. Sweet-expressions can be freely mixed with traditional parentheses s-expressions.
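To make the curly-infix rule concrete, here is a minimal sketch in Python of the simple case only: the contents of a curly group, already split into items, are rewritten into an ordinary prefix list. This is not code from the Readable Lisp project; the function name curly_infix is hypothetical, and the mixed-operator case, which the real reader treats specially, is simply rejected here.

```python
def curly_infix(items):
    """Rewrite the items of {a op b op c ...} into the prefix list [op, a, b, c, ...]."""
    if len(items) < 3 or len(items) % 2 == 0:
        raise ValueError("expected operand, operator, operand, ...")
    operators = items[1::2]  # every second item, starting at index 1
    operands = items[0::2]   # every second item, starting at index 0
    if any(op != operators[0] for op in operators):
        # Simplification for this sketch: require explicit nesting instead.
        raise ValueError("mixed operators need explicit nesting")
    return [operators[0]] + operands


print(curly_infix([1, "+", 2, "+", 3]))
# prints ['+', 1, 2, 3]

# {1 + {2 * 3}}: the inner group is rewritten first, then used as an operand.
print(curly_infix([1, "+", curly_infix([2, "*", 3])]))
# prints ['+', 1, ['*', 2, 3]]
```

Because the result is an ordinary nested list, anything that consumes prefix forms afterwards never sees the infix spelling.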
The result is Lisp-compliant but much more readable code. An example of how the new syntactic sugar enhances readability:
(define (gcd_ a b)
    (let (r (% b a))
        (if (= r 0) a (gcd_ r a))))
(define-macro (my-gcd)
    (apply gcd_ (args) 2))
becomes:
define gcd_(a b)
    let r {b % a}
        if {r = 0} a gcd_(r a)
define-macro my-gcd()
    apply gcd_ (args) 2
Note how the syntax is compatible with macros, which was a problem with previous projects that intended to improve Lisp syntax (as described by Wheeler and Gloria). Because it's just sugar, the final form of each new expression is a s-expression, transformed by the language reader before macros are processed - so macros don't need any special treatment. Thus the "readable Lisp" project preserves homoiconicity, the property that allows Lisp to represent code as data within the language, which is what allows it to be a powerful meta-programming environment.
Just moving the parentheses one atom in for function calls wouldn't be enough to satisfy anybody; people will be complaining about lack of infix operators, begin/end blocks etc. Plus you'd probably have to introduce commas / delimiters in all sorts of places.
Give them that and macros will be much harder just to write correctly (and it would probably be even harder to write macros that look and act nicely with all the new syntax you've introduced by then). And macros are not just a nice feature you can ignore or make a whole lot more annoying; the whole language (like any other Lisp) is built right on top of them. Much of the "user-visible" stuff in clojure.core, including let, defn, etc., is macros.
Writing macros would become much more difficult because the structure would no longer be simple. You would need another way to encode where expressions start and stop, using some syntactic symbol to mark the start and end of expressions, so that you can write code that generates expressions. Perhaps you could solve this problem by adding something like a ( to mark the start of the expression...
On a completely different angle, it is well worth watching this video on the difference between familiar and easy. Making Lisp's syntax more familiar won't make it any easier for people to learn, and may make it misleading if it looks too much like something it is not.
Even if you completely disagree, that video is well worth the hour.
you wouldn't need to sacrifice anything. there's a very carefully thought-out approach by david wheeler that's completely transparent and backwards compatible, with full support for macros etc.
You would have Mathematica. Mathematica accepts function calls as f[x, y, z], but when you investigate it further, you find that f[x, y, z] is actually syntactic sugar for a list in which the Head (element 0) is f and the Rest (elements 1 through N) is {x, y, z}. You can construct function calls from lists that look a lot like S-expressions and you can decompose unevaluated functions into lists (Hold prevents evaluation, much like quote in Lisp).
There may be semantic differences between Mathematica and Lisp/Scheme/Clojure, but Mathematica's syntax is a demonstration that you can move the left parenthesis over by one atom and still interpret it sensibly, build code with macros, etc.
Syntax is pretty easy to convert with a preprocessor (much easier than semantics). You could probably get the syntax you want through some clever subclassing of Clojure's LispReader. There's even a project that seeks to solve Clojure's off-putting parentheses by replacing them all with square brackets. (It boggles my mind that this is considered a big deal.)
I used to code C/C#/Java/Pascal, so I empathize with the feeling that Lisp code is a bit alien. However, that feeling only lasts a few weeks: after a fairly short amount of time the Lisp style will feel very natural and you'll start berating other languages for their "irregular" syntax :-)
There is a very good reason for Lisp syntax. Leading parentheses make code logically simpler to parse and read, by collecting both a function and the expressions that make up its arguments in a single form.
And when you manipulate code / use macros, it is these forms that matter: these are the building blocks of all your code. So it fundamentally makes sense to put the parentheses in a place that exactly delimits these forms, rather than arbitrarily leaving the first element outside the form.
The "code is data" philosophy is what makes reliable metaprogramming possible. Compare with Perl / Ruby, which have complex syntax: their approaches to metaprogramming are only reliable in very confined circumstances. Lisp metaprogramming is so perfectly reliable that the core of the language depends on it. The reason for this is the uniform syntax shared by code and data (the property of homoiconicity). S-expressions are the way this uniformity is realized.
That said, there are circumstances in Clojure where implied parentheses are possible:
For example the following three expressions are equivalent:
(first (rest (rest [1 2 3 4 5 6 7 8 9])))
(-> [1 2 3 4 5 6 7 8 9] (rest) (rest) (first))
(-> [1 2 3 4 5 6 7 8 9] rest rest first)
Notice that in the third, the -> macro is able to infer the parentheses in this context and thereby leave them out. There are many special-case scenarios like this. In general, Clojure's design errs on the side of fewer parentheses. See for example the controversial decision to leave out parentheses in cond. Clojure is very principled and consistent about this choice.
Interestingly enough - there is an alternate Racket Syntax:
@foo{blah blah blah}
reads as
(foo "blah blah blah")
Were Clojure to go down this path - what would it have to sacrifice to get there?
The sacrifice would be the feeling/mental model of writing code by creating a list of lists of lists, or more technically "writing the AST directly".
NOTE: There are other things that would be sacrificed as well, as mentioned in other answers.

Data format safety in clojure

Coming from a Java background, I'm quite fond of static type safety and wonder how Clojure programmers deal with the problem of data format definitions (perhaps not just types but general invariants, because types are just a special case of that).
This is similar to an existing question "Type Safety in Clojure", but that one focuses more on the aspect of how to check types at compile time, while I'm more interested in how the problem is pragmatically addressed.
As a practical example I'm considering an editor application which handles a particular document format. Each document consists of elements that come in several different varieties (graphics elements, font elements etc.) There would be editors for the different element types, and also of course functions to transform a document from/to a byte stream in its native on-disk format.
The basic problem I am interested in is that the editors and the read/write functions have to agree on a common data format. In Java, I would model the document's data as an object graph, e.g. with one class representing a document and one class for each element variety. This way, I get a compile-time guarantee about what the structure of my data looks like, and that the field "width" of a graphics element is an integer and not a float. It does not guarantee that width is positive - but using a getter/setter interface would allow the corresponding class to add invariant guarantees like that.
Being able to rely on this makes the code dealing with this data simpler, and format violations can be caught at compile-time or early at runtime (where some code attempts to modify data that would violate invariants).
How can you achieve a similar "data format reliability" in Clojure? As far as I know, there is no way to perform compile-time checking and hiding domain data behind a function interface seems to be discouraged as non-idiomatic (or maybe I misunderstand?), so what do Clojure developers do to feel safe about the format of data handed into their functions? How do you get your code to error out as quickly as possible, and not after the user edited for 20 more minutes and tries to save to disk, when the save function notices that there is a graphics element in the list of fonts due to an editor bug?
Please note that I'm interested in Clojure and learning, but didn't write any actual software with it yet, so it's possible that I'm just confused and the answer is very simple - if so, sorry for wasting your time :).
I don't see anything wrong or unidiomatic about using a validating API to construct and manipulate your data as in the following.
(defn text-box [text height width]
  {:pre [(string? text) (integer? height) (integer? width)]}
  {:type 'text-box :text text :height height :width width})
(defn colorize [thing color]
  {:pre [(valid-color? color)]}
  (assoc thing :color color))
... (colorize (text-box "Hi!" 20 30) :green) ...
In addition, references (vars, refs, atoms, agents) can have an associated validator function that can be used to ensure a valid state at all times.
Good question - I also find that moving from a statically typed language to a dynamic one requires a bit more care about type safety. Fortunately TDD techniques help a huge amount here.
I typically write a "validate" function which checks all my assumptions about the data structure. I often do this in Java too for invariant assumptions, but in Clojure it's more important because you need to check things like types as well.
You can then use the validate function in several ways:
As a quick check at the REPL: (validate foo)
In unit tests: (is (validate (new-foo-from-template a b c)))
As a run-time check for key functions, e.g. checking that (read-foo some-foo-input-stream) is valid
If you have a complex data structure which is a tree of multiple different component types, you can write a validate function for each component type and have the validate function for the whole document call validate for each sub-component recursively. A nice trick is to use either protocols or multimethods to make the validate function polymorphic for each component type.
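The recursive, per-component validation described above can be sketched as follows. This is an illustration in Python rather than Clojure, with a dictionary keyed on a "type" field standing in for multimethod dispatch; the element types "document" and "text-box", the field names and all function names are assumptions made up for the example.

```python
def is_positive_int(value):
    """True for ints greater than zero (bools are excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_text_box(element):
    return (isinstance(element.get("text"), str)
            and is_positive_int(element.get("width"))
            and is_positive_int(element.get("height")))


def validate_document(element):
    children = element.get("elements")
    # A document is valid only if every child element is valid too.
    return isinstance(children, list) and all(validate(c) for c in children)


# One validator per component type, looked up by the "type" field.
VALIDATORS = {
    "text-box": validate_text_box,
    "document": validate_document,
}


def validate(element):
    """Dispatch on the element's type; unknown shapes are invalid."""
    if not isinstance(element, dict):
        return False
    validator = VALIDATORS.get(element.get("type"))
    return validator is not None and validator(element)


good = {"type": "document",
        "elements": [{"type": "text-box", "text": "Hi!", "width": 30, "height": 20}]}
bad = {"type": "document",
       "elements": [{"type": "text-box", "text": "Hi!", "width": 2.5, "height": 20}]}
print(validate(good), validate(bad))
# prints True False
```

Calling such a function right after each editing operation, and not only at save time, is what moves the failure close to the bug that caused it.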

Scheme: Detecting duplicate elements in a list

Does R6RS or Chez Scheme v7.9.4 have a library function to check if a list contains duplicate elements?
Alternatively, does either have any built-in functionality for sets (which disallow duplicate elements)? So far, I've only been able to find an example here.
The problem with that is that it doesn't appear to actually be part of the Chez Scheme library. Although I could write my own version of this, I'd much rather use a well known, tested, and maintained library function - especially given how basic an operation this is.
So a simple "use these built-in functions" or a "no built-in library implements this" will suffice. Thanks!
SRFI 1 on list processing has a delete-duplicates function (so you could use that and check the length afterward) and may well have other functions you might find useful.
Kyle,
A while back I needed to use a few SRFIs with Chez Scheme. A few that I converted for use with Chez Scheme (including SRFI-1) are at:
http://github.com/dharmatech/chez-srfi
After you add the path to 'chez-srfi' to your CHEZSCHEMELIBDIRS, you can import SRFI-1:
(import (srfi :1))
Ed