What are 'if', 'define', 'lambda' in Scheme? - c++

We have a course whose project is to implement a micro-scheme interpreter in C++. In my implementation, I treat 'if', 'define', 'lambda' as procedures, so it is valid in my implementation to eval 'if', 'define' or 'lambda', and it is also fine to write expressions like '(apply define (quote (a 1)))', which will bind 'a' to 1.
But I find in racket and in mit-scheme, 'if', 'define', 'lambda' are not evaluable. For example,
It seems that they are not procedures, but I cannot figure out what they are and how they are implemented.
Can someone explain these to me?

In the terminology of Lisp, expressions to be evaluated are forms. Compound forms (those which use list syntax) are divided into special forms (headed by special operators like let), macro forms, and function call forms.
The Scheme report doesn't use this terminology. It calls functions "procedures". Scheme's special forms are called "syntax". Macros are "derived expression types", individually introduced as "library syntax". (The motivation for this may be some conscious decision to blend into the CS academic mainstream by scrubbing some unfamiliar Lisp terminology. Algol has procedures and a BNF-defined syntax, Scheme has procedures and a BNF-defined syntax. That ticks off some sort of familiarity checkbox.)
Special forms (or "syntax") are recognized by interpreters and compilers as a set of special cases. The interpreter or compiler may handle these forms via function-like bindings in some internal table keyed on symbols, but it's not the program-visible binding namespace.
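To make the "internal table keyed on symbols" idea concrete, here is a minimal sketch in Python rather than C++ or Scheme. It assumes forms are already parsed, with Python lists standing in for Lisp lists and Python strings for symbols. The names GLOBAL_ENV, SPECIAL_FORMS, eval_if, eval_define and evaluate are invented for this illustration and do not come from any real interpreter.

```python
import operator

# Program-visible bindings: only ordinary procedures live here.
GLOBAL_ENV = {"+": operator.add, "<": operator.lt}


def eval_if(form, env):
    # form is the whole list: ["if", test, consequent, alternative]
    _, test, consequent, alternative = form
    chosen = consequent if evaluate(test, env) else alternative
    return evaluate(chosen, env)


def eval_define(form, env):
    # form is the whole list: ["define", name, value-expression]
    _, name, value_expr = form
    env[name] = evaluate(value_expr, env)
    return None


# Internal table keyed on symbols. It is deliberately NOT part of GLOBAL_ENV.
SPECIAL_FORMS = {"if": eval_if, "define": eval_define}


def evaluate(x, env=GLOBAL_ENV):
    if isinstance(x, str):       # symbol: look it up as a variable
        return env[x]
    if not isinstance(x, list):  # number: self-evaluating
        return x
    head = x[0]
    if isinstance(head, str) and head in SPECIAL_FORMS:
        # Special form: the handler gets the unevaluated form.
        return SPECIAL_FORMS[head](x, env)
    proc = evaluate(head, env)   # function call: evaluate operator and operands
    args = [evaluate(arg, env) for arg in x[1:]]
    return proc(*args)
```

With this layout, evaluate(["define", "a", 1]) binds a, while evaluate("if") fails with a lookup error because if has no binding in the program-visible namespace, which is roughly the behaviour the question observes in Racket and mit-scheme.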
Setting up these associations in the regular namespace isn't necessarily wrong, but it could be problematic. If you want both a compiler and interpreter, but let has only one top-level binding, that will be an issue: who gets to install their procedure into that binding: the interpreter or compiler? (One way to resolve that is simple: make the binding values cons pairs: the car can be the interpreter function, the cdr the compiler function. But then these bindings are not procedures any more that you can apply.)
Exposing these bindings to the application is problematic anyway, because the semantics are so different between interpretation and compilation. If your implementation is interpreted, then calling the define binding as a function is possible; it has the effect of performing the definition. But in a compiled implementation, code depending on this won't work; define will be a function that doesn't actually define anything, but rather compiles: it calculates and returns a compiled fragment written in some intermediate representation.
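A rough sketch of that difference, in Python, with one table entry per special operator holding both handlers (the pair idea mentioned above). Everything here is hypothetical: the handler names, the SPECIAL_OPERATORS table, and the two-instruction intermediate representation are made up for illustration, and the value in the define form is assumed to be a literal.

```python
def interpret_define(form, env):
    """Interpreter handler: performs the definition as a side effect."""
    _, name, value = form  # value is assumed to be a literal in this sketch
    env[name] = value
    return None


def compile_define(form, env):
    """Compiler handler: defines nothing, only returns made-up IR."""
    _, name, value = form
    return [("push-const", value), ("store-global", name)]


# One entry per special operator: (interpreter handler, compiler handler).
SPECIAL_OPERATORS = {"define": (interpret_define, compile_define)}


def run_ir(ir, env):
    """Executes the made-up IR that compile_define produces."""
    stack = []
    for op, arg in ir:
        if op == "push-const":
            stack.append(arg)
        elif op == "store-global":
            env[arg] = stack.pop()
```

Calling the interpreter handler changes the environment immediately. Calling the compiler handler leaves the environment untouched and hands back a fragment that only has an effect once something like run_ir executes it, so application code that applies define directly would behave differently under the two.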
About your implementation, the fact that (apply define (quote (a 1))) works in your implementation raises a bit of a red flag. Either you've made the environment parameter of the function optional, or it doesn't take one. Functions implementing special operators (or "syntax") need an environment parameter, not just the piece of syntax. (At least if we are developing a lexically scoped Scheme or Lisp!)
The fact that (apply define (quote (a 1))) works also suggests that your define function is taking a and 1 as arguments. While that is workable, the usual approach for these kinds of syntax procedures is to take the whole form as one argument (and a lexical environment as another argument). If such a function can be called, the invocation looks something like (apply define (list '(define a 1) (null-environment 5))). The procedure itself will do any necessary destructuring on the syntax, and checking for validity: are there too many or too few parameters and so on.
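As an illustration of a handler that receives the whole form plus an environment, here is a small Python sketch. It assumes Python lists for forms and strings for symbols, uses collections.ChainMap as a stand-in for a chain of lexical scopes, and treats the value as a literal. SyntaxProblem and define_handler are names invented for this example.

```python
from collections import ChainMap


class SyntaxProblem(Exception):
    """Raised when a special form is malformed (hypothetical name)."""


def define_handler(form, env):
    # Receives the entire form, e.g. ["define", "a", 1], plus the environment.
    if not isinstance(form, list) or not form or form[0] != "define":
        raise SyntaxProblem("not a define form")
    if len(form) != 3:
        raise SyntaxProblem("define expects exactly a name and a value")
    _, name, value = form
    if not isinstance(name, str):
        raise SyntaxProblem("define: the name must be a symbol")
    # ChainMap writes go to the first (innermost) mapping, so the
    # definition lands in the scope that was passed in.
    env[name] = value
    return name
```

For example, with global_env = {"a": 0} and local = ChainMap({}, global_env), calling define_handler(["define", "a", 1], local) binds a in the inner scope only and leaves the outer binding alone. Without the environment argument the handler would have no way of knowing which scope to modify.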

Related

Clojure: Definitions of basic terms

In Clojure context, some define the term form as “any valid code,” and some as “any valid code that returns a value.” So both the numeral 1729 and the string Hello! are forms. Likewise (fn is not a form. Is an undefined symbol, say my-val, a form?
What is the difference between an expression and a form?
What is the difference between an expression and a function?
There are some good answers to this question at Are Lisp forms and Lisp expressions same thing?
The key thing to think about is that there are different points in the lifecycle. We start with text "(+ 1 2)" which is then read into Clojure data (a list containing a symbol and two numbers). Often in Lisps "expression" is used to mean the former and "form" is used to mean the latter. In practice, I do not find that people are at all consistent with this usage and often use both terms for both things.
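The text-to-data step can be sketched with a toy reader. This is Python, not Clojure's actual reader, and it only handles parentheses, integers and symbols; Python lists stand in for Clojure lists and Python strings for symbols. The function names are made up for the example.

```python
def tokenize(text):
    # Pad parentheses with spaces so that split() separates them.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read_tokens(tokens):
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read_tokens(tokens))
        tokens.pop(0)  # discard the closing ")"
        return form
    try:
        return int(token)  # numeric literal
    except ValueError:
        return token       # anything else is treated as a symbol


def read(text):
    """Turns source text into data; nothing is evaluated here."""
    return read_tokens(tokenize(text))
```

Here read("(+ 1 2)") gives a three-element list holding a symbol and two numbers. Unbalanced text such as "(fn" never becomes data at all in this sketch: the reader runs out of tokens and raises an error before evaluation is even considered.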
If you take "form" to mean "something which can be evaluated", then 1729 or "Hello!" or the symbol my-val are all forms. When my-val is evaluated it is resolved in the current namespace, perhaps to a function instance, which is invokable. Functions are really important only at evaluation time, when they can be invoked.
Another interesting aspect is macros, which allow you to create new syntax. Note that macro expansion happens after reading, though, which means that while you can create new syntax, it still must follow some basic expectations that are encoded into the reader (namely that invocations follow the pattern (<invokable> <args...>)). Note that macros work on read but unevaluated forms (data, not text) and must produce new forms.
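A small sketch of "forms in, forms out", again in Python with lists standing in for read forms. The unless macro, the MACROS table and macroexpand_all are hypothetical names for this illustration; this is not how Clojure's compiler is actually organised.

```python
def expand_unless(form):
    # (unless test body) becomes (if test nil body); None plays the role of nil.
    _, test, body = form
    return ["if", test, None, body]


# Hypothetical macro table: symbol -> function from form to form.
MACROS = {"unless": expand_unless}


def macroexpand_all(form):
    """Rewrites read-but-unevaluated data; nothing is evaluated."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in MACROS:
        # Expand, then keep expanding whatever the macro produced.
        return macroexpand_all(MACROS[head](form))
    return [macroexpand_all(sub) for sub in form]
```

The macro never sees text and never sees runtime values. It receives a list whose first element is the macro's name, which is why the (<invokable> <args...>) shape has to survive the reader.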
What is the difference between an expression and a form?
In my opinion, a form in the context of Clojure is something the compiler deals with. Some forms are valid expressions while others are "special" forms (i.e. macros).
What is the difference between an expression and a function?
Any function is an expression.
Is an undefined symbol, say my-val, a form?
I would say it is a valid expression (hence a form) which yields a compile-time exception.
Likewise (fn) is not a form
It seems like you are referring to some source, where this was declared, could you provide a link?

How does Clojure's optimiser work, and where is it?

I am new to Clojure, but not to lisp. A few of the design decisions look strange to me - specifically requiring a vector for function parameters and explicitly requesting tail calls using recur.
Translating lists to vectors (and vice versa) is a standard operation for an optimiser. Tail calls can be converted to iteration by rewriting to equivalent clojure before compiling to byte code. The [] and recur syntax suggest that neither of these optimisations are present in the current implementation.
I would like a pointer to where in the implementation I can find any/all source-to-source transformation passes. I don't speak Java very well so am struggling to navigate the codebase.
If there isn't any optimisation before function-by-function translation to the JVM's byte code, I'd be interested in the design rationale for this. Perhaps to achieve faster compilation?
Thank you.
There is no explicit optimizer package in the compiler code. Any optimizations are done "inline". Some can be enabled or disabled via compiler flags.
Observe that literal vectors for function parameters are a syntactic choice about how functions are written in source code. Whether the parameters are written as a vector, a list, or anything else does not affect runtime behavior, so there is nothing there to optimize.
Regarding automatic recur, Rich Hickey explained his decision here:
When speaking about general TCO, we are not just talking about
recursive self-calls, but also tail calls to other functions. Full TCO
in the latter case is not possible on the JVM at present whilst
preserving Java calling conventions (i.e without interpreting or
inserting a trampoline etc).
While making self tail-calls into jumps would be easy (after all,
that's what recur does), doing so implicitly would create the wrong
expectations for those coming from, e.g. Scheme, which has full TCO.
So, instead we have an explicit recur construct.
Essentially it boils down to the difference between a mere
optimization and a semantic promise. Until I can make it a promise,
I'd rather not have partial TCO.
Some people even prefer 'recur' to the redundant restatement of the
function name. In addition, recur can enforce tail-call position.
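The two cases in that explanation can be sketched in Python, which also lacks tail-call elimination. The first function shows a self tail call written as a jump, which is roughly what recur amounts to. The second part shows a trampoline, one of the workarounds the quote mentions for tail calls between different functions. All function names are illustrative.

```python
def count_down(n):
    # Self tail call as a jump: rebind the parameter and loop again,
    # instead of calling count_down(n - 1) and growing the stack.
    while True:
        if n == 0:
            return "done"
        n = n - 1


def trampoline(f, *args):
    # Keep calling returned thunks until a non-callable value appears.
    result = f(*args)
    while callable(result):
        result = result()
    return result


def is_even(n):
    # Returns a thunk instead of calling is_odd directly.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))
```

Both count_down(100000) and trampoline(is_even, 100000) finish without exhausting the stack, but the mutual-recursion version only works because the functions were rewritten to return thunks. That is the kind of change to calling conventions the quote refers to.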
specifically requiring a vector for function parameters
Most other lisps build structures out of syntactic lists. For an associative "map" for example, you build a list of lists. For a "vector", you make a list. For a conditional switch-like expression, you make a list of lists of lists. Lots of lists, lots of parentheses.
Clojure has made it an obvious goal to make the syntax of lisp more readable and less redundant. A map, set, list, vector all have their own syntax delimiters so they jump out at the eye, while also providing specific functionality that otherwise you'd have to explicitly request using a function if they were all lists. In addition to these structural primitives, other functions like cond minimize the parentheses by removing one layer of parentheses for each pair in the expression, rather than additionally wrapping each pair in yet another grouped parenthesis. This philosophy is widespread throughout the language and its core library so the code is more readable and elegant.
Function parameters as a vector are just part of this syntax. It's not about whether the language can convert a list to a vector easily, it's about how the language requires the placement of function parameters in a function definition -- and it does so by explicitly requiring a vector. And in fact, you can see this clearly in the source for defn:
https://github.com/clojure/clojure/blob/clojure-1.7.0/src/clj/clojure/core.clj#L296
It's just a requirement for how a function is written, that's all.

Clojure methods ending in *

What do methods ending in * tend to have in common? I've seen a few, but have no idea if this is an established naming convention.
In general I've seen this used to distinguish functions that do the same thing but with different signatures, especially in situations where overloads would create conflicting semantics. For example, list* could not be expressed as an overload of list because they are using variable arity in different ways.
In many cases (but not all), the * form is called by the non-* version.
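To see why list* cannot simply be another arity of list, here is a Python imitation of the two behaviours. The names list_ and list_star are stand-ins (Python identifiers cannot contain *), and Python lists replace Clojure sequences.

```python
def list_(*items):
    # Like (list 1 2 [3 4]): every argument becomes one element.
    return list(items)


def list_star(*items):
    # Like (list* 1 2 [3 4]): the LAST argument is a sequence
    # whose elements are spliced onto the end.
    *leading, tail = items
    return [*leading, *tail]
```

Given the same three arguments 1, 2 and [3, 4], list_ returns three elements with the inner sequence kept whole, while list_star returns four elements. Both accept a variable number of arguments, so an overload could not tell which meaning the caller wanted.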
Apart from what other answers have mentioned, this convention is used where the non-* versions are macros and these macros emit code that calls the * functions. Even in clojure.core, let and fn are macros whose expansions use let* and fn* respectively. Another example is sqlkorma, where the non-* versions (where, delete, update etc.) are macros and the * ones (where*, delete* etc.) are functions.
The reason for using this pattern is that in some cases it is not feasible to use the macro version of the API (apart from using eval, as you don't have the information at compile time); in such cases you can use the *-based functions.

What's the convention for using an asterisk at the end of a function name in Clojure and other Lisp dialects?

Note that I'm not talking about ear muffs in symbol names, an issue that is discussed at Conventions, Style, and Usage for Clojure Constants? and How is the `*var-name*` naming-convention used in clojure?. I'm talking strictly about instances where there is some function named foo that then calls a function foo*.
In Clojure it basically means "foo* is like foo, but somehow different, and you probably want foo". In other words, it means that the author of that code couldn't come up with a better name for the second function, so they just slapped a star on it.
Mathematicians and Haskellers can use their apostrophes to indicate similar objects (values or functions). Similar but not quite the same. Objects that relate to each other. For instance, function foo could be a calculation in one manner, and foo' would produce the same result but with a different approach. Perhaps it is unimaginative naming but it has roots in mathematics.
Lisps generally (without any terminal reason) have discarded apostrophes in symbol names, and * kind of resembles an apostrophe. Clojure 1.3 will finally fix that by allowing apostrophes in names!
If I understand your question correctly, I've seen instances where foo* was used to show that the function is equivalent to another in theory, but uses different semantics. Take for instance the lamina library, which defines things like map*, filter*, take* for its core type, channels. Channels are similar enough to seqs that the names of these functions make sense, but they are not compatible enough that they should be "equal" per se.
Another use case I've seen for foo* style is for functions which call out to a helper function with an extra parameter. The fact function, for instance, might delegate to fact* which accepts another parameter, the accumulator, if written recursively. You don't necessarily want to expose in fact that there's an extra argument, because calling (fact 5 100) isn't going to compute for you the factorial of 5--exposing that extra parameter is an error.
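A sketch of that fact / fact* split in Python, where fact_star stands in for fact* because Python names cannot contain an asterisk:

```python
def fact_star(n, acc):
    # Helper with the extra accumulator parameter (the "fact*" role).
    if n <= 1:
        return acc
    return fact_star(n - 1, acc * n)


def fact(n):
    # Public entry point: callers never see or supply the accumulator.
    return fact_star(n, 1)
```

fact(5) gives 120, whereas calling the helper directly as fact_star(5, 100) gives 12000, which is not the factorial of anything the caller asked for. Keeping the accumulator behind the starred name avoids that mistake.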
I've also seen the same style for macros. The macro foo expands into a function call to foo*.
A normal let binding (let ((...))) creates separate variables in parallel.
A let star binding (let* ((...))) creates variables sequentially, so that later ones can be computed from earlier ones, like so:
(let* ((x 10) (y (+ x 5))) y)
I could be slightly off base but see LET versus LET* in Common Lisp for more detail
EDIT: I'm not sure about how this reflects in Clojure, I've only started reading Programming Clojure so I don't know yet
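The parallel versus sequential distinction can be modelled in a few lines of Python. This is only a sketch of the scoping rule, not of any Lisp implementation: scopes are plain dicts, each binding is a (name, init) pair where init is a function of the scope it is allowed to see, and the function names are invented. For what it is worth, Clojure's own let binds sequentially.

```python
def let_parallel(bindings, outer):
    # Every init expression sees only the outer scope (LET).
    new_vars = {name: init(outer) for name, init in bindings}
    return {**outer, **new_vars}


def let_sequential(bindings, outer):
    # Each init expression sees the bindings made before it (LET*).
    scope = dict(outer)
    for name, init in bindings:
        scope[name] = init(scope)
    return scope


# Models the bindings ((x 10) (y (+ x 5))).
BINDINGS = [("x", lambda scope: 10), ("y", lambda scope: scope["x"] + 5)]
```

With an outer scope of {"x": 1}, the sequential version yields y = 15 because y sees the new x, while the parallel version yields y = 6 because y sees the outer x. With an empty outer scope the parallel version fails, since x does not exist yet when y is computed.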

Clojure static typing

I know that this may sound like blasphemy to Lisp aficionados (and other lovers of dynamic languages), but how difficult would it be to enhance the Clojure compiler to support static (compile-time) type checking?
Setting aside the arguments for and against static and dynamic typing, is this possible (not "is this advisable")?
I was thinking that adding a new reader macro to force a compile-time type (an enhanced version of the #^ macro) and adding the type information to the symbol table would allow the compiler to flag places where a variable was misused. For example, in the following code, I would expect a compile-time error (#* is the "compile-time" type macro):
(defn get-length [#*String s] (.length s))
(defn test-get-length [] (get-length 2.0))
The #^ macro could even be reused with a global variable (*compile-time-type-checking*) to force the compiler to do the checks.
Any thoughts on the feasibility?
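To make the idea concrete, here is a toy version of such a check in Python. It is not how the Clojure compiler works, and it only handles literal arguments. DECLARED is a hypothetical symbol table that the proposed #* annotations would populate, and check_call is an invented name; forms are modelled as Python lists.

```python
# Hypothetical symbol table filled in from annotations like #*String.
DECLARED = {"get-length": [str]}


def check_call(form, declared=DECLARED):
    """Returns a list of type errors for one call form with literal args."""
    name, *args = form
    expected = declared.get(name)
    if expected is None:
        return []  # no declared types for this function: nothing to check
    if len(args) != len(expected):
        return [f"{name}: expected {len(expected)} args, got {len(args)}"]
    errors = []
    for position, (arg, typ) in enumerate(zip(args, expected)):
        if not isinstance(arg, typ):
            errors.append(
                f"{name}: argument {position} should be {typ.__name__}, "
                f"got {type(arg).__name__}"
            )
    return errors
```

Checking the form for (get-length 2.0) reports one error before anything runs, while a call with a string literal passes. Arguments that are variables or nested calls would need real type inference, which this sketch leaves out.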
It is certainly possible. However, I do not think that Clojure will ever get any form of weak static typing; its benefits are too few.
Rich Hickey has, however, expressed on several occasions his liking for the strong, optional, and expressive typing feature of the Qi language, http://www.lambdassociates.org/qilisp.htm
It's certainly possible. The compiler already does some static type checking around primitive argument types in the 1.3 development branch.
Yes! It looks like there is a project underway, core.typed, to make optional static type checking a reality. See the GitHub project and its documentation.
This work grew out of an undergraduate honours dissertation (PDF) by Ambrose Bonnaire-Sergeant, and is related to the Typed Racket system.
Since one form is read AND evaluated at a time, you cannot have forward references, which makes this somewhat limited.
Old question, but two important points: I don't think Clojure supports reader macros, only ordinary lisp macros. And now we have the core.typed option for typing in Clojure.
declare can have type hints, so it is possible to declare a var that "is" the type which has not been defined yet but contains data about the structure, but this would be really clunky and you would have to do it before any code path that could be executed before the type is defined. Basically, you would want to define all of your user defined types up front and then use them like normal. I think that makes library writing somewhat hackish.
I didn't mean to suggest earlier that this isn't possible, just that for user defined types it is a lot more complicated than for pre-defined types. The benefit of doing this vs. the cost is something that should be seriously considered. But I encourage anyone who is interested to try it out and see if they can make it work!