Clojure: Definitions of basic terms

In a Clojure context, some define the term form as “any valid code,” and some as “any valid code that returns a value.” So both the numeral 1729 and the string "Hello!" are forms. Likewise (fn) is not a form. Is an undefined symbol, say my-val, a form?
What is the difference between an expression and a form?
What is the difference between an expression and a function?

There are some good answers to this question at Are Lisp forms and Lisp expressions same thing?
The key thing to think about is that there are different points in the lifecycle. We start with text "(+ 1 2)" which is then read into Clojure data (a list containing a symbol and two numbers). Often in Lisps "expression" is used to mean the former and "form" is used to mean the latter. In practice, I do not find that people are at all consistent with this usage and often use both terms for both things.
If you take "form" to mean "something which can be evaluated", then 1729 or "Hello!" or the symbol my-val are all forms. When my-val is evaluated it is resolved in the current namespace, perhaps to a function instance, which is invokable. Functions are really important only at evaluation time, when they can be invoked.
Another interesting aspect is macros, which allow you to create new syntax. Note that macro expansion happens after reading, though, which means that while you can create new syntax, it must still follow some basic expectations that are encoded into the reader (namely that invocations follow the pattern (<invokable> <args...>)). Note that macros work on read but unevaluated forms (data, not text) and must produce new forms.
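This read/expand/evaluate pipeline can be loosely mimicked in Python's ast module (an analogy only, not Clojure semantics): ast.parse plays the reader and turns text into data, a NodeTransformer plays a macro working on that data rather than on text, and compile/eval play evaluation.

```python
import ast

# "Read": turn source text into a data structure (an AST), without evaluating it.
tree = ast.parse("1 + 2", mode="eval")

# "Macro expansion": rewrite the already-read data, not the raw text.
# This toy transformer turns every addition into a multiplication.
class AddToMul(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

expanded = ast.fix_missing_locations(AddToMul().visit(tree))

# "Evaluation": only now is the (rewritten) form actually run.
result = eval(compile(expanded, "<repl>", "eval"))
print(result)  # 1 * 2 = 2
```

As in Lisp, the transformer never sees the string "1 + 2"; it only sees and produces tree-shaped data, which is what makes this kind of rewriting tractable.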

What is the difference between an expression and a form?
In my opinion, a form in the context of Clojure is something the compiler deals with. Some forms are valid expressions while others are "special" forms (e.g. macros).
What is the difference between an expression and a function?
Any function is an expression.
Is an undefined symbol, say my-val, a form?
I would say it is a valid expression (hence a form) which yields a compile-time exception.
Likewise (fn) is not a form
It seems like you are referring to some source, where this was declared, could you provide a link?

Related

typed vs untyped vs expr vs stmt in templates and macros

I've been using templates and macros lately, but I have to say I have barely found information about these important types. This is my superficial understanding:
typed/expr is something that must exist previously, but you can use .immediate. to overcome this.
untyped/stmt is something that doesn't need to be defined previously, or one or more statements.
This is a very vague notion of the types. I'd like to have a better explanation of them, including which types should be used as return.
The goal of these different parameter types is to give you several increasing levels of precision in specifying what the compiler should accept as a parameter to the macro.
Let's imagine a hypothetical macro that can solve mathematical equations. It will be used like this:
solve(x + 10 = 25) # figures out that the correct value for x is 15
Here, the macro just cares about the structure of the supplied AST tree. It doesn't require that the same tree is a valid expression in the current scope (i.e. that x is defined and so on). The macro just takes advantage of the Nim parser that already can decode most of the mathematical equations to turn them into easier to handle AST trees. That's what untyped parameters are for. They don't get semantically checked and you get the raw AST.
On the next step in the precision ladder are the typed parameters. They allow us to write a generic kind of macro that will accept any expression, as long as it has a proper meaning in the current scope (i.e. its type can be determined). Besides catching errors earlier, this also has the advantage that we can now work with the type of the expression within the macro body (using the macros.getType proc).
We can get even more precise by requiring an expression of a specific type (either a concrete type or a type class/concept). The macro will now be able to participate in overload resolution like a regular proc. It's important to understand that the macro will still receive an AST tree, as it will accept both expressions that can be evaluated at compile-time and expressions that can only be evaluated at run-time.
Finally, we can require that the macro receives a value of specific type that is supplied at compile-time. The macro can work with this value to parametrise the code generation. This is realm of the static parameters. Within the body of the macro, they are no longer AST trees, but rather ordinary well typed values.
So far, we've only talked about expressions, but Nim's macros also accept and produce blocks and this is the second axis, which we can control. expr generally means a single expression, while stmt denotes a list of expressions (historically, its name comes from StatementList, which existed as a separate concept before expressions and statements were unified in Nim).
The distinction is most easily illustrated with the return types of templates. Consider the newException template from the system module:
template newException*(exceptn: typedesc, message: string): expr =
  ## creates an exception object of type ``exceptn`` and sets its ``msg`` field
  ## to `message`. Returns the new exception object.
  var
    e: ref exceptn
  new(e)
  e.msg = message
  e
Even though it takes several steps to construct an exception, by specifying expr as the return type of the template we tell the compiler that only the last expression will be considered the return value of the template. The rest of the statements will be inlined, but cleverly hidden from the calling code.
As another example, let's define a special assignment operator that can emulate the semantics of C/C++, allowing assignments within if statements:
template `:=` (a: untyped, b: typed): bool =
  var a = b
  a != nil

if f := open("foo"):
  ...
Specifying a concrete type has the same semantics as using expr. If we had used the default stmt return type instead, the compiler wouldn't have allowed us to pass a "list of expressions", because the if statement obviously expects a single expression.
.immediate. is a legacy from a long-gone past, when templates and macros didn't participate in overload resolution. When we first made them aware of the type system, plenty of code needed the current untyped parameters, but it was too hard to refactor the compiler to introduce them from the start and instead we added the .immediate. pragma as a way to force the backward-compatible behaviour for the whole macro/template.
With typed/untyped, you have a more granular control over the individual parameters of the macro and the .immediate. pragma will be gradually phased out and deprecated.

What are 'if ,define, lambda' in scheme?

We have a course whose project is to implement a micro-scheme interpreter in C++. In my implementation, I treat 'if', 'define', 'lambda' as procedures, so it is valid in my implementation to eval 'if', 'define' or 'lambda', and it is also fine to write expressions like '(apply define (quote (a 1)))', which will bind 'a' to 1.
But I find that in Racket and in MIT Scheme, 'if', 'define', and 'lambda' are not evaluable.
It seems that they are not procedures, but I cannot figure out what they are and how they are implemented.
Can someone explain these to me?
In the terminology of Lisp, expressions to be evaluated are forms. Compound forms (those which use list syntax) are divided into special forms (headed by special operators like let), macro forms, and function call forms.
The Scheme report doesn't use this terminology. It calls functions "procedures". Scheme's special forms are called "syntax". Macros are "derived expression types", individually introduced as "library syntax". (The motivation for this may be some conscious decision to blend into the CS academic mainstream by scrubbing some unfamiliar Lisp terminology. Algol has procedures and a BNF-defined syntax, Scheme has procedures and a BNF-defined syntax. That ticks off some sort of familiarity checkbox.)
Special forms (or "syntax") are recognized by interpreters and compilers as a set of special cases. The interpreter or compiler may handle these forms via function-like bindings in some internal table keyed on symbols, but it's not the program-visible binding namespace.
Setting up these associations in the regular namespace isn't necessarily wrong, but it could be problematic. If you want both a compiler and an interpreter, but let has only one top-level binding, that will be an issue: who gets to install their procedure into that binding, the interpreter or the compiler? (One way to resolve that is simple: make the binding values cons pairs: the car can be the interpreter function, the cdr the compiler function. But then these bindings are no longer procedures that you can apply.)
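The idea of an internal table of special forms, separate from the program-visible environment, can be sketched in a few lines of Python (all names here are hypothetical, a toy evaluator rather than a real Scheme):

```python
# Tiny evaluator sketch: special forms live in an internal table keyed on
# symbols; ordinary procedures live in the program-visible environment.
def eval_if(form, env):
    # (if test then else) -- only one branch is ever evaluated
    _, test, then, alt = form
    return evaluate(then, env) if evaluate(test, env) else evaluate(alt, env)

def eval_quote(form, env):
    # (quote datum) -- return the datum unevaluated
    return form[1]

SPECIAL = {"if": eval_if, "quote": eval_quote}  # internal table, not in env

def evaluate(form, env):
    if isinstance(form, str):            # symbol: look up in the environment
        return env[form]
    if not isinstance(form, list):       # self-evaluating (numbers etc.)
        return form
    head = form[0]
    if isinstance(head, str) and head in SPECIAL:
        return SPECIAL[head](form, env)  # special form: gets whole form + env
    fn, *args = [evaluate(x, env) for x in form]
    return fn(*args)                     # ordinary call: args pre-evaluated

env = {"zero?": lambda n: n == 0, "x": 0}
print(evaluate(["if", ["zero?", "x"], ["quote", "yes"], ["quote", "no"]], env))
```

Note that the handlers in SPECIAL receive the whole unevaluated form plus the environment, exactly the signature the answer describes, while env holds only applicable procedures and values.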
Exposing these bindings to the application is problematic anyway, because the semantics are so different between interpretation and compilation. If your implementation is interpreted, then calling the define binding as a function is possible; it has the effect of performing the definition. But in a compiled implementation, code depending on this won't work; define will be a function that doesn't actually define anything, but rather compiles: it calculates and returns a compiled fragment written in some intermediate representation.
About your implementation, the fact that (apply define (quote (a 1))) works in your implementation raises a bit of a red flag. Either you've made the environment parameter of the function optional, or it doesn't take one. Functions implementing special operators (or "syntax") need an environment parameter, not just the piece of syntax. (At least if we are developing a lexically scoped Scheme or Lisp!)
The fact that (apply define (quote (a 1))) works also suggests that your define function is taking quote and (a 1) as arguments. While that is workable, the usual approach for these kinds of syntax procedures is to take the whole form as one argument (and a lexical environment as another argument). If such a function can be called, the invocation looks something like (apply define (list '(define a 1) (null-environment 5))). The procedure itself will do any necessary destructuring on the syntax, and checking for validity: are there too many or too few parameters and so on.

In what languages can you redefine methods/functions in terms of themselves?

I'm interested in trying literate programming. However, I often find that the requirements are stated in a general way, but then exceptions are given much later.
For example in one section it will say something like Students are not allowed in the hallways while classes are in session.
But then later there will be section where it says something like Teachers may give a student a hall pass at which point the student may be in the hall while class is in session.
So I'd like to be able to define allowedInTheHall following the first section so that it doesn't allow students in the hall, but then after the second section redefines allowedInTheHall so that it first checks for the presence of a hall pass, and if it's missing then delegates back to the previous definition.
So the only way I can imagine this working would be a language where:
you can redefine a method/function/subroutine in terms of its previous definition
where only the latest version of a function gets called even if the caller was defined before the latest redefinition of the callee (I believe this is called "late binding").
So which languages support these criteria?
PS- my motivation is that I am working with existing requirements (in my case game rules) and I want to embed my code into the existing rules so that the code follows the structure of the rules that people are already familiar with. I assume that this situation would also arise trying to implement a legal contract.
Well to answer the direct question,
you can redefine a method/function/subroutine in terms of its previous definition
...in basically any language, as long as it supports two features:
mutable variables that can hold function values
some kind of closure forming operator, which effectively amounts to the ability to create new function values
So you can't do it in C, because even though it allows variables to store function pointers, there's no operation in C that can compute a new function value; and you can't do it in Haskell because Haskell doesn't let you mutate a variable once it's been defined. But you can do it in e.g. JavaScript:
var f1 = function(x) {
    console.log("first version got " + x);
};

function around(f, before, after) {
    return function() {
        before(); f.apply(null, arguments); after();
    };
}

f1 = around(f1,
    function(){console.log("added before");},
    function(){console.log("added after");});

f1(12);
or Scheme:
(define (f1 x) (display "first version got ") (display x) (newline))

(define (around f before after)
  (lambda x
    (before) (apply f x) (after)))

(set! f1 (around
          f1
          (lambda () (display "added before") (newline))
          (lambda () (display "added after") (newline))))

(f1 12)
...or a whole host of other languages, because those are really rather common features. The operation (which I think is generally called "advice") is basically analogous to the ubiquitous x = x + 1, except the value is a function and the "addition" is the wrapping of extra operations around it to create a new functional value.
The reason this works is that by passing the old function in as a parameter (to around, or just a let or whatever), the new function closes over it through a locally scoped name; if the new function referred to the global name, the old value would be lost and the new function would just recurse.
Technically you could say this is a form of late binding - the function is being retrieved from a variable rather than being linked in directly - but generally the term is used to refer to much more dynamic behaviour, such as JS field access where the field might not even actually exist. In the above case the compiler can at least be sure the variable f1 will exist, even if it turns out to hold null or something, so lookup is fast.
Other functions that call f1 would work the way you expect, assuming that they reference it by that name. If you did var f3 = f1; before the around call, functions defined as calling f3 wouldn't be affected; similarly for objects that got hold of f1 by having it passed in as a parameter or something. Basic lexical scoping rules apply. If you want such functions to be affected too, you could pull it off using something like PicoLisp... but you're also doing something you probably shouldn't (and that's not any kind of binding any more: that's direct mutation of a function object).
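The aliasing point can be seen directly in Python as well (a minimal sketch; all names are hypothetical): rebinding the name f1 changes future lookups through that name, but not through an alias taken earlier.

```python
def f1(x):
    return "first version got %s" % x

f3 = f1                      # alias captures the current function value

def around(f, before, after):
    # build a new function value that closes over the old one
    def wrapped(*args):
        return before + f(*args) + after
    return wrapped

f1 = around(f1, "[", "]")    # rebind the *name* f1 to the wrapper

print(f1(12))  # wrapped: "[first version got 12]"
print(f3(12))  # alias still sees the old value: "first version got 12"
```

Callers that look f1 up by name at call time see the new wrapper; callers holding the alias f3 keep the original, exactly as described above.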
All that aside, I'm not sure this is in the spirit of literate programming at all - or for that matter, a program that describes rules. Are rules supposed to change depending on how far you are through the book or in what order you read the chapters? Literate programs aren't - just as a paragraph of text usually means one thing (you may not understand it, but its meaning is fixed) no matter whether you read it first or last, so should a declaration in a true literate program, right? One doesn't normally read a reference - such as a book of rules - from cover to cover like a novel.
Whereas designed like this, the meaning of the program is highly dependent on being read with the statements in one specific order. It's very much a machine-friendly series-of-instructions... not so much a reference book.

What's the convention for using an asterisk at the end of a function name in Clojure and other Lisp dialects?

Note that I'm not talking about ear muffs in symbol names, an issue that is discussed at Conventions, Style, and Usage for Clojure Constants? and How is the `*var-name*` naming-convention used in clojure?. I'm talking strictly about instances where there is some function named foo that then calls a function foo*.
In Clojure it basically means "foo* is like foo, but somehow different, and you probably want foo". In other words, it means that the author of that code couldn't come up with a better name for the second function, so they just slapped a star on it.
Mathematicians and Haskellers can use apostrophes to indicate similar objects (values or functions). Similar, but not quite the same; objects that relate to each other. For instance, function foo could perform a calculation in one manner, and foo' would produce the same result with a different approach. Perhaps it is unimaginative naming, but it has roots in mathematics.
Lisps generally (for no compelling reason) have discarded apostrophes in symbol names, and * kind of resembles an apostrophe. Clojure 1.3 will finally fix that by allowing apostrophes in names!
If I understand your question correctly, I've seen instances where foo* was used to show that the function is equivalent to another in theory, but uses different semantics. Take for instance the lamina library, which defines things like map*, filter*, take* for its core type, channels. Channels are similar enough to seqs that the names of these functions make sense, but they are not compatible enough that they should be "equal" per se.
Another use case I've seen for the foo* style is for functions which call out to a helper function with an extra parameter. The fact function, for instance, might delegate to fact*, which accepts another parameter, the accumulator, if written recursively. You don't necessarily want to expose in fact that there's an extra argument, because calling (fact 5 100) isn't going to compute the factorial of 5 for you; exposing that extra parameter is an error.
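That pattern can be sketched in Python, with a leading-underscore helper standing in for the fact* naming convention:

```python
def _fact(n, acc):
    # accumulator-passing helper -- the "fact*" of the naming convention
    return acc if n <= 1 else _fact(n - 1, acc * n)

def fact(n):
    # public entry point hides the accumulator from callers
    return _fact(n, 1)

print(fact(5))  # 120
```

Only fact is meant to be called by users; the accumulator is an implementation detail of the helper.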
I've also seen the same style for macros. The macro foo expands into a function call to foo*.
a normal let binding (let ((...))) creates separate variables in parallel
a let* binding (let* ((...))) creates variables sequentially, so that they can be computed from each other, like so:
(let* ((x 10) (y (+ x 5))) ...)
I could be slightly off base but see LET versus LET* in Common Lisp for more detail
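A loose Python analogy to that distinction: a simultaneous tuple assignment evaluates all right-hand sides before binding any name (like let), while separate statements bind sequentially so later ones can see earlier ones (like let*).

```python
x = 1                 # an outer binding

# like let: all right-hand sides are evaluated before any name is bound,
# so x + 5 here sees the *outer* x (1), giving y == 6
x, y = 10, x + 5

# like let*: sequential bindings see the earlier ones, giving b == 15
a = 10
b = a + 5
print(y, b)  # 6 15
```

This is only an analogy, of course; let and let* also introduce new lexical scopes, which plain Python assignment does not.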
EDIT: I'm not sure about how this reflects in Clojure, I've only started reading Programming Clojure so I don't know yet

Clojure static typing

I know that this may sound like blasphemy to Lisp aficionados (and other lovers of dynamic languages), but how difficult would it be to enhance the Clojure compiler to support static (compile-time) type checking?
Setting aside the arguments for and against static and dynamic typing, is this possible (not "is this advisable")?
I was thinking that adding a new reader macro to force a compile-time type (an enhanced version of the #^ macro) and adding the type information to the symbol table would allow the compiler to flag places where a variable was misused. For example, in the following code, I would expect a compile-time error (#* is the "compile-time" type macro):
(defn get-length [#*String s] (.length s))
(defn test-get-length [] (get-length 2.0))
The #^ macro could even be reused with a global variable (*compile-time-type-checking*) to force the compiler the do the checks.
Any thoughts on the feasibility?
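The idea of optional annotations that a tool checks but that don't change runtime behaviour exists in other dynamic languages; Python's type hints are a close analogy (a separate checker such as mypy flags the misuse, while the annotated code runs unchecked at runtime):

```python
def get_length(s: str) -> int:
    # the annotation is advisory: a static checker would flag get_length(2.0),
    # but Python itself does not enforce it at runtime
    return len(s)

print(get_length("hello"))  # 5
# get_length(2.0) would be flagged by a static checker such as mypy;
# at runtime it would instead raise a TypeError from len()
```

This is the same "optional, gradual" flavour of typing that the answers below describe for Clojure.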
It's certainly possible. However, I do not think that Clojure will ever get any form of weak static typing - its benefits are too few.
Rich Hickey has, however, expressed on several occasions his liking for the strong, optional, and expressive typing of the Qi language, http://www.lambdassociates.org/qilisp.htm
It's certainly possible. The compiler already does some static type checking around primitive argument types in the 1.3 development branch.
Yes! It looks like there is a project underway, core.typed, to make optional static type checking a reality. See the GitHub project and its documentation.
This work grew out of an undergraduate honours dissertation (PDF) by Ambrose Bonnaire-Sergeant, and is related to the Typed Racket system.
Since one form is read AND evaluated at a time, you cannot have forward references, making this somewhat limited.
Old question but two important points: I don't think Clojure supports reader macros, only ordinary lisp macros. And now we have core.typed option for typing in Clojure.
declare can have type hints, so it is possible to declare a var that "is" the type which has not been defined yet but contains data about the structure, but this would be really clunky and you would have to do it before any code path that could be executed before the type is defined. Basically, you would want to define all of your user defined types up front and then use them like normal. I think that makes library writing somewhat hackish.
I didn't mean to suggest earlier that this isn't possible, just that for user defined types it is a lot more complicated than for pre-defined types. The benefit of doing this vs. the cost is something that should be seriously considered. But I encourage anyone who is interested to try it out and see if they can make it work!