What's the convention for using an asterisk at the end of a function name in Clojure and other Lisp dialects? - clojure

Note that I'm not talking about ear muffs in symbol names, an issue that is discussed at Conventions, Style, and Usage for Clojure Constants? and How is the `*var-name*` naming-convention used in clojure?. I'm talking strictly about instances where there is some function named foo that then calls a function foo*.

In Clojure it basically means "foo* is like foo, but somehow different, and you probably want foo". In other words, it means that the author of that code couldn't come up with a better name for the second function, so they just slapped a star on it.

Mathematicians and Haskellers can use apostrophes (primes) to indicate similar objects (values or functions). Similar, but not quite the same; objects that relate to each other. For instance, function foo could compute something in one manner, and foo' would produce the same result with a different approach. Perhaps it is unimaginative naming, but it has roots in mathematics.
Lisps generally (without any compelling reason) have disallowed apostrophes in symbol names, and * somewhat resembles an apostrophe. Clojure 1.3 will finally fix that by allowing apostrophes in names!

If I understand your question correctly, I've seen instances where foo* was used to show that the function is equivalent to another in theory, but uses different semantics. Take for instance the lamina library, which defines things like map*, filter*, take* for its core type, channels. Channels are similar enough to seqs that the names of these functions make sense, but they are not compatible enough that they should be "equal" per se.
Another use case I've seen for foo* style is for functions which call out to a helper function with an extra parameter. The fact function, for instance, might delegate to fact* which accepts another parameter, the accumulator, if written recursively. You don't necessarily want to expose in fact that there's an extra argument, because calling (fact 5 100) isn't going to compute for you the factorial of 5--exposing that extra parameter is an error.
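A minimal sketch of that helper pattern (the fact/fact* names follow the description above; this is an illustration, not code from any particular library):

```clojure
;; fact* is the recursive helper with an extra accumulator argument;
;; fact is the public entry point that hides it.
(defn fact* [n acc]
  (if (zero? n)
    acc
    (recur (dec n) (* acc n))))

(defn fact [n]
  (fact* n 1))

(fact 5) ; => 120
```

Calling (fact* 5 100) is possible but meaningless to a user, which is exactly why the starred helper stays out of the public story.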
I've also seen the same style for macros. The macro foo expands into a function call to foo*.

a normal let binding (let ((...))) creates its variables in parallel
a let-star binding (let* ((...))) creates its variables sequentially, so that later bindings can refer to earlier ones, like so:
(let* ((x 10) (y (+ x 5))) (* x y))
I could be slightly off base, but see LET versus LET* in Common Lisp for more detail.
EDIT: I'm not sure about how this reflects in Clojure, I've only started reading Programming Clojure so I don't know yet
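For what it's worth, Clojure's let already binds sequentially like Common Lisp's let*, and it uses a vector rather than nested lists:

```clojure
;; later bindings can refer to earlier ones
(let [x 10
      y (+ x 5)]
  (* x y)) ; => 150
```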

Related

How does Clojure's optimiser work, and where is it?

I am new to Clojure, but not to lisp. A few of the design decisions look strange to me - specifically requiring a vector for function parameters and explicitly requesting tail calls using recur.
Translating lists to vectors (and vice versa) is a standard operation for an optimiser. Tail calls can be converted to iteration by rewriting to equivalent Clojure before compiling to byte code. The [] and recur syntax suggest that neither of these optimisations is present in the current implementation.
I would like a pointer to where in the implementation I can find any/all source-to-source transformation passes. I don't speak Java very well so am struggling to navigate the codebase.
If there isn't any optimisation before function-by-function translation to the JVM's byte code, I'd be interested in the design rationale for this. Perhaps to achieve faster compilation?
Thank you.
There is no explicit optimizer package in the compiler code. Any optimizations are done "inline". Some can be enabled or disabled via compiler flags.
Observe that literal vectors for function parameters are a syntactic choice for how functions are represented in source code. Whether they are represented as vectors, lists, or anything else does not affect the runtime, and hence there is nothing there to optimize.
Regarding automatic recur, Rich Hickey explained his decision here:
When speaking about general TCO, we are not just talking about
recursive self-calls, but also tail calls to other functions. Full TCO
in the latter case is not possible on the JVM at present whilst
preserving Java calling conventions (i.e without interpreting or
inserting a trampoline etc).
While making self tail-calls into jumps would be easy (after all,
that's what recur does), doing so implicitly would create the wrong
expectations for those coming from, e.g. Scheme, which has full TCO.
So, instead we have an explicit recur construct.
Essentially it boils down to the difference between a mere
optimization and a semantic promise. Until I can make it a promise,
I'd rather not have partial TCO.
Some people even prefer 'recur' to the redundant restatement of the
function name. In addition, recur can enforce tail-call position.
specifically requiring a vector for function parameters
Most other lisps build structures out of syntactic lists. For an associative "map" for example, you build a list of lists. For a "vector", you make a list. For a conditional switch-like expression, you make a list of lists of lists. Lots of lists, lots of parenthesis.
Clojure has made it an obvious goal to make the syntax of lisp more readable and less redundant. A map, set, list, vector all have their own syntax delimiters so they jump out at the eye, while also providing specific functionality that otherwise you'd have to explicitly request using a function if they were all lists. In addition to these structural primitives, other functions like cond minimize the parentheses by removing one layer of parentheses for each pair in the expression, rather than additionally wrapping each pair in yet another grouped parenthesis. This philosophy is widespread throughout the language and its core library so the code is more readable and elegant.
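For instance, here is how cond's flat pairs look in practice (a small illustrative example):

```clojure
;; each test/result pair is flat: no extra parentheses per clause,
;; unlike (cond ((pos? n) ...) ...) in traditional Lisps
(defn sign [n]
  (cond
    (pos? n) :positive
    (neg? n) :negative
    :else    :zero))
```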
Function parameters as a vector are just part of this syntax. It's not about whether the language can convert a list to a vector easily, it's about how the language requires the placement of function parameters in a function definition -- and it does so by explicitly requiring a vector. And in fact, you can see this clearly in the source for defn:
https://github.com/clojure/clojure/blob/clojure-1.7.0/src/clj/clojure/core.clj#L296
It's just a requirement for how a function is written, that's all.

What are 'if ,define, lambda' in scheme?

We have a course whose project is to implement a micro-scheme interpreter in C++. In my implementation, I treat 'if', 'define', 'lambda' as procedures, so it is valid in my implementation to eval 'if', 'define' or 'lambda', and it is also fine to write expressions like '(apply define (quote (a 1)))', which will bind 'a' to 1.
But I find that in Racket and in MIT Scheme, 'if', 'define' and 'lambda' are not evaluable. For example,
It seems that they are not procedures, but I cannot figure out what they are and how they are implemented.
Can someone explain these to me?
In the terminology of Lisp, expressions to be evaluated are forms. Compound forms (those which use list syntax) are divided into special forms (headed by special operators like let), macro forms, and function call forms.
The Scheme report doesn't use this terminology. It calls functions "procedures". Scheme's special forms are called "syntax". Macros are "derived expression types", individually introduced as "library syntax". (The motivation for this may be some conscious decision to blend into the CS academic mainstream by scrubbing some unfamiliar Lisp terminology. Algol has procedures and a BNF-defined syntax, Scheme has procedures and a BNF-defined syntax. That ticks off some sort of familiarity checkbox.)
Special forms (or "syntax") are recognized by interpreters and compilers as a set of special cases. The interpreter or compiler may handle these forms via function-like bindings in some internal table keyed on symbols, but it's not the program-visible binding namespace.
Setting up these associations in the regular namespace isn't necessarily wrong, but it could be problematic. If you want both a compiler and interpreter, but let has only one top-level binding, that will be an issue: who gets to install their procedure into that binding: the interpreter or compiler? (One way to resolve that is simple: make the binding values cons pairs: the car can be the interpreter function, the cdr the compiler function. But then these bindings are not procedures any more that you can apply.)
Exposing these bindings to the application is problematic anyway, because the semantics are so different between interpretation and compilation. If your implementation is interpreted, then calling the define binding as a function is possible; it has the effect of performing the definition. But in a compiled implementation, code depending on this won't work; define will be a function that doesn't actually define anything, but rather compiles: it calculates and returns a compiled fragment written in some intermediate representation.
About your implementation, the fact that (apply define (quote (a 1))) works in your implementation raises a bit of a red flag. Either you've made the environment parameter of the function optional, or it doesn't take one. Functions implementing special operators (or "syntax") need an environment parameter, not just the piece of syntax. (At least if we are developing a lexically scoped Scheme or Lisp!)
The fact that (apply define (quote (a 1))) works also suggests that your define function is taking quote and (a 1) as arguments. While that is workable, the usual approach for these kinds of syntax procedures is to take the whole form as one argument (and a lexical environment as another argument). If such a function can be called, the invocation looks something like (apply define (list '(define a 1) (null-environment 5))). The procedure itself will do any necessary destructuring of the syntax, and check it for validity: are there too many or too few parameters, and so on.

In what languages can you redefine methods/functions in terms of themselves?

I'm interested in trying literate programming. However, I often find that requirements are stated in general terms, but then exceptions are given much later.
For example in one section it will say something like Students are not allowed in the hallways while classes are in session.
But then later there will be section where it says something like Teachers may give a student a hall pass at which point the student may be in the hall while class is in session.
So I'd like to be able to define allowedInTheHall following the first section so that it doesn't allow students in the hall, but then after the second section redefines allowedInTheHall so that it first checks for the presence of a hall pass, and if it's missing then delegates back to the previous definition.
So the only way I can imagine this working would be a language where:
you can redefine a method/function/subroutine in terms of its previous definition
where only the latest version of a function gets called even if the caller was defined before the latest redefinition of the callee (I believe this is called "late binding").
So which languages fulfill support these criteria?
PS- my motivation is that I am working with existing requirements (in my case game rules) and I want to embed my code into the existing rules so that the code follows the structure of the rules that people are already familiar with. I assume that this situation would also arise trying to implement a legal contract.
Well to answer the direct question,
you can redefine a method/function/subroutine in terms of its previous definition
...in basically any language, as long as it supports two features:
mutable variables that can hold function values
some kind of closure forming operator, which effectively amounts to the ability to create new function values
So you can't do it in C, because even though it allows variables to store function pointers, there's no operation in C that can compute a new function value; and you can't do it in Haskell because Haskell doesn't let you mutate a variable once it's been defined. But you can do it in e.g. JavaScript:
var f1 = function(x) {
    console.log("first version got " + x);
};

function around(f, before, after) {
    return function() {
        before(); f.apply(null, arguments); after();
    };
}

f1 = around(f1,
    function(){ console.log("added before"); },
    function(){ console.log("added after"); });

f1(12);
or Scheme:
(define (f1 x) (display "first version got ") (display x) (newline))

(define (around f before after)
  (lambda x
    (before) (apply f x) (after)))

(set! f1 (around
          f1
          (lambda () (display "added before") (newline))
          (lambda () (display "added after") (newline))))

(f1 12)
...or a whole host of other languages, because those are really rather common features. The operation (which I think is generally called "advice") is basically analogous to the ubiquitous x = x + 1, except the value is a function and the "addition" is the wrapping of extra operations around it to create a new functional value.
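The same advice pattern carries over to Clojure directly; a rough sketch, using alter-var-root to install the wrapped function back into the original var:

```clojure
;; wrap an existing function in before/after actions, then
;; swap the wrapped version into the same var
(defn f1 [x] (println "first version got" x))

(defn around [f before after]
  (fn [& args]
    (before)
    (apply f args)
    (after)))

;; alter-var-root calls (around <old f1> before after)
;; and stores the result as the new root value of #'f1
(alter-var-root #'f1 around
                #(println "added before")
                #(println "added after"))

(f1 12)
```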
The reason this works is that by passing the old function in as a parameter (to around, or just a let or whatever), the new function closes over the old value through a locally scoped name; if the new function referred to the global name instead, the old value would be lost and the new function would just recurse.
Technically you could say this is a form of late binding - the function is being retrieved from a variable rather than being linked in directly - but generally the term is used to refer to much more dynamic behaviour, such as JS field access where the field might not even actually exist. In the above case the compiler can at least be sure the variable f1 will exist, even if it turns out to hold null or something, so lookup is fast.
Other functions that call f1 would work the way you expect, assuming they reference it by that name. If you did var f3 = f1; before the around call, functions defined as calling f3 wouldn't be affected; similarly, objects that got hold of f1 by having it passed in as a parameter or something. Basic lexical scoping rules apply. If you want such functions to be affected too, you could pull it off using something like PicoLisp... but you're also doing something you probably shouldn't (and that's not any kind of binding any more: that's direct mutation of a function object).
All that aside, I'm not sure this is in the spirit of literate programming at all - or for that matter, a program that describes rules. Are rules supposed to change depending on how far you are through the book or in what order you read the chapters? Literate programs aren't - just as a paragraph of text usually means one thing (you may not understand it, but its meaning is fixed) no matter whether you read it first or last, so should a declaration in a true literate program, right? One doesn't normally read a reference - such as a book of rules - from cover to cover like a novel.
Whereas designed like this, the meaning of the program is highly dependent on being read with the statements in one specific order. It's very much a machine-friendly series-of-instructions... not so much a reference book.

Clojure methods ending in *

What do methods ending in * tend to have in common? I've seen a few, but have no idea if this is an established naming convention.
In general I've seen this used to distinguish functions that do the same thing but with different signatures, especially in situations where overloads would create conflicting semantics. For example, list* could not be expressed as an overload of list because they are using variable arity in different ways.
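The list/list* pair illustrates the conflicting-arity point directly:

```clojure
;; list collects all of its arguments into a list;
;; list* treats its last argument as a seq to splice onto the end
(list  1 2 [3 4]) ; => (1 2 [3 4])
(list* 1 2 [3 4]) ; => (1 2 3 4)
```

An overload of list could not distinguish "a vector as the final element" from "a vector to splice", hence the separate starred name.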
In many cases (but not all), the * form is called by the non-* version.
Apart from what other answers have mentioned, this convention is also used where the non-* versions are macros that emit code calling the * versions. Even in clojure.core, let and fn are macros whose resulting code calls the let* and fn* special forms respectively. Another example is Korma, where the non-* forms (where, delete, update, etc.) are macros and the * forms (where*, delete*, etc.) are functions.
The reason for using this pattern is that in some cases it is not feasible to use the macro version of the API (short of resorting to eval), because you don't have the information at compile time; in such cases you can use the *-based functions instead.
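A toy sketch of the macro-over-function pattern (hypothetical names, not Korma's actual implementation): the macro is compile-time sugar, while the * function remains usable with values computed at runtime.

```clojure
;; the function does the real work and takes plain data
(defn where* [query clause]
  (update query :where (fnil conj []) clause))

;; the macro is sugar that quotes the clause for you
(defmacro where [query clause]
  `(where* ~query '~clause))

(where {:table :users} (= :age 42))
;; => {:table :users, :where [(= :age 42)]}
```

If the clause is only known at runtime, the macro is useless, but (where* q some-clause-data) still works.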

Clojure static typing

I know that this may sound like blasphemy to Lisp aficionados (and other lovers of dynamic languages), but how difficult would it be to enhance the Clojure compiler to support static (compile-time) type checking?
Setting aside the arguments for and against static and dynamic typing, is this possible (not "is this advisable")?
I was thinking that adding a new reader macro to force a compile-time type (an enhanced version of the #^ macro) and adding the type information to the symbol table would allow the compiler to flag places where a variable was misused. For example, in the following code, I would expect a compile-time error (#* is the "compile-time" type macro):
(defn get-length [#*String s] (.length s))
(defn test-get-length [] (get-length 2.0))
The #^ macro could even be reused with a global variable (*compile-time-type-checking*) to force the compiler to do the checks.
Any thoughts on the feasibility?
It's certainly possible. However, I do not think that Clojure will ever get any form of weak static typing; its benefits are too few.
Rich Hickey has, however, expressed on several occasions his admiration for the strong, optional, and expressive typing of the Qi language: http://www.lambdassociates.org/qilisp.htm
It's certainly possible. The compiler already does some static type checking around primitive argument types in the 1.3 development branch.
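A taste of what the compiler can already do statically via type hints (note this resolves interop calls at compile time; it does not reject wrongly typed arguments):

```clojure
;; with reflection warnings on, un-hinted interop calls are flagged
(set! *warn-on-reflection* true)

(defn get-length [^String s]
  (.length s)) ; hint lets the compiler emit a direct, non-reflective call
```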
Yes! It looks like there is a project underway, core.typed, to make optional static type checking a reality. See the GitHub project and its documentation.
This work grew out of an undergraduate honours dissertation (PDF) by Ambrose Bonnaire-Sergeant, and is related to the Typed Racket system.
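As a sketch of what this looks like (written from memory of older core.typed annotation syntax; check the project's documentation for the current API):

```clojure
(ns demo
  (:require [clojure.core.typed :as t]))

;; annotate the var, then define it as usual
(t/ann get-length [String -> t/Int])
(defn get-length [s]
  (.length s))

;; running (t/check-ns 'demo) would then flag a call
;; like (get-length 2.0) as a type error, at check time
```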
Since one form is read AND evaluated at a time, you cannot have forward references, which makes this somewhat limited.
Old question, but two important points: I don't think Clojure supports reader macros, only ordinary Lisp macros. And now we have the core.typed option for typing in Clojure.
declare can have type hints, so it is possible to declare a var that "is" the type which has not been defined yet but contains data about the structure, but this would be really clunky and you would have to do it before any code path that could be executed before the type is defined. Basically, you would want to define all of your user defined types up front and then use them like normal. I think that makes library writing somewhat hackish.
I didn't mean to suggest earlier that this isn't possible, just that for user defined types it is a lot more complicated than for pre-defined types. The benefit of doing this vs. the cost is something that should be seriously considered. But I encourage anyone who is interested to try it out and see if they can make it work!