What is the difference between "regular" and "reader" macros? - clojure

I am relatively new to Clojure and can't quite wrap my mind around the difference between reader macros and regular macros, and why the difference is significant.
In what situations would you use one over the other and why?

Reader macros change the syntax of the language (for example, #foo turns into (deref foo)) in ways that normal macros can't (a normal macro wouldn't be able to get rid of the parentheses, so you'd have to do something like (# foo)). It's called a reader macro, because it's implemented in the read pass of the repl (check out the source).
As a clojure developer, you'll only create regular macros, but you'll use plenty of reader macros, without necessarily considering them explicitly.
The full list of reader macros is here: https://clojure.org/reference/reader and includes common things like # ', and #{}.
Clojure (unlike some other lisps) doesn't support user-defined reader macros, but there is some extensibility built into the reader via tagged literals (e.g. #inst or #uuid)

tl;dr*
Macros [normal macros] are expanded during evaluation (E of REPL), tied to symbols, operate on lisp objects, and appear in the first, or "function", part of a form. Clojure, and all lisps, allow defining new macros.
Reader macros run during reading, prior to evaluation, are single characters, operate on a character string, prior to all the lisp objects being emitted from the reader, and are not restricted to being in the first, or "function", part of a form. Clojure, unlike some other lisps, does not allow defining new reader macros, short of editing the Clojure compiler itself.
more words:
Normal non-reader macros, or just "macros", operate on lisp objects. Consider:
(and 1 b :x)
The and macro will be called with two values, one value is 1 and the other is a list consisting of the symbol b (not the value of b) and the keyword :x. Everything the and macro is dealing with is already a lisp (Clojure) value.
Macro expansion only happens when the macro is at the beginning of a list. (and 1 2) expands the and macro. (list and) returns an error, "Can't take value of a macro"
The reader is reasponsible for turning a character string into In Clojure a reader macro is a single character that changes how the reader, the part responsible for turning a text stream into lisp objects, operates. The dispatch for Clojure's lisp reader is in LispReader.java. As stated by Alejandro C., Clojure does not support adding reader macros.
Reader macros are one character. (I do not know if that is true for all lisps, but Clojure's current implementation only supports single character reader macros.)
Reader macros can exist at any point in the form. Consider (conj [] 'a) if the ' macro were normal, the tick would need to become a lisp object so the code wold be a list of the symbol conj, an empty vector, the symbol ' and finally the symbol a. But now the evaulation rules would require that ' be evaluated by itself. Instead the reader, upon seeing the ' wraps the complete s-exp that follows with quote so that the value returned to the evaluator is a list of conj, an empty vector, and a list of quote followed by a. Now quote is the head of a list and can change the evaluation rules for what it quotes.

Talking shortly, a reader macros is a low-level feature. That's why there are so few of them (just #, quiting and a bit more). Having to many reader rules will turn any language into a mess.
A regular macro is a tool that is widely used in Clojure. As a developer, you are welcome to write your own regular macroses but not reader ones if you are not a core Clojure developer.
Your may always use your own tagged literals as a substitution of reader rules, for example #inst "2017" will give you a Date instance and so forth.

Related

Abstract structure of Clojure

I've been learning Clojure and am a good way through a book on it when I realized how much I'm still struggling to interpret the code. What I'm looking for is the abstract structure, interface, or rules, Clojure uses to parse code. I think it looks something like:
(some-operation optional-args)
optional-args can be nearly anything and that's where I start getting confused.
(operation optional-name-string [vector of optional args]) would equal (defn newfn [argA, argB])
I think this pattern holds for all lists () but with so much flexibility and variation in Clojure, I'm not sure. It would be really helpful to see the rules the interpreter follows.
You are not crazy. Sure it's easy to point out how "easy" ("simple"? but that another discussion) Clojure syntax is but there are two things for a new learner to be aware of that are not pointed out very clearly in beginning tutorials that greatly complicate understanding what you are seeing:
Destructuring. Spend some quality time with guides on destructuring in Clojure. I will say that this adds a complexity to the language and is not dissimilar from "*args" and "**kwargs" arguments in Python or from the use of the "..." spread operator in javascript. They are all complicated enough to require some dedicated time to read. This relates to the optional-args you reference above.
macros and metaprogramming. In the some-operation you reference above, you wish to "see the rules the interpreter follows". In the majority of the cases it is a function but Clojure provides you no indication of whether you are looking at a function or a macro. In the standard library, you will just need to know some standard macros and how they affect the syntax they headline. (e.g. if, defn etc). For included libraries, there will typically be a small set of macros that are core to understanding that library. Any macro will to modify, dare I say, complicate the syntax in the parens you are looking at so be on your toes.
Clojure is fantastic and easy to learn but those two points are not to be glossed over IMHO.
Before you start coding with Clojure, I highly recommend studying functional programming and LISB. In Clojure, everything is a prefix, and when you want to run and specific function, you will call it and then feed it with some arguments. for example, 1+2+3 will be (+ 1 2 3) in Clojure. In other words, every function you call will be at the start of a parenthesis, and all of its arguments will be follows the function name.
If you define a function, you may do as follow:
(defn newfunc [a1 a2]
(+ 100 a1 a2))
Which newfunc add 100 and a1 and a2. When you call it, you should do this:
(newfunc 1 2)
and the result will be 103.
in the first example, + is a function, so we call it at the beginning of the parenthesis.
Clojure is a beautiful world full of simplicity. Please learn it deeply.

clojure code modification preserving reader macros

In clojure, read-string followed by str will not return the original string, but the string for which the reader macros have been expanded:
(str (read-string "(def foo [] #(bar))"))
;"(def foo [] (fn* [] (bar)))"
This is problematic if we want to manipulate a small part of the code, far away from any reader macros, and get back a string representation that preserves the reader macros. Is there a work around?
The purpose of read is to build an AST of the code and as such the function does not preserve all the properties of the original text. Otherwise, it should keep track of the original code-layout (e.g. location of parenthesis, newlines, indentation (tabs/spaces), comments, and so on). If you have a look at LispReader.java, you can see that the reader macros are unconditionally applied (*read-eval* does not influence all reader macros).
Here is what I would recommend:
You can take inspiration of the existing LispReader and implement your own reader. Maybe it is sufficient to change the dispatch-table so that macro characters are directed to you own reader. You would also need to build runtime representations of the quoted forms and provide adequate printer functions for those objects.
You can process your original file with Emacs lisp, which can easily navigate the structure of your code and edit it as you wish.
Remark: you must know that what you are trying to achieve smells fishy. You might have a good reason to want to do this, but this looks needlessly complex without knowing why you want to work at the syntax level. It would help if you could provide more details.

Avoid name clashes in a Clojure DSL

As a side project I'm creating a Clojure DSL for image synthesis (clisk).
I'm a little unsure on the best approach to function naming where I have functions in the DSL that are analogous to functions in Clojure core, for example the + function or something similar is needed in my DSL to additively compose images / perform vector maths operations.
As far as I can see it there are a few options:
Use the same name (+) in my own namespace. Looks nice in DSL code but will override the clojure.core version, which may cause issues. People could get confused.
Use the same name but require it to be qualified (my-ns/+). Avoids conflicts, but prevents people from useing the namespace for convenience and looks a bit ugly.
Use a different short name e.g. (v+). Can be used easily and avoid clashes, but the name is a bit ugly and might prove hard to remember.
Use a different long name e.g. (vector-add). Verbose but descriptive, no clashes.
Exclude clojure.core/+ and redefine with a multimethod + (as georgek suggests).
Example code might look something like:
(show (v+ [0.9 0.6 0.3]
(dot [0.2 0.2 0]
(vgradient (vseamless 1.0 plasma) ))))
What is the best/most idiomatic approach?
first, the repeated appearance of operators in an infix expression requires a nice syntax, but for a lisp, with prefix syntax, i don't think this is as important. so it's not such a crime to have the user type a few more characters for an explicit namespace. and clojure has very good support for namespaces and aliasing. so it's very easy for a user to select their own short prefix: (x/+ ...) for example.
second, looking at the reader docs there are not many non-alphanumeric symbols to play with, so something like :+ is out. so there's no "cute" solution - if you choose a prefix it's going to have to be a letter. that means something like x+ - better to let the user choose an alias, at the price of one more character, and have x/+.
so i would say: ignore core, but expect the user to (:require .... :as ...). if they love your package so much they want it to be default then they can (:use ...) and handle core explicitly. but you choosing a prefix to operators seems like a poor compromise.
(and i don't think i have seen any library that does use single letter prefixes).
one other possibility is to provide the above and also a separate package with long names instead of operators (which are simply def'ed to match the values in the original package). then if people do want to (:use ...) but want to avoid clashes, they can use that (but really what's the advantage of (vector-add ...) over (vector/+ ...)?)
and finally i would check how + is implemented, since if it already involves some kind of dispatch on types then georgek's comment makes a lot of sense.
(by "operator" above i just mean single-character, non-alphanumeric symbol)

What would Clojure lose by switching away from leading parenthesis like Dylan, Julia and Seph?

Three lispy homoiconic languages, Dylan, Julia and Seph all moved away from leading parenthesis - so a hypothetical function call in Common Lisp that would look like:
(print hello world)
Would look like the following hypothetical function call
print(hello world)
in the three languages mentioned above.
Were Clojure to go down this path - what would it have to sacrifice to get there?
Reasoning:
Apart from the amazing lazy functional data structures in Clojure, and the improved syntax for maps and seqs, the language support for concurrency, the JVM platform, the tooling and the awesome community - the distinctive thing about it being 'a LISP' is leading parenthesis giving homoiconicity which gives macros providing syntax abstraction.
But if you don't need leading parentheses - why have them? The only arguments I can think of for keeping them are
(1) reusing tool support in emacs
(2) prompting people to 'think in LISP' and not try and treat it as another procedural language)
(Credit to andrew cooke's answer, who provided the link to Wheeler's and Gloria's "Readable Lisp S-expressions Project")
The link above is a project intended to provide a readable syntax for all languages based on s-expressions, including Scheme and Clojure. The conclusion is that it can be done: there's a way to have readable Lisp without the parentheses.
Basically what David Wheeler's project does is add syntactic sugar to Lisp-like languages to provide more modern syntax, in a way that doesn't break Lisp's support for domain-specific languages. The enhancements are optional and backwards-compatible, so you can include as much or as little of it as you want and mix them with existing code.
This project defines three new expression types:
Curly-infix-expressions. (+ 1 2 3) becomes {1 + 2 + 3} at every place you want to use infix operators of any arity. (There is a special case that needs to be handled with care if the inline expression uses several operators, like {1 + 2 * 3} - although {1 + {2 * 3} } works as expected).
Neoteric-expressions. (f x y) becomes f(x y) (requires that no space is placed between the function name and its parameters)
Sweet-expressions. Opening and closing parens can be replaced with (optional) python-like semantic indentation. Sweet-expressions can be freely mixed with traditional parentheses s-expressions.
The result is Lisp-compliant but much more readable code. An example of how the new syntactic sugar enhances readability:
(define (gcd_ a b)
(let (r (% b a))
(if (= r 0) a (gcd_ r a))))
(define-macro (my-gcd)
(apply gcd_ (args) 2))
becomes:
define gcd_(a b)
let r {b % a}
if {r = 0} a gcd_(r a)
define-macro my-gcd()
apply gcd_ (args) 2
Note how the syntax is compatible with macros, which was a problem with previous projects that intended to improve Lisp syntax (as described by Wheeler and Gloria). Because it's just sugar, the final form of each new expression is a s-expression, transformed by the language reader before macros are processed - so macros don't need any special treatment. Thus the "readable Lisp" project preserves homoiconicity, the property that allows Lisp to represent code as data within the language, which is what allows it to be a powerful meta-programming environment.
Just moving the parentheses one atom in for function calls wouldn't be enough to satisfy anybody; people will be complaining about lack of infix operators, begin/end blocks etc. Plus you'd probably have to introduce commas / delimiters in all sorts of places.
Give them that and macros will be much harder just to write correctly (and it would probably be even harder to write macros that look and act nicely with all the new syntax you've introduced by then). And macros are not something that's a nice feature you can ignore or make a whole lot more annoying; the whole language (like any other Lisp) is built right on top of them. Most of the "user-visible" stuff that's in clojure.core, including let, def, defn etc are macros.
Writing macros would become much more difficult because the structure would no longer be simple you would need another way to encode where expressions start and stop using some syntactic symbol to mark the start and end of expressions to you can write code that generates expressions perhaps you could solve this problem by adding something like a ( to mark the start of the expression...
On a completely different angle, it is well worth watching this video on the difference between familiar and easy making lisps syntax more familiar wont make it any easier for people to learn and may make it misleading if it looks to much like something it is not.
even If you completely disagree, that video is well worth the hour.
you wouldn't need to sacrifice anything. there's a very carefully thought-out approach by david wheeler that's completely transparent and backwards compatible, with full support for macros etc.
You would have Mathematica. Mathematica accepts function calls as f[x, y, z], but when you investigate it further, you find that f[x, y, z] is actually syntactic sugar for a list in which the Head (element 0) is f and the Rest (elements 1 through N) is {x, y, z}. You can construct function calls from lists that look a lot like S-expressions and you can decompose unevaluated functions into lists (Hold prevents evaluation, much like quote in Lisp).
There may be semantic differences between Mathematica and Lisp/Scheme/Clojure, but Mathematica's syntax is a demonstration that you can move the left parenthesis over by one atom and still interpret it sensibly, build code with macros, etc.
Syntax is pretty easy to convert with a preprocessor (much easier than semantics). You could probably get the syntax you want through some clever subclassing of Clojure's LispReader. There's even a project that seeks to solve Clojure's off-putting parentheses by replacing them all with square brackets. (It boggles my mind that this is considered a big deal.)
I used to code C/C#/Java/Pascal so I emphasize with the feeling that Lisp code is a bit alien. However that feeling only lasts a few weeks - after a fairly short amount of time the Lisp style will feel very natural and you'll start berating other languages for their "irregular" syntax :-)
There is a very good reason for Lisp syntax. Leading parentheses make code logically simpler to parse and read, by collecting both a function and the expressions that make up it's arguments in a single form.
And when you manipulate code / use macros, it is these forms that matter: these are the building blocks of all your code. So it fundamentally makes sense to put the parentheses in a place that exactly delimits these forms, rather than arbitrarily leaving the first element outside the form.
The "code is data" philosophy is what makes reliable metaprogramming possible. Compare with Perl / Ruby, which have complex syntax, their approaches to metaprogramming are only reliable in very confined circumstances. Lisp metaprogramming is so perfectly reliable that the core of the language depends on it. The reason for this is the uniform syntax shared by code and data (the property of homoiconicity). S-expressions are the way this uniformity is realized.
That said, there are circumstances in Clojure where implied parenthesis is possible:
For example the following three expressions are equivalent:
(first (rest (rest [1 2 3 4 5 6 7 8 9])))
(-> [1 2 3 4 5 6 7 8 9] (rest) (rest) (first))
(-> [1 2 3 4 5 6 7 8 9] rest rest first)
Notice that in the third the -> macro is able in this context to infer parentheses and thereby leave them out. There are many special case scenarios like this. In general Clojure's design errs on the side of less parentheses. See for example the controversial decision of leaving out parenthesis in cond. Clojure is very principled and consistent about this choice.
Interestingly enough - there is an alternate Racket Syntax:
#foo{blah blah blah}
reads as
(foo "blah blah blah")
Were Clojure to go down this path - what would it have to sacrifice to get there?
The sacrifice would the feeling/mental-modal of writing code by creating List of List of List.. or more technically "writing the AST directly".
NOTE: There are other things as well that will be scarified as mentioned in other answers.

Conventions, Style, and Usage for Clojure Constants?

What are the best practices for defining constants in Clojure in terms of style, conventions, efficiency, etc.
For example, is this right?
(def *PI* 3.14)
Questions:
Should constants be capitalized in Clojure?
Stylistically, should they have the asterisk (*) character on one or both sides?
Any computational efficiency considerations I should be aware of?
I don't think there is any hard and fast rules. I usually don't give them any special treatment at all. In a functional language, there is less of a distinction between a constant and any other value, because things are more often pure.
The asterisks on both sides are called "ear muffs" in Clojure. They are usually used to indicate a "special" var, or a var that will be dynamically rebound using binding later. Stuff like out and in which are occasionally rebound to different streams by users and such are examples.
Personally, I would just name it pi. I don't think I've ever seen people give constants special names in Clojure.
EDIT: Mister Carper just pointed out that he himself capitalizes constants in his code because it's a convention in other languages. I guess this goes to show that there are at least some people who do that.
I did a quick glance through the coding standards but didn't find anything about it. This leads me to conclude that it's really up to you whether or not you capitalize them. I don't think anyone will slap you for it in the long run.
On the computational efficiency front you should know there is no such thing as a global constant in Clojure. What you have above is a var, and every time you reference it, it does a lookup. Even if you don't put earmuffs on it, vars can always be rebound, so the value could always change, so they are always looked up in a table. For performance critical loops this is most decidedly non-optimal.
There are some options like putting a let block around your critical loops and let the value of any "constant" vars so that they are not looked up. Or creating a no-arg macro so that the constant value is compiled into the code. Or you could create a Java class with a static member.
See this post, and the following discussion about constants for more info:
http://groups.google.com/group/clojure/msg/78abddaee41c1227
The earmuffs are a way of denoting that a given symbol will have its own thread-local binding at some point. As such, it does not make sense to apply the earmuffs to your Pi constant.
*clojure-version* is an example of a constant in Clojure, and it's entirely in lower-case.
Don't use a special notation for constants; everything is assumed a constant unless specified otherwise.
See http://dev.clojure.org/display/community/Library+Coding+Standards
Clojure has a variety of literals such as:
3.14159
:point
{:x 0
:y 1}
[1 2 3 4]
#{:a :b :c}
The literals are constant. As far as I know, there is no way to define new literals. If you want to use a new constant, you can effectively generate a literal in the code at compile-time:
(defmacro *PI* [] 3.14159265358979323)
(prn (*PI*))
In Common Lisp, there's a convention of naming constants with plus signs (+my-constant+), and in Scheme, by prefixing with a dollar sign ($my-constant); see this page. Any such convention conflicts with the official Clojure coding standards, linked in other answers, but maybe it would be reasonable to want to distinguish regular vars from those defined with the :const attribute.
I think there's an advantage to giving non-function variables of any kind some sort of distinguishing feature. Suppose that aside from variables defined to hold functions, you typically only use local names defined by function parameters, let, etc. If you nevertheless occasionally define a non-function variable using def, then when its name appears in a function definition in the same file, it looks to the eye like a local variable. If the function is complex, you may spend several seconds looking for the name definition within the function. Adding a distinguishing feature like earmuffs or plus signs or all uppercase, as appropriate to the variable's use, makes it obvious that the variable's definition is somewhere else.
In addition, there are good reasons to give special constants like pi a special name, so no one has to wonder whether pi means, say, "print-index", or the i-th pizza, or "preserved interface". Of course I think those variables should have more informative names, but lots of people use cryptic, short variable names, and I end up reading their code. I shouldn't have to wonder whether pi means pi, so something like PI might make sense. None would think that's a run of the mill variable in Clojure.
According to the "Practical Clojure" book, it should be named *pi*