When did the metadata reader syntax change from #^ to ^? - clojure

Currently (Clojure v1.6) you can give a type hint two ways:
^floats xs
#^floats xs
According to "Clojure ^floats vs. #^floats?", the latter is legacy syntax and the former is the current preferred form.
When did that change happen?

A brief history of the ^ macro character
In Clojure v1.0, the ^ character was the "meta reader macro". In other words, ^x was shorthand for (meta x), while #^ was used to associate metadata with an object. (See the Macro Characters documentation from November 2009.)
At some point, someone probably realized that having special cases for both #^ and ^, both related to metadata, was confusing. They decided to deprecate ^, with the plan to eventually replace #^ with ^. In Clojure v1.1, the ^ reader macro was officially deprecated. (See the Macro Characters documentation from January 2010.)
There's a commit on GitHub from April 26, 2010 that replaces the old ^ behavior with the #^ behavior. (This is when #^ and ^ became synonymous.)
In the Clojure v1.2 release, #^ was deprecated in favor of ^. (See the Macro Characters documentation from August 2010.)
They removed the last few instances of #^ from clojure.core back in 2013, sometime before the Clojure v1.6 release.
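A quick way to check that the two spellings are still read identically is to read both forms from a string and compare the attached metadata. This is only a minimal sketch: it assumes a Clojure release that still accepts the legacy #^ form (recent releases do, as far as I know), and the symbol xs is just a placeholder.

```clojure
;; Read each form from a string and inspect the metadata the reader attached.
(def new-style (read-string "^floats xs"))
(def old-style (read-string "#^floats xs"))

(println (meta new-style)) ; prints {:tag floats}
(println (meta old-style)) ; prints {:tag floats}

;; Both spellings yield the same symbol carrying the same :tag metadata.
(println (= (meta new-style) (meta old-style))) ; prints true
```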

Implementation feature of character shorthand \s

I am wondering why, in Erlang's regex library re, the character shorthand \s matches only the space character (ASCII 32) and is not equivalent to the regular expression [ \\t\\n\\r].
At the same time, the opposite of \s, the non-space shorthand \S, behaves as expected.
Test labs
EUnit tests for \s.
EUnit tests for [ \\t\\n\\r].
EUnit tests for \S.
I still found the answer to my question in the documentation of the re library.
For compatibility with Perl, \s did not used to match the VT character
(code 11), which made it different from the the POSIX "space" class.
However, Perl added VT at release 5.18, and PCRE followed suit at
release 8.34. The default \s characters are now HT (9), LF (10), VT
(11), FF (12), CR (13), and space (32), which are defined as white
space in the "C" locale. This list may vary if locale-specific
matching is taking place. For example, in some locales the
"non-breaking space" character (\xA0) is recognized as white space,
and in others the VT character is not.
From this, I conclude that \s behaves as expected only when the locale is set to "C".
Now I understand why it works this way: it is what the developers intended, so we need to take this behavior into account when writing regular expressions in Erlang.
To work around this limitation (the dependence on the locale), I wrote a project that adapts the regular expression text to what my environment supports (my operating system does not have the required locale setting, but I would like to keep using it).
This is a helper library re_tuner.

What is the difference between "regular" and "reader" macros?

I am relatively new to Clojure and can't quite wrap my mind around the difference between reader macros and regular macros, and why the difference is significant.
In what situations would you use one over the other and why?
Reader macros change the syntax of the language (for example, @foo turns into (deref foo)) in ways that normal macros can't (a normal macro wouldn't be able to get rid of the parentheses, so you'd have to do something like (@ foo)). It's called a reader macro because it's implemented in the read pass of the REPL (check out the source).
As a clojure developer, you'll only create regular macros, but you'll use plenty of reader macros, without necessarily considering them explicitly.
The full list of reader macros is here: https://clojure.org/reference/reader and includes common things like @, ', and #{}.
Clojure (unlike some other lisps) doesn't support user-defined reader macros, but there is some extensibility built into the reader via tagged literals (e.g. #inst or #uuid)
tl;dr
Macros [normal macros] are expanded during evaluation (E of REPL), tied to symbols, operate on lisp objects, and appear in the first, or "function", part of a form. Clojure, and all lisps, allow defining new macros.
Reader macros run during reading, prior to evaluation, are single characters, operate on a character string, prior to all the lisp objects being emitted from the reader, and are not restricted to being in the first, or "function", part of a form. Clojure, unlike some other lisps, does not allow defining new reader macros, short of editing the Clojure compiler itself.
more words:
Normal non-reader macros, or just "macros", operate on lisp objects. Consider:
(and 1 (b :x))
The and macro will be called with two values: one value is 1 and the other is a list consisting of the symbol b (not the value of b) and the keyword :x. Everything the and macro is dealing with is already a lisp (Clojure) value.
Macro expansion only happens when the macro is at the beginning of a list. (and 1 2) expands the and macro. (list and) returns an error, "Can't take value of a macro"
In Clojure, a reader macro is a single character that changes how the reader, the part responsible for turning a text stream into lisp objects, operates. The dispatch for Clojure's lisp reader is in LispReader.java. As stated by Alejandro C., Clojure does not support adding reader macros.
Reader macros are one character. (I do not know if that is true for all lisps, but Clojure's current implementation only supports single character reader macros.)
Reader macros can exist at any point in the form. Consider (conj [] 'a). If the ' macro were a normal macro, the tick would need to become a lisp object, so the code would be a list of the symbol conj, an empty vector, the symbol ' and finally the symbol a. But then the evaluation rules would require that ' be evaluated by itself. Instead the reader, upon seeing the ', wraps the complete s-exp that follows in quote, so that the value returned to the evaluator is a list of conj, an empty vector, and a list of quote followed by a. Now quote is the head of a list and can change the evaluation rules for what it quotes.
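The difference can be seen at a REPL by asking the reader for the forms it produces and comparing that with what a regular macro expands to. This is a rough sketch; the unless macro is a made-up example and not part of clojure.core.

```clojure
;; Reader macros are expanded by the reader, before any evaluation happens.
(println (read-string "'a"))           ; prints (quote a)
(println (read-string "@foo"))         ; prints (clojure.core/deref foo)
(println (read-string "(conj [] 'a)")) ; prints (conj [] (quote a))

;; A regular macro receives forms that have already been read.
;; `unless` is a hypothetical macro defined only for this illustration.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

(println (macroexpand-1 '(unless false :ok))) ; prints (if false nil (do :ok))
(println (unless false :ok))                  ; prints :ok
```

Note that read-string never evaluates anything here: foo does not need to exist for "@foo" to be read, which shows that the rewrite happens purely at the text-to-data stage.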
In short, a reader macro is a low-level feature. That's why there are so few of them (just #, quoting and a bit more). Having too many reader rules will turn any language into a mess.
A regular macro is a tool that is widely used in Clojure. As a developer, you are welcome to write your own regular macros, but not reader ones unless you are a core Clojure developer.
You may always use your own tagged literals as a substitute for reader rules; for example, #inst "2017" will give you a Date instance and so forth.
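As a sketch of how tagged literals can stand in for a custom reader rule, the snippet below reads a built-in tag and then a custom one by binding *data-readers*. The tag my/point and the read-point function are hypothetical names chosen for this example; in a real project the mapping would more likely live in a data_readers.clj file.

```clojure
;; Built-in tagged literals are handled by the reader itself.
(def date (read-string "#inst \"2017\""))
(println (instance? java.util.Date date)) ; prints true

;; A custom tag maps to a function that receives the already-read form.
;; The tag my/point and the resulting map shape are hypothetical.
(defn read-point [[x y]]
  {:x x :y y})

(def point
  (binding [*data-readers* {'my/point read-point}]
    (read-string "#my/point [1 2]")))

(println point) ; prints {:x 1, :y 2}
```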

What does the @ (at sign) mean in Clojure?

I found this line of Clojure code: @(d/transact conn schema-tx). It's a Datomic statement that creates a database schema. I couldn't find anything relevant on Google due to difficulties searching for characters like "@".
What does the 'at' sign mean before the first parenthesis?
This is the deref macro character. What you're looking for in the context of Datomic is at:
http://docs.datomic.com/transactions.html
under Processing Transactions:
In Clojure, you can also use the deref method or @ to get a
transaction's result.
For more on deref in Clojure, see:
http://clojuredocs.org/clojure_core/clojure.core/deref
Here is a useful overview of Clojure default syntax and "sugar" (i.e. macro definitions).
http://java.ociweb.com/mark/clojure/article.html#Overview
There you'll find explanations of the number sign #, which introduces a regex or a set literal, the caret ^, which is for metadata, and, among many more, the "at sign" @. It is a sugar form for dereferencing, which means you get the real value the reference is pointing to.
Clojure has three reference types: Refs, Atoms and Agents.
http://clojure-doc.org/articles/language/concurrency_and_parallelism.html#clojure-reference-types
Your term @(d/transact conn schema-tx) seems to deliver a reference to an atom, and by the at sign @ you dereference it and thus get the value this reference points to.
BTW, you'll find results with search engines if you look e.g. for "Clojure at sign". But it needs some patience ;-)
The @ is equivalent to deref in Clojure. transact returns a future which you deref to get the result. deref/@ will block until the transaction completes/aborts/times out.
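To see the at sign in action without a Datomic connection, here is a small sketch using an atom and a promise. The promise only stands in for the future that transact is described as returning; the names counter, result and pending are invented for the example.

```clojure
;; @x is read as (deref x), so the two spellings behave identically.
(def counter (atom 41))
(swap! counter inc)
(println @counter)        ; prints 42
(println (deref counter)) ; prints 42

;; A delivered promise hands back its value immediately when dereferenced.
(def result (promise))
(deliver result :done)
(println @result) ; prints :done

;; On a value that is not ready yet, plain @ would block. The three-argument
;; form of deref gives up after a timeout (in milliseconds) instead.
(def pending (promise))
(println (deref pending 10 :timed-out)) ; prints :timed-out
```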

Regexp languages and replacements in Emacs

When I use the regexp-builder, I need to escape things in a different way from the way I do it when using replace-regexp. Now, this thread explains that these two commands use a different syntax, but why is that?
Also, I went through this blog post: Re-builder: The Interactive Regexp Builder, and I added
(require 're-builder)
(setq reb-re-syntax 'string)
to my .emacs file following the advice on the site. However, I still need to type " around my regexp to make it work. I thought changing the syntax language would take care of this but it doesn't.
With this, my actual questions are:
1. Is it still the case that Emacs does not support PCRE? Are there any workarounds to this?
2. Once I have the right regexp in regex-builder, is there any way to directly send the regexp to replace-regexp and enter the replacement string?
There's a package in the MELPA repository called pcre2el that adds PCRE support to many parts of Emacs, including regexp-builder and replace-regexp.
Regarding question #2: No (at least not by default), but there's another way to do that without using re-builder.
Start by doing a regexp isearch for your pattern. Because it's an isearch, you'll see the matches interactively, a bit like re-builder (albeit without coloured groupings).
Still in isearch, once you're happy with the pattern, type C-M-% to call isearch-query-replace-regexp which will prompt you for the replacement.
You can of course simply copy your re-builder string from its buffer and yank it as a replacement string (but that's undoubtedly not news).
I was curious about the need for quotes in re-builder with string syntax. It seems that it's just a formality of the system, and reb-read-regexp returns everything between the first and last " when using that syntax. Maybe it's intended to ensure that leading or trailing whitespace can't confuse matters: re-builder does use leading whitespace for improved visibility, and trailing whitespace would be harder to spot. Or maybe it just made some of the code more convenient/consistent.
1. No, Emacs doesn't support PCRE, and as far as I know there is no work-around for that.
2. I don't think so.
To answer your first question, why does re-builder use a different syntax than replace-regexp:
By default, re-builder uses the syntax that is appropriate for writing elisp programs. In the context of a written program, regexps are entered within strings. Inside a string, backslashes have a special meaning which conflicts with using the backslash as part of a regexp. Consequently, within a string, you need to double a backslash to use it to signify part of the regexp syntax.
replace-regexp, on the other hand, is designed to be used interactively by the user, and it explicitly expects the input to be a regexp. As a convenience, it interprets backslashes as regexp syntax, not as string escapes. Which is why you can use single backslashes in this context.

What are the allowed characters in a Clojure keyword?

I am looking for a list of the allowed characters in a clojure keyword. Specifically I am interested to know if any of the following characters are allowed: - _ /.
I am not a java programmer, so I would not know the underlying ramifications if any. I don't know if the clojure keyword is mapped to a java keyword if there is such a thing.
Edit:
When I initially composed this answer, I was probably a little too heavily invested in the question of "what can you get away with?" In fairness to myself though, the keyword admissibility issue appears to be unsettled still. So:
First, a little about keywords, for new readers:
Keywords come in two flavours, qualified and unqualified. Unqualified keywords, like :foo, have no namespace component. Qualified keywords look like :foo/bar where the part prior to the slash is the namespace, ostensibly. Keywords can't be referred, and can be given a non-existent namespace, so their namespace behaviour is different from other Clojure objects.
Keywords can be created either by literals to the reader, like :foo, or by the keyword function, which is (keyword name-str) or (keyword ns name).
Keywords evaluate to themselves only, unlike symbols which point to vars. Note that keywords are not symbols.
What is officially permitted?
According to the reader documentation, a single slash is permitted, no periods are allowed in the name, and all the rules for symbols apply.
What is actually permitted?
More or less anything but spaces seems to be permitted in the reader. For instance,
user> :-_./asdfgse/aser/se
:-_./asdfgse/aser/se
Appears to be legal. The namespace for the above keyword is:
user> (namespace :-_./asdfgse/aser/se)
"-_./asdfgse/aser"
So the namespace appears to consist of everything prior to the last forward slash.
The keyword function is even more permissive:
user> (keyword "////+" "/////")
:////+//////
user> (namespace (keyword "////+" "/////"))
"////+"
And similarly, spaces are fine too if you use the keyword function. I'm not sure exactly what limitations are placed on Unicode characters, but the REPL doesn't appear to complain when I put in arbitrary characters.
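The gap between what the reader accepts and what the keyword function accepts can be sketched as follows. The keyword names are arbitrary examples, and the behaviour shown is what I would expect from current releases; given the open validation discussion it may change.

```clojure
;; Namespace and name parts of keyword literals.
(println (namespace :foo/bar)) ; prints foo
(println (name :foo/bar))      ; prints bar
(println (namespace :foo))     ; prints nil

;; Hyphen and underscore are ordinary keyword characters.
(println (= :foo-bar_baz (keyword "foo-bar_baz"))) ; prints true

;; The keyword function accepts a space, but the result does not survive
;; being printed and read back: the reader stops at the space.
(def spaced (keyword "has space"))
(println (name spaced))                          ; prints has space
(println (= spaced (read-string (pr-str spaced)))) ; prints false
```

This is one practical reason to stay within the documented character set: keywords that cannot be round-tripped through the reader tend to cause trouble when data is serialized as EDN.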
What's likely to happen in the future:
There have been some rumblings about validating keywords as they are interned. Supposedly one of the longest open clojure tickets is concerned with validation of keywords. So the keyword function may cease to be so permissive in the future, though that seems to be up in the air. See the assembla ticket and google group discussion.
The "correct" answer is documented:
Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? (other characters will be allowed eventually, but not all macro characters have been determined). '/' has special meaning, it can be used once in the middle of a symbol to separate the namespace from the name, e.g. my-namespace/foo. '/' by itself names the division function. '.' has special meaning - it can be used one or more times in the middle of a symbol to designate a fully-qualified class name, e.g. java.util.BitSet, or in namespace names. Symbols beginning or ending with '.' are reserved by Clojure. Symbols containing / or . are said to be 'qualified'. Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s.
Edit: And further with respect to keywords:
Keywords are like symbols, except:
* They can and must begin with a colon, e.g. :fred.
* They cannot contain '.' or name classes.
* A keyword that begins with two colons is resolved in the current namespace
From that list, the reader certainly allows - and _, but / has a special meaning as the delimiter between namespaces and symbol names. Period (which you didn't ask about) is problematic inside symbol names as well, since it is used in fully-qualified Java class names.
As far as Clojure idiom goes, - is your best friend in symbol names. It takes the place of camel case in Java or the underscore in Ruby.
Starting in 1.3 you can use ' anywhere except at the start of a keyword, so :arthur's-keyword is allowed now :)
I use the keywords :-P and :-D to spice up my code occasionally (as synonyms for true and false)