I am looking for a list of the allowed characters in a Clojure keyword. Specifically, I would like to know whether any of the following characters are allowed: - _ /.
I am not a Java programmer, so I would not know the underlying ramifications, if any. I don't know whether a Clojure keyword is mapped to a Java keyword, if there is such a thing.
Edit:
When I initially composed this answer, I was probably a little too heavily invested in the question of "what can you get away with?" In fairness to myself though, the keyword admissibility issue appears to be unsettled still. So:
First, a little about keywords, for new readers:
Keywords come in two flavours, qualified and unqualified. Unqualified keywords, like :foo, have no namespace component. Qualified keywords look like :foo/bar where the part prior to the slash is the namespace, ostensibly. Keywords can't be referred, and can be given a non-existent namespace, so their namespace behaviour is different from other Clojure objects.
Keywords can be created either by literals to the reader, like :foo, or by the keyword function, which is (keyword name-str) or (keyword ns name).
Keywords evaluate to themselves only, unlike symbols which point to vars. Note that keywords are not symbols.
What is officially permitted?
According to the reader documentation, a single slash is permitted, no periods are allowed in the name, and all the rules for symbols apply.
What is actually permitted?
More or less anything but spaces seems to be permitted in the reader. For instance,
user> :-_./asdfgse/aser/se
:-_./asdfgse/aser/se
appears to be legal. The namespace for the above keyword is:
user> (namespace :-_./asdfgse/aser/se)
"-_./asdfgse/aser"
So the namespace appears to consist of everything prior to the last forward slash.
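The split observed above can be mimicked outside Clojure. The following is a minimal Python sketch, assuming (as the REPL output suggests, not as any documentation guarantees) that a keyword literal is divided at its last slash. The helper name split_keyword is hypothetical.

```python
def split_keyword(literal):
    """Split a keyword literal such as ':a/b/c' at its LAST slash.

    This mirrors the REPL behaviour observed above; it is not Clojure's reader.
    """
    # Drop the leading colon if present.
    body = literal[1:] if literal.startswith(":") else literal
    # rpartition returns ('', '', body) when there is no slash at all.
    namespace, slash, name = body.rpartition("/")
    return (namespace if slash else None, name)
```

Calling split_keyword(":-_./asdfgse/aser/se") gives the namespace "-_./asdfgse/aser" and the name "se", matching the REPL session above, while an unqualified keyword such as ":foo" yields no namespace.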
The keyword function is even more permissive:
user> (keyword "////+" "/////")
:////+//////
user> (namespace (keyword "////+" "/////"))
"////+"
And similarly, spaces are fine too if you use the keyword function. I'm not sure exactly what limitations are placed on Unicode characters, but the REPL doesn't appear to complain when I put in arbitrary characters.
What's likely to happen in the future:
There have been some rumblings about validating keywords as they are interned. Supposedly one of the longest-open Clojure tickets is concerned with validation of keywords. So the keyword function may cease to be so permissive in the future, though that seems to be up in the air. See the Assembla ticket and Google Group discussion.
The "correct" answer is documented:
Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? (other characters will be allowed eventually, but not all macro characters have been determined). '/' has special meaning, it can be used once in the middle of a symbol to separate the namespace from the name, e.g. my-namespace/foo. '/' by itself names the division function. '.' has special meaning - it can be used one or more times in the middle of a symbol to designate a fully-qualified class name, e.g. java.util.BitSet, or in namespace names. Symbols beginning or ending with '.' are reserved by Clojure. Symbols containing / or . are said to be 'qualified'. Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s.
Edit: And further with respect to keywords:
Keywords are like symbols, except:
* They can and must begin with a colon, e.g. :fred.
* They cannot contain '.' or name classes.
* A keyword that begins with two colons is resolved in the current namespace
From that list, the reader certainly allows - and _, but / has a special meaning as the delimiter between namespaces and symbol names. Period (which you didn't ask about) is problematic inside symbol names as well, since it is used in fully-qualified Java class names.
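To make the documented rules concrete, here is a rough Python sketch that checks a keyword literal against the quoted text only: a leading colon, parts built from alphanumerics and * + ! - _ ?, no leading digit, at most one slash, and no periods. The function name and the decision to apply the "non-numeric first character" rule to both the namespace and the name are assumptions of this sketch. It says nothing about what the reader actually accepts, which, as shown earlier, is far more.

```python
import re

# One namespace or name part, per the quoted documentation (an assumption of
# this sketch, not taken from the reader's source): starts with a non-digit,
# then alphanumerics and * + ! - _ ?
_PART = r"[A-Za-z*+!\-_?][A-Za-z0-9*+!\-_?]*"

# A colon, one part, and optionally a single slash followed by another part.
_DOCUMENTED_KEYWORD = re.compile(rf":{_PART}(?:/{_PART})?")


def is_documented_keyword(text):
    """Return True if text fits the documented (not the actual) keyword rules."""
    return _DOCUMENTED_KEYWORD.fullmatch(text) is not None
```

Under these rules :foo, :foo/bar and :my-key_1 pass, while :a.b, :a/b/c and the :-_./asdfgse/aser/se example from earlier do not.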
As far as Clojure idiom goes, - is your best friend in symbol names. It takes the place of camel case in Java or the underscore in Ruby.
Starting in 1.3 you can use ' anywhere except at the start of a keyword, so :arthur's-keyword is allowed now :)
I use the keywords :-P and :-D to spice up my code occasionally (as synonyms for true and false)
Related
I am new to EDN and going through EDN spec - https://github.com/edn-format/edn
What is the use of EDN symbols like '$ % &' and how can I make use of them while reading EDN in Clojure?
The spec mentions Symbols, but it doesn't mean what people usually call symbols on, say, a keyboard.
It's a bit confusing, so let me reframe it. On a keyboard, for example, someone might say that there are symbols, and those would be @, #, $, %, ^, &, among others. Let's call these character symbols.
Now in EDN, you have Symbols, but the term does not refer to a character symbol. It refers to a data type. What's even more confusing is that the spec says an EDN Symbol can contain a certain set of character symbols, yet it is not itself a character symbol.
So what are EDN symbols? Here's some:
hello
abc
+
person/name
this$is#insane
Each of these is a valid EDN Symbol. It helps to contrast them with other types to understand them. So here are a bunch of EDN Strings:
"hello"
"abc"
"+"
"person/name"
"this$is#insane"
And here's a bunch of EDN keywords:
:hello
:abc
:+
:person/name
:this$is#insane
So what distinguishes these? Well, EDN Symbols, Strings and Keywords are all just sequences of characters. Depending on whether something is a Symbol, String or Keyword, the allowed characters differ, which is why the EDN spec says that a Symbol can contain certain characters like $ and ?. But it does not allow every character. For example, ^ is not allowed in a Symbol, but it is in a String:
hello^john ; Not a valid EDN symbol
"hello^john" ; A valid EDN string
What else? You can see that an EDN string must have its characters enclosed between opening and closing double quotes "". A keyword, on the other hand, must start with a colon :. And a symbol doesn't need any marker: any continuous run of valid characters is a symbol, as long as it doesn't begin with : and isn't enclosed in double quotes.
Now the second thing to understand is... what are they for? This is more nebulous. They are for whatever you want to use them for when you model your data as EDN. You could use EDN strings instead, or EDN keywords instead, and vice versa. Anytime you have a set of characters that only contains allowed symbol characters, you could choose to use a symbol to represent it in EDN.
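The marker rule can be sketched in a few lines of Python. This is only an illustration: it knows about strings, keywords and symbols and nothing else (numbers, characters, collections and so on come out as "invalid"), it does not check string escapes, and the character classes are paraphrased from the EDN spec's symbol rules, so treat them as an approximation. The name classify is hypothetical.

```python
import re

# Characters allowed in a symbol part, paraphrased from the EDN spec.
# ':' and '#' may appear, but not as the first character.
_FIRST = r"[A-Za-z.*+!\-_?$%&=<>]"
_REST = r"[A-Za-z0-9.*+!\-_?$%&=<>:#]"
# A leading -, + or . followed by a digit would be a number, so exclude it.
_NAME = rf"(?![-+.][0-9]){_FIRST}{_REST}*"
# Optional "prefix/" followed by a name.
_QUALIFIED = re.compile(rf"(?:{_NAME}/)?{_NAME}")


def classify(token):
    """Classify a single token as 'string', 'keyword', 'symbol' or 'invalid'."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return "string"
    if token.startswith(":"):
        return "keyword" if _QUALIFIED.fullmatch(token[1:]) else "invalid"
    if token == "/" or _QUALIFIED.fullmatch(token):
        return "symbol"
    return "invalid"
```

With this sketch, this$is#insane and person/name classify as symbols, :this$is#insane as a keyword, "hello^john" (with its quotes) as a string, and the bare hello^john as invalid.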
In general, people use keywords for keys of maps or for tagging, such as saying that the type of animal is :monkey:
{:animal-type :monkey}
And in general, string is used to represent free-form text. Text entered by a user, or needing to be displayed to a user.
{:animal-type :monkey
:animal-name "Bruno the monkey"}
Finally, Symbols are normally used to refer to other objects within the language itself. Such as referring to a function, another piece of data, etc.
{:animal-type :monkey
:animal-name "Bruno the monkey"
:transform-fn animal/add-owner-info}
I'd like to know which characters are safe for any use in SAS macros.
So what I mean by special characters here is any character (or group of characters) that can have a specific role in SAS in any context. I'm not that interested in keywords (made of a-z 1-9 chars).
For example, = ^= ; % , # are special (I'm not sure if # is actually used in SAS, but it's used for doc, so it still counts as a character that is not 'safe for all uses').
But what about $ ! ~ § { } ° etc ?
This should include characters that are special in PROC SQL as well.
I'd like to use some of these characters and give them a special meaning in my code, but I'd rather not conflict with any existing use (I'm especially interested in ~).
A bit of general reference:
Reserved macro words
Macro word rules
SAS operators and mnemonics
Rules for SAS names
I think the vast majority of the characters on a standard English keyboard are used somewhere or other in the SAS language.
To address your examples:
$ Used in format names, put/input statements, regular expression definitions...
! 'or' operator in some environments
~ 'not' operator
§ Not used as far as I know
{} Can be used for data step array references & definitions
° Not used as far as I know
None of the above do anything special in a macro context, as Tom has already made clear in his answer.
Maybe SAS Operators in Expressions can help you with ~; look at the tables Comparison Operators and Logical Operators.
The main triggers in macro code are & and % which are used to trigger macro variable references and macro statements, functions or macro calls.
The ; (semi-colon) is used in macro code (as in SAS code) to indicate the end of a statement.
When passing values to macro parameters you mainly need to worry about , (comma). But you will also want to avoid unbalanced (). You should avoid using = when passing parameter values by position.
You can protect them by adding quotes or extra () around the values. But those characters become part of the value passed. You can use macro quoting to protect them.
%mymac(parm1='1,200',parm2=(1,200),parm3=%str(1,200),parm4="a(b")
Equal signs can be included without quoting as long as your call is using named parameters.
%mymac(parm1=a=b)
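To see why quotes and extra parentheses protect a comma, here is a small Python sketch that splits an argument string only at commas that sit outside quotes and parentheses. It illustrates the idea described above and is not SAS's actual macro tokenizer. The function name split_top_level is hypothetical, and %str(...) is protected here simply because it is parenthesized.

```python
def split_top_level(args):
    """Split args at commas that are outside quotes and parentheses."""
    parts, current = [], []
    depth = 0      # current nesting level of parentheses
    quote = None   # the quote character we are inside, if any
    for ch in args:
        if quote:
            if ch == quote:
                quote = None          # closing quote
        elif ch in "'\"":
            quote = ch                # opening quote
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0 and quote is None:
            parts.append("".join(current))   # unprotected comma: new argument
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts
```

Applied to the argument list of the first %mymac call above, this yields four pieces, one per parameter, with the protecting quotes and parentheses still part of each value.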
In addition to the previous answers:
% is also used to include files in your program, via %include.
Using special characters may cause your code to get stuck in a loop due to unbalanced quotes. SAS Note.
If you run into this just submit the magic string below:
*';*";*/;run;
I am relatively new to Clojure and can't quite wrap my mind around the difference between reader macros and regular macros, and why the difference is significant.
In what situations would you use one over the other and why?
Reader macros change the syntax of the language (for example, @foo turns into (deref foo)) in ways that normal macros can't (a normal macro wouldn't be able to get rid of the parentheses, so you'd have to do something like (@ foo)). It's called a reader macro because it's implemented in the read pass of the REPL (check out the source).
As a Clojure developer, you'll only create regular macros, but you'll use plenty of reader macros, without necessarily considering them explicitly.
The full list of reader macros is here: https://clojure.org/reference/reader and includes common things like @, ', and #{}.
Clojure (unlike some other lisps) doesn't support user-defined reader macros, but there is some extensibility built into the reader via tagged literals (e.g. #inst or #uuid)
tl;dr
Macros [normal macros] are expanded during evaluation (E of REPL), tied to symbols, operate on lisp objects, and appear in the first, or "function", part of a form. Clojure, and all lisps, allow defining new macros.
Reader macros run during reading, prior to evaluation, are single characters, operate on a character string, prior to all the lisp objects being emitted from the reader, and are not restricted to being in the first, or "function", part of a form. Clojure, unlike some other lisps, does not allow defining new reader macros, short of editing the Clojure compiler itself.
more words:
Normal non-reader macros, or just "macros", operate on lisp objects. Consider:
(and 1 (b :x))
The and macro will be called with two values, one value is 1 and the other is a list consisting of the symbol b (not the value of b) and the keyword :x. Everything the and macro is dealing with is already a lisp (Clojure) value.
Macro expansion only happens when the macro is at the beginning of a list. (and 1 2) expands the and macro. (list and) returns an error, "Can't take value of a macro".
The reader is responsible for turning a text stream into lisp objects. In Clojure, a reader macro is a single character that changes how the reader operates. The dispatch for Clojure's lisp reader is in LispReader.java. As stated by Alejandro C., Clojure does not support adding reader macros.
Reader macros are one character. (I do not know if that is true for all lisps, but Clojure's current implementation only supports single character reader macros.)
Reader macros can exist at any point in the form. Consider (conj [] 'a). If the ' macro were a normal macro, the tick would need to become a lisp object, so the code would be a list of the symbol conj, an empty vector, the symbol ' and finally the symbol a. But now the evaluation rules would require that ' be evaluated by itself. Instead the reader, upon seeing the ', wraps the complete s-exp that follows with quote, so that the value returned to the evaluator is a list of conj, an empty vector, and a list of quote followed by a. Now quote is the head of a list and can change the evaluation rules for what it quotes.
In short, a reader macro is a low-level feature. That's why there are so few of them (just #, quoting and a bit more). Having too many reader rules will turn any language into a mess.
A regular macro is a tool that is widely used in Clojure. As a developer, you are welcome to write your own regular macros, but not reader ones unless you are a core Clojure developer.
You may always use your own tagged literals as a substitute for reader rules; for example, #inst "2017" will give you a Date instance, and so forth.
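The quote example can be imitated with a toy reader. The Python sketch below is not Clojure's LispReader: it handles only parentheses, square brackets, the tick and bare atoms, it represents lists as Python lists and vectors as tuples, and the names tokenize and read_form are hypothetical. It shows only that the tick is rewritten while reading, before any evaluation could happen.

```python
def tokenize(text):
    """Break the text into tokens, treating ( ) [ ] and ' as separate tokens."""
    for ch in "()[]'":
        text = text.replace(ch, f" {ch} ")
    return text.split()


def read_form(tokens):
    """Consume tokens for one form and return it as nested Python data."""
    token = tokens.pop(0)
    if token == "'":
        # The "reader macro": wrap whatever form follows in (quote ...).
        return ["quote", read_form(tokens)]
    if token in ("(", "["):
        closer = ")" if token == "(" else "]"
        items = []
        while tokens[0] != closer:
            items.append(read_form(tokens))
        tokens.pop(0)  # discard the closing delimiter
        # Lists become Python lists, vectors become tuples.
        return items if token == "(" else tuple(items)
    return token  # a bare atom, kept as a string
```

Reading "(conj [] 'a)" with read_form(tokenize(...)) produces a list of conj, an empty vector, and a list of quote followed by a, which is the shape described above.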
I want to learn how to add spaces in variable names.
I know that a lot of languages prevent me from doing this, but I believe that there is a trick to do it, because I saw someone do it in MQL5.
A MetaTrader Terminal allows to show a UI-Dialogue Panel for MMI-assisted setting values for input and extern variables declared in { Expert Advisor | Technical Indicator | Script } code, during a code-execution launch.
( Ref. a picture below )
If you really want to be evil you can sometimes use the left-to-right mark, U+200E which looks like a regular space but is generally not considered whitespace. Different languages and/or specific platforms may behave differently.
This trick seems to work in C# and apparently you can do similar things in ruby.
I tried this using g++ and luckily for everyone's sanity it is not allowed:
foo.cc:5:10: error: non-ASCII characters are not allowed outside of literals and identifiers
int a<U+200E> b = 3;
Please don't do this outside of pranks and April Fools' Day jokes.
In C++ you can't put spaces in variable names but you can get what you want using a std::map.
For example:
#include <map>
#include <string>
int main()
{
    std::map<std::string, std::string> vars;
    vars["Time Frame"] = "15 minutes";
    vars["Indicator Period"] = "16";
    // ... etc
}
The std::map is an associative container that maps one std::string onto another.
Depending on how you intend to use the map you may also want to consider using an std::unordered_map which should have higher performance but will not keep the keys sorted and may have a higher memory usage.
As far as I know, there is no way to add spaces to variable names.
The trouble with using spaces in names (whether filenames, variable names or something else) is that you need to have some other mechanism for determining what is part of this name and what is part of the next section of code. Your sample looks like a form, so that has its own formatting and structure.
Some SQL dialects do allow you to "quote" names, with either [name with space] or with backticks `name with space`.
Most other languages do not allow spaces in variable names, because any whitespace is considered a separator between different lexical units [different name/word/variable]. There is no way you can change this, as it would alter the meaning of "normal code". Most languages do allow/use _ as a "space in names" character.
Of course, if you have "variables" that are your own construct, read from for example a database or file, you can use your own syntax, and use for example std::map<std::string, sometype> or std::unordered_map<std::string, sometype> to connect the string from your data to the corresponding value.
Spaces (white space) are used in C++ to isolate keywords and variable names, thus they cannot exist in a variable name or the compiler will treat the text as multiple identifiers.
Example - valid: static const unsigned int my_variable = 6U;
If there is a space between my and variable how does the compiler know which is the variable name? If there are two variables here, it doesn't make sense.
Also, as you can see, there may be more than one keyword in a statement.
I found a solution. In MQL5, when you add a comment next to the variable name, it will appear instead of the variable name.
See this image : http://prntscr.com/79vaae
I am looking for the correct regex form to give to my Kiama Packrat Parser in order that when it encounters keywords like int it recognises this is a type, and not a valid var name.
At present I have :
lazy val type_int_ = ".*\\bint\\b.*".r ^^ (s => TypeInt)
lazy val var_ =
idn ^^ TermVar
lazy val idn =
"[a-zA-Z][a-zA-Z0-9]*".r
But this does not work, so I would appreciate pointers on this.
Many thanks
I've successfully used the following approach:
val keyword = regex ("int[^a-zA-Z]".r)
val identifier = not (keyword) ~> "[a-zA-Z]+".r
In other words, recognise the keyword only if it's not followed by a character that can extend it to be an identifier. A downside is that the extension regexp is repeated in both the keyword definition and the identifier one, but that can be factored out if you want.
You've got to be a bit careful how you use the keyword parser, since it captures the character after the keyword as well. It's safe in the context of a not, since no input is consumed.
Note that whitespace usually does not need to be handled explicitly since the literal and regex parser combinators take care of it before they start parsing for what you really want.
This approach is easy to generalise to multiple keywords, by writing a method to build the keyword parser from a list of the keyword strings and the extension regular expression.
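The same idea can be sketched outside the Scala combinators. The Python version below uses the re module and is only an analogy: it builds one keyword pattern from a list of keyword strings plus the extension character class, and it swaps in a negative lookahead so that, unlike the regexp above, the character after the keyword is not consumed. The names IDENT_CHAR, make_keyword_pattern and next_word are hypothetical.

```python
import re

# Characters that could extend a keyword into a longer identifier.
IDENT_CHAR = r"[a-zA-Z0-9]"
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def make_keyword_pattern(keywords):
    """Build one pattern that matches any keyword not followed by IDENT_CHAR."""
    # Longest keywords first, so a short keyword never shadows a longer one.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(rf"(?:{alternatives})(?!{IDENT_CHAR})")


def next_word(text, keyword_pattern, pos=0):
    """Classify the word starting at pos as a keyword or an identifier."""
    match = keyword_pattern.match(text, pos)
    if match:
        return ("keyword", match.group())
    match = IDENTIFIER.match(text, pos)
    if match:
        return ("identifier", match.group())
    return None
```

With a pattern built from ["int", "bool"], the input "int x" starts with the keyword int, whereas "integer" is read as a single identifier because the lookahead sees that "int" is followed by another identifier character.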
BTW, Kiama doesn't really provide parsing combinators. We rely on the ones in the Scala library. We do provide some extensions of the standard ones for special situations, but the basic behaviour is just straight from the library. Thus, it's not clear to me that your question actually relates to Kiama at all. As mentioned in the comments above, including a self-contained example of the problem would help us be clearer about exactly which library you are using.