Can anyone explain, or point me to where I can find, Clojure's naming conventions for:
File names
Functions (from what I understand, function names are simply dash-separated words)
Variables
You might want to look at the Clojure library coding standards on the developer wiki; this is probably the most comprehensive list that I've seen.
Update: the link above seems to be dead; consider instead: https://clojure.org/dev/contrib_howto#_coding_guidelines
To your specific points:
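File names are lowercase, stored in a directory structure that matches the namespace, and end in .clj, e.g. "my/special/namespace.clj". Dashes in a namespace name become underscores in the file name, so the namespace my-app.core lives in "my_app/core.clj".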
Functions are dash-separated-lowercase-words, ideally descriptively chosen so that your code is clear and self-documenting. Don't be afraid to re-use good function names in different namespaces (that is what namespaces are for!).
Variables (by which I assume you mean parameters, let-bound variables etc.) are also usually dash-separated-lowercase-words. Since code-is-data, I think it is appropriate that functions and data have the same naming convention :-)
You might want to take a look at this unofficial style guide.
There are some interesting guidelines on naming written by Stuart Sierra which suggest that:
pure functions should be nouns describing the return value (age instead of calculate-age)
side-effecting functions should be verbs describing the action (create- for constructing and get- for retrieving), reserving the bang (as in swap!) for changes to mutable references.
verbs that can also be nouns should be distinguished as verb phrases (send-message instead of message)
coercions should name the output type without an arrow prefix (connection instead of ->connection) except when the input type must be explicit (input-type->output-type)
namespace aliases can save on repetition (products/price instead of products/product-price) and prevent local clashes in let bindings
functions returning functions should have the -fn suffix
There is an interesting set of naming conventions documented in a comment by Taoensso in his Encore library. He proposes names using ! for side effects, ? for booleans, $ for expensive operations, _ as dereffable, and * for macros, plus a few other combos.
Even though you didn't ask for it explicitly, I'll explain what I've seen for protocol naming conventions.
Typically, the name starts with an uppercase "I" and the rest is camel case, where the first letter of each word is capitalized and the rest is lowercase. For example, if I wanted to define a protocol for rocket ships, I'd use the name IRocketShip.
I've also seen 'A' instead of 'I' used, probably to represent the word 'abstract'.
My Problem
I'm currently writing a REST API which is supposed to take JSON requests and work with an internal library we use. The main usage will be either to run the server with a web interface or possibly to use another language to work with the API, since Clojure isn't common elsewhere.
In order to achieve this, the JSON request contains data and a function name, which is run with resolve, since I'm supposed to make it so that we don't have to change the API each time a function is added/removed.
Now the actual question is: how can I make sure the function I run, combined with its arguments, doesn't destroy the whole thing?
So, what did I try already?
Now, I've actually told only half the truth until now: I don't use resolve, I use ns-resolve. My first intuition was to create a separate file that loads all namespaces from the library, since there's nothing malicious you could do with those. The problem is, I want only those functions, and I'm not aware of any way to remove clojure.core functions. I could do a blacklist for those, but whitelisting would be a whole lot easier. Not to mention I could never find all the core functions I actually should be blacklisting.
The other thing is the input.
Again, I've got a basic idea, which is to sanitize the input by replacing all sorts of brackets, just to make sure the input isn't other Clojure code that would bypass the namespace restriction from above. But would this actually be enough? I don't have much experience in breaking things.
Another concern I've heard is that some functions could evaluate the input passed as an argument long before intended. The server works with Ring and its JSON extension.
JSON should only give strings, numbers, booleans and nil as atomic data types. I conclude that any possibly malicious input would arrive as a string on my end. Besides resolve, is there any function which could have the side effect of running such input?
Since they are strings: is there even a concern to be had with the data at all?
I would strongly advise to use a whitelisting approach for functions, and not to evaluate anything else.
You could add a metadata flag to the exposed functions and check it at the point where you resolve them.
Everything else should just be data; don't evaluate it.
Probably you want to look into the following:
How to determine public functions from a given namespace. This will give you a list of the valid functions names that your API can accept as part of the input. Here's a sample:
user=> (ns-publics (symbol "clojure.string"))
{ends-with? #'clojure.string/ends-with?, capitalize #'clojure.string/capitalize, reverse #'clojure.string/reverse, join #'clojure.string/join, replace-first #'clojure.string/replace-first, starts-with? #'clojure.string/starts-with?, escape #'clojure.string/escape, last-index-of #'clojure.string/last-index-of, re-quote-replacement #'clojure.string/re-quote-replacement, includes? #'clojure.string/includes?, replace #'clojure.string/replace, split-lines #'clojure.string/split-lines, lower-case #'clojure.string/lower-case, trim-newline #'clojure.string/trim-newline, upper-case #'clojure.string/upper-case, split #'clojure.string/split, trimr #'clojure.string/trimr, index-of #'clojure.string/index-of, trim #'clojure.string/trim, triml #'clojure.string/triml, blank? #'clojure.string/blank?}
You probably want to use the keys from the map above (in the namespace that applies to your use case) to validate the input, because you can "escape" the ns-resolve namespace if you fully qualify the function name:
user=> ((ns-resolve (symbol "clojure.string") (symbol "reverse")) "hello")
"olleh"
user=> ((ns-resolve (symbol "clojure.string") (symbol "clojure.core/reverse")) "hello")
(\o \l \l \e \h) ;; Called Clojure's own reverse, probably you don't want to allow this
Now, with that being said, I'm going to offer you some free advice:
I'm supposed to make it so that we don't have to change the API each time a function is added/removed
If you have watched some of Rich Hickey's talks, you'll know that API changes are a sensitive topic. In general you should think carefully before adding new functions or deleting any, because it sounds like your team is willing to cut corners on getting the clients of the API on the same page.
Unless your clients can dynamically discover which functions are available (maybe you'll expose an API for that?), it sounds like you will be open to receiving requests you cannot fulfill because the functions have changed or been removed.
I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the AST in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source-level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees, so the matching and replacement process is far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all, it can parse C++. OP objects that it cannot do so without all the environment context. To the extent that OP is typing (well-formed) program fragments (statements, etc.) into the interpreter, Clang may [I don't have much experience with it myself] have trouble figuring out what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface-syntax rewrite rules are convenient, but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does. I'm under the impression that Melt makes GCC at best a bit less intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations on large-scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have to decide how to resolve; see Why can't C++ be parsed with a LR(1) parser? for more discussion.] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist that the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed), but that's unlikely to be a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text to feed to the interpreter.
Where it shines in this case is that OP can likely write most of his modifications directly as source-to-source syntax rules. For his example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a third party; DMS and its C++ front end are hardly "small" programs. But modern machines have lots of resources, so I think it's a question of how badly OP needs to do this.
Try modifying the headers to suppress the method; when you then compile, the errors will point you to every call, and you will be able to replace them all.
Since you have a C++ interpreter (such as CERN's Root), I guess you must use the compiler to intercept all the Draw calls. An easy and clean way to do that is to declare the Draw method as private in the headers, using some defines:
class ItemWithDrawMethod
{
    // ... other members ...
public:
#ifdef CATCHTHEMETHOD
private:
#endif
    void Draw(A, B);  // A and B stand for the real parameter types
#ifdef CATCHTHEMETHOD
public:
#endif
    // ... other members ...
};
Then compile as:
g++ -DCATCHTHEMETHOD=1 yourfilein.cpp
If users want to input complex algorithms into the application, I suggest integrating a scripting language into the app, so that users can write code (functions/algorithms in a defined way) that the app executes in the interpreter to get the final results, e.g. Python, Perl, JS, etc.
Since you need C++ in the interpreter, ChaiScript (http://chaiscript.com/) would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, use a simple regular expression that matches any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_.]*", followed by the literal text "->Draw(". This gives you the portion that must be rewritten, since the code following it is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
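Still, to make the backward-walking idea concrete, here is a minimal C++ sketch under deliberately strong assumptions: parentheses in the receiver expression are balanced, there is no whitespace between the receiver and "->", and no string or character literal contains parentheses or "->Draw(". The function name rewriteDraw and the default replacement name MyFunc are illustrative only. It handles nested calls and the chained A->Draw(B)->Draw(C) case by repeatedly rewriting the leftmost occurrence.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from any library): rewrites each "EXPR->Draw(" into
// "MyFunc(EXPR, " by walking backwards from "->Draw(" to find where EXPR starts.
// Assumptions: parentheses in EXPR are balanced, no whitespace separates EXPR
// from "->", and no string/char literal contains "(", ")" or "->Draw(".
std::string rewriteDraw(std::string code, const std::string& func = "MyFunc") {
    const std::string marker = "->Draw(";
    std::size_t pos;
    // Always rewrite the leftmost occurrence; chained calls are handled on later passes.
    while ((pos = code.find(marker)) != std::string::npos) {
        std::size_t start = pos;  // EXPR occupies [start, pos) once the walk stops
        while (start > 0) {
            const char c = code[start - 1];
            if (c == ')') {
                // Skip backwards over a balanced "( ... )" group, e.g. call arguments.
                int depth = 0;
                do {
                    --start;
                    if (code[start] == ')') ++depth;
                    else if (code[start] == '(') --depth;
                } while (depth > 0 && start > 0);
            } else if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' ||
                       c == '.' || c == ':') {
                --start;  // identifier characters, '.' member access, '::' qualifiers
            } else if (c == '>' && start >= 2 && code[start - 2] == '-') {
                start -= 2;  // an earlier "->" in a member-access chain
            } else {
                break;  // anything else (space, '=', ',', '(' ...) ends EXPR
            }
        }
        const std::string receiver = code.substr(start, pos - start);
        const std::size_t argsBegin = pos + marker.size();
        const bool noArgs = argsBegin < code.size() && code[argsBegin] == ')';
        // Replace "EXPR->Draw(" with "MyFunc(EXPR, " (or "MyFunc(EXPR" for Draw()).
        code.replace(start, argsBegin - start,
                     func + "(" + receiver + (noArgs ? "" : ", "));
    }
    return code;
}
```

Each pass removes exactly one "->Draw(", so the loop always terminates; a real implementation would also have to skip comments and string literals, which this sketch does not attempt.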
One way is to load the user code as a DLL (something like plugins). This way, you don't need to recompile your actual application: only the user code is compiled, and your application loads it dynamically.
They say to use exclamation marks when naming impure functions.
But I don't exactly understand what counts as an "impure" function. Is it:
functions that change the state of their arguments (via reset!, alter, Java object methods, ...)
functions that cause side effects (for example, print, spit, ...)
or both?
Obviously, the official Clojure APIs don't have a bang (!) in every case above. I wonder when I should add one, and I need your help to make my code saner.
I would say you don't need to put ! on every impure function. The community Clojure Style Guide recommends:
The names of functions/macros that are not safe in STM transactions should end with an exclamation mark.
So, basically, end with ! the names of functions that change state for atoms, metadata, vars, transients, agents, and I/O as well.
Thanks to noisesmith for the update.
Here is my article answering your question https://clojure.wladyka.eu/posts/when-use-exclamation-mark/
Put simply, the rule is:
(create-user! ...) has additional effects if you run it more than once with the same input, for example sending an e-mail each time or creating one more user.
(create-user ...) wouldn’t have additional effects even if you run this many times with the same input.
If it is still not clear think about this:
(create-user! ...) vs (create-user-only-if-not-exist ...).
A little something that could be borrowed from IDEs. So the idea would be to highlight function arguments (and maybe scoped variable names) inside function bodies. This is the default behaviour for some C:
Well, if I were to place the cursor inside func I would like to see the arguments foo and bar highlighted to follow the algorithm logic better. Notice that the similarly named foo in func2 wouldn't get highlit. This luxury could be omitted though...
Using locally scoped variables, I would also like to have locally initialized variables highlit:
Finally to redemonstrate the luxury:
It's not so trivial to write this. I used C to give a general idea; really, I would get more use out of this for Scheme/Clojure programming:
This should recognize let, loop, for, doseq bindings for instance.
My vimscript-fu isn't that strong; I suspect we would need to
Parse (non-regexply?) the arguments from the function definition under the cursor. This would be language specific of course. My priority would be Clojure.
define a syntax region to cover the given function/scope only
give the required syntax matches
As a function this could be mapped to a key (if very resource intensive) or CursorMoved if not so slow.
Okay, now. Has anyone written/found something like this? Do the vimscript gurus have an idea on how to actually start writing such a script?
Sorry about slight offtopicness and bad formatting. Feel free to edit/format. Or vote to close.
This is much harder than it sounds, and borderline-impossible with the vimscript API as it stands, because you don't just need to parse the file; if you want it to work well, you need to parse the file incrementally. That's why regular syntax files are limited to what you can do with regexes - when you change a few characters, vim can figure out what's changed in the syntax highlighting, without redoing the whole file.
The vim syntax highlighter is limited to dealing with regexes, but if you're hellbent on doing this, you can roll your own parser in vimscript and have it generate a buffer-local syntax that refers to tokens in the file by line and column, using the \%l and \%c atoms in a regex. This would have to be rerun after every change. At the time of writing there was no autocmd for "file changed" (newer Vim versions add TextChanged and TextChangedI), but there is the CursorHold autocmd, which runs when you've been idle for a configurable duration.
One possible solution can be found here. Not the best way because it highlights every occurrence in the whole file and you have to give the command every time (probably the second one can be avoided, don't know about the first). Give it a look though.
I'm writing a C/C++/... build system (I understand this is madness ;)), and I'm having trouble designing my parser.
My "recipes" look like this:
global
{
SOURCE_DIRS src
HEADER_DIRS include
SOURCES bitwise.c \
framing.c
HEADERS \
ogg/os_types.h \
ogg/ogg.h
}
lib static ogg_static
{
NAME ogg
}
lib shared ogg_shared
{
NAME ogg
}
(This being based on the super simple libogg source tree)
# starts a comment, and \ is a "newline escape", meaning the line continues on the next line (see QMake syntax). {} are scopes, like in C++, and global holds settings that apply to every "target". This is all background, and not that relevant... I really don't know how to work with my scopes. I will need to be able to have multiple scopes, and also a form of conditional processing, along the lines of:
win32:DEFINES NO_CRT_SECURE_DEPRECATE
The parsing function will need to know on what level of scope it's at, and call itself whenever the scope is increased. There is also the problem with the location of the braces ( global { or global{ or as in the example).
How could I go about this, using Standard C++ and STL? I understand this is a whole lot of work, and that's exactly why I need a good starting point. Thanks!
What I have already is the whole ifstream and internal string/stringstream storage, so I can read word by word.
I would suggest (and this is more or less right out of the compiler textbooks) that you approach the problem in phases. This breaks things down so that the problem is much more manageable in each phase.
Focus first on the lexer phase. Your lexing phase should take the raw text and give you a sequence of tokens, such as words and special characters. The lexer phase can take care of line continuations, and handle whitespace or comments as appropriate. By handling whitespace, the lexer can simplify your parser's task: you can write the lexer so that global{, global {, and even
global
{
will all yield two tokens: one representing global and one representing {.
Also note that the lexer can tack line and column numbers onto the tokens for use later if you hit errors.
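A minimal sketch of such a lexer for the recipe format in the question, in standard C++ as the question asks. It assumes Unix line endings and treats #, a backslash directly before a newline, { and } as the question describes; the Token and TokKind names are made up for illustration. One deliberate difference from the description above: line ends are kept as Newline tokens, because the recipe format appears to be line-oriented (one KEY value... setting per line). So global{ and global { give two tokens, while global followed by { on the next line gives a Newline in between, which the parser can simply skip.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Token kinds for the (hypothetical) recipe lexer.
enum class TokKind { Word, LBrace, RBrace, Newline };

struct Token {
    TokKind kind;
    std::string text;
    int line;    // 1-based line where the token starts
    int column;  // 1-based column where the token starts
};

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    int line = 1, col = 1;
    std::size_t i = 0;
    // Consume one character, keeping line/column numbers up to date.
    auto advance = [&]() {
        if (src[i] == '\n') { ++line; col = 1; } else { ++col; }
        ++i;
    };
    // True when a backslash directly precedes a newline (a line continuation).
    auto atContinuation = [&]() {
        return src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n';
    };
    while (i < src.size()) {
        const char c = src[i];
        if (atContinuation()) {
            advance();  // the backslash
            advance();  // the newline: the logical line continues
        } else if (c == '\n') {
            out.push_back({TokKind::Newline, "\n", line, col});
            advance();
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            advance();  // other whitespace only separates tokens
        } else if (c == '#') {
            while (i < src.size() && src[i] != '\n') advance();  // comment to end of line
        } else if (c == '{' || c == '}') {
            out.push_back({c == '{' ? TokKind::LBrace : TokKind::RBrace,
                           std::string(1, c), line, col});
            advance();
        } else {
            // A word runs until whitespace, a brace, a comment or a continuation.
            Token t{TokKind::Word, "", line, col};
            while (i < src.size() && !atContinuation() &&
                   !std::isspace(static_cast<unsigned char>(src[i])) &&
                   src[i] != '{' && src[i] != '}' && src[i] != '#') {
                t.text += src[i];
                advance();
            }
            out.push_back(t);
        }
    }
    return out;
}
```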
Once you've got a nice stream of tokens flowing, work on your parsing phase. The parser should take that sequence of tokens and build an abstract syntax tree, which models the syntactic structures of your document. At this point, you shouldn't be worrying about ifstream and operator>>, since the lexer should have done all that reading for you.
You've indicated an interest in calling the parsing function recursively once you see a scope. That's certainly one way to go. As you'll see, the design decision you'll have to make repeatedly is whether you literally want to call the same parse function recursively (allowing for constructions like global { global { ... } }, which you may want to disallow syntactically), or whether you want to define a slightly (or even significantly) different set of syntax rules that apply inside a scope.
Once you find yourself having to vary the rules, the key is to reuse as much as you can between the different variants of syntax by refactoring it into functions. If you keep heading in this direction (using separate functions that represent the different chunks of syntax you want to deal with and having them call each other, possibly recursively, where needed), you'll ultimately end up with what is called a recursive descent parser. The Wikipedia entry has a good simple example of one; see http://en.wikipedia.org/wiki/Recursive_descent_parser .
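To illustrate, here is a hedged sketch of a small recursive descent parser for the recipe format, where a scope is a header line followed by { ... } containing settings or further scopes. To stay self-contained it takes already-lexed tokens as plain strings, with "{", "}" and "\n" as punctuation; the Scope, Setting and Parser names and the grammar in the comments are assumptions, not part of the original answer. The recursion happens in parseScopeAfterHeader, which calls itself when it meets a nested {.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST: a scope has a header (e.g. {"lib", "static", "ogg_static"}),
// KEY value... settings, and nested scopes.
struct Setting { std::string key; std::vector<std::string> values; };
struct Scope {
    std::vector<std::string> header;
    std::vector<Setting> settings;
    std::vector<Scope> children;
};

class Parser {
public:
    explicit Parser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // file := { newline | header scope-body }
    std::vector<Scope> parseFile() {
        std::vector<Scope> scopes;
        skipNewlines();
        while (!atEnd()) {
            scopes.push_back(parseScopeAfterHeader(readHeader()));
            skipNewlines();
        }
        return scopes;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    bool atEnd() const { return pos_ >= toks_.size(); }
    const std::string& peek() const { return toks_[pos_]; }
    void skipNewlines() { while (!atEnd() && peek() == "\n") ++pos_; }

    // Reads plain words up to a newline or a brace.
    std::vector<std::string> readWords() {
        std::vector<std::string> words;
        while (!atEnd() && peek() != "\n" && peek() != "{" && peek() != "}")
            words.push_back(toks_[pos_++]);
        return words;
    }

    std::vector<std::string> readHeader() {
        auto h = readWords();
        skipNewlines();  // accept "global\n{" as well as "global {"
        return h;
    }

    // scope-body := '{' { setting | header scope-body } '}'
    Scope parseScopeAfterHeader(std::vector<std::string> header) {
        if (atEnd() || peek() != "{") throw std::runtime_error("expected '{'");
        ++pos_;
        Scope s;
        s.header = std::move(header);
        for (;;) {
            skipNewlines();
            if (atEnd()) throw std::runtime_error("unterminated scope");
            if (peek() == "}") { ++pos_; return s; }
            auto words = readWords();
            skipNewlines();
            if (!atEnd() && peek() == "{") {
                // Nested scope: recurse with the same rule.
                s.children.push_back(parseScopeAfterHeader(std::move(words)));
            } else {
                Setting st;
                st.key = words.at(0);
                st.values.assign(words.begin() + 1, words.end());
                s.settings.push_back(std::move(st));
            }
        }
    }
};
```

This version accepts nested scopes everywhere (so global { global { ... } } parses); to forbid that, give the inner loop its own rule instead of recursing, which is exactly the design decision discussed above.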
If you find yourself really wanting to delve deeper into the theory and practice of lexers and parsers, I do recommend you get a good solid compiler textbook to help you out. The Stack Overflow topic mentioned in the comments above will get you started: Learning to write a compiler
boost::spirit is a good recursive descent parser generator that uses C++ templates as a language extension to describe parsers and lexers. It works well for native C++ compilers, but won't compile under Managed C++.
Codeproject has a tutorial article that may help.
Try ANTLR (use ANTLRWorks); after that you can look at Flex/Bison and others like Lemon. There are many tools out there, but ANTLR and Flex/Bison should be enough. I personally like ANTLRWorks too much to recommend anything else.
LATER: With ANTLR you can generate parser/lexer code for a variety of languages.
Unless the point of the project is specifically learning how to write a lexer and shift-reduce parser, I'd recommend using Flex and Bison, which will handle much of the parsing grunt work for you. Writing the grammar and semantic analysis will still be a whole lot of work, don't worry ;)