What's the purpose of #_ in Clojure? - clojure

I was going through a library code in which they used #_. As gone through multiple references, I understand #_ is discard symbol, which tell the reader to ignore whatever comes next.
Why it is even needed in the first place? If we have something to be ignored, why can't we remove it or just comment it? What's the significance of #_ over commenting?

It's super handy when debugging or writing some altered code.
Say you have some massive function, and you want to comment it out for a bit. You have a few options:
You can use line comments:
; (defn some-huge-thing []
; ... Many lines)
But that's painful unless your IDE has an commenting shortcut, and even then it takes some work. Plus, I've found most IDE's handling of comment-shortcuts to work less than ideally. Sometimes they just add another "layer" of comments instead of removing the existing ones. Line comments also don't help if you only want to comment out a small piece of a function since they aren't context sensitive.
You could use comment:
(comment
(defn some-huge-thing []
... Many lines))
But I personally don't like comment because here, it requires either nesting the entire thing, or violating Parinfer just to add the comment. Also as #amalloy points out, it ends up expanding to nil, so it can only be used in a context where a stray nil won't effect anything.
... Or, you can use #_:
#_
(defn some-huge-thing []
... Many lines)
It doesn't require altering the function at all. It's just two keystrokes to put in, and two to remove. It also doesn't evaluate to nil; it's simply ignored. That means you can use it, for example, to comment out arguments inside of a function call.
I personally use #_ quite often when playing around with different implementations and when I'm trying to isolate a bug. It causes anything comming immediately after it to be ignored, so it's very easy to control what is and isn't executing.

Related

What is the difference between #_ and ; in Clojure?

I know that we use ; for commenting in Clojure, which is equivalent of // in Java. But I cannot understand why we need #_
Can someone please explain this? Is it used to ignore text? How is it different from ; if that is the case?
; is a line comment; it ignores the text from ; to the end of the line in your source. This also means, that sometimes a comment on the last line leads to lines with just closing parens, which usually is avoided. The main usecase for ; is to write comments for humans to read.
#_ is called
discard; it
ignores the next form, which means it is agnostic of line breaks in your
source file. This is primarily used to quickly "toggle" code. You can also stack #_. E.g. {#_#_ :a 42}.
Note, that there is also the (comment ...) macro, which throws away the body and returns nil; so the body must be "valid" and you are well advised not use it, where the result could cause havoc. This is primarily used to provide some "prove of concept" inside the source or code you can quickly run in the REPL, but should not run as part of the ns, or when you are to lazy to write a proper test.

clojure code modification preserving reader macros

In clojure, read-string followed by str will not return the original string, but the string for which the reader macros have been expanded:
(str (read-string "(def foo [] #(bar))"))
;"(def foo [] (fn* [] (bar)))"
This is problematic if we want to manipulate a small part of the code, far away from any reader macros, and get back a string representation that preserves the reader macros. Is there a work around?
The purpose of read is to build an AST of the code and as such the function does not preserve all the properties of the original text. Otherwise, it should keep track of the original code-layout (e.g. location of parenthesis, newlines, indentation (tabs/spaces), comments, and so on). If you have a look at LispReader.java, you can see that the reader macros are unconditionally applied (*read-eval* does not influence all reader macros).
Here is what I would recommend:
You can take inspiration of the existing LispReader and implement your own reader. Maybe it is sufficient to change the dispatch-table so that macro characters are directed to you own reader. You would also need to build runtime representations of the quoted forms and provide adequate printer functions for those objects.
You can process your original file with Emacs lisp, which can easily navigate the structure of your code and edit it as you wish.
Remark: you must know that what you are trying to achieve smells fishy. You might have a good reason to want to do this, but this looks needlessly complex without knowing why you want to work at the syntax level. It would help if you could provide more details.

can defmacro quota parameter in clojure?

I have a string which will evaluate to true or false, can I use macro and pass the string as parameter? I write the following, but the result is string of (= 0 0) instead of true. How to get true result?
(def a "(= 0 0)")
(defmacro test [code-string] code-string)
(test a)
update:
The purpose is replace dynamic SQL. Currently we store code like 'column_a > 1' in database, and then we will get the code, and assemble a sql like
select case when column_a>1 then 0 else 1 end as result from table
There are many such code, and I hope to use clojure run in parallel to speed it up. To use clojure I could store '(> row["column_a"] 1)' in database, and then in jdbc looping, call (> row["column_a"] 1) to do my logic, like storing some code section in database and need to run it.
As TaylanUB already said, Clojure provides eval to evaluate some expression at run-time. However, using eval is frowned upon unless you have very good reasons to use it. It's not clear what you're really intending to do, so it would be helpful to provide a more real world example. If you don't have one, you don't need eval.
Similarly, macros are used to transform code and are not run at run-time, instead the code to which the macro evaluates gets run. The typical approach would be to try to solve a problem with a mere function, only if a macro would buy you something in terms of applicability to a wider range of code, consider turning the code into a macro. Edit: take a look at some introduction to macros in Clojure, e.g. this part from Clojure from the ground up
No, you cannot directly use a string as code. Defmacro takes s-expressions not strings. Clojure might have something like read which can parse a string and make an s-expression out of it which you might then be able to execute as code via something like eval.
There is usually no good reason to put code in strings or other data structures which will exist during program execution anyway, try to just work with first-class functions instead. Or mention the precise problem you're trying to solve and people might be able to give better answers. This might be an instance of the XY problem.
Note: I don't know Clojure, but all of this is pretty Lisp-generic.
(defn eval-code [code-string]
(eval (read-string code-string)))
(eval-code "(= 0 0)")
;; you don't need macro here.

How can I avoid using the stack with continuation-passing style?

For my diploma thesis I chose to implement the task of the ICFP 2004 contest.
The task--as I translated it to myself--is to write a compiler which translates a high-level ant-language into a low-level ant-assembly. In my case this means using a DSL written in Clojure (a Lisp dialect) as the high-level ant-language to produce ant-assembly.
UPDATE:
The ant-assembly has several restrictions: there are no assembly-instructions for calling functions (that is, I can't write CALL function1, param1), nor returning from functions, nor pushing return addresses onto a stack. Also, there is no stack at all (for passing parameters), nor any heap, or any kind of memory. The only thing I have is a GOTO/JUMP instruction.
Actually, the ant-assembly is for to describe the transitions of a state machine (=the ants' "brain"). For "function calls" (=state transitions) all I have is a JUMP/GOTO.
While not having anything like a stack, heap or a proper CALL instruction, I still would like to be able to call functions in the ant-assembly (by JUMPing to certain labels).
At several places I read that transforming my Clojure DSL function calls into continuation-passing style (CPS) I can avoid using the stack[1], and I can translate my ant-assembly function calls into plain JUMPs (or GOTOs). Which is exactly what I need, because in the ant-assembly I have no stack at all, only a GOTO instruction.
My problem is that after an ant-assembly function has finished, I have no way to tell the interpreter (which interprets the ant-assembly instructions) where to continue. Maybe an example helps:
The high-level Clojure DSL:
(defn search-for-food [cont]
(sense-food-here? ; a conditional w/ 2 branches
(pickup-food ; true branch, food was found
(go-home ; ***
(drop-food
(search-for-food cont))))
(move ; false branch, continue searching
(search-for-food cont))))
(defn run-away-from-enemy [cont]
(sense-enemy-here? ; a conditional w/ 2 branches
(go-home ; ***
(call-help-from-others cont))
(search-for-food cont)))
(defn go-home [cont]
(turn-backwards
; don't bother that this "while" is not in CPS now
(while (not (sense-home-here?))
(move)))
(cont))
The ant-assembly I'd like to produce from the go-home function is:
FUNCTION-GO-HOME:
turn left nextline
turn left nextline
turn left nextline ; now we turned backwards
SENSE-HOME:
sense here home WE-ARE-AT-HOME CONTINUE-MOVING
CONTINUE-MOVING:
move SENSE-HOME
WE-ARE-AT-HOME:
JUMP ???
FUNCTION-DROP-FOOD:
...
FUNCTION-CALL-HELP-FROM-OTHERS:
...
The syntax for the ant-asm instructions above:
turn direction which-line-to-jump
sense direction what jump-if-true jump-if-false
move which-line-to-jump
My problem is that I fail to find out what to write to the last line in the assembly (JUMP ???). Because--as you can see in the example--go-home can be invoked with two different continuations:
(go-home
(drop-food))
and
(go-home
(call-help-from-others))
After go-home has finished I'd like to call either drop-food or call-help-from-others. In assembly: after I arrived at home (=the WE-ARE-AT-HOME label) I'd like to jump either to the label FUNCTION-DROP-FOOD or to the FUNCTION-CALL-HELP-FROM-OTHERS.
How could I do that without a stack, without PUSHing the address of the next instruction (=FUNCTION-DROP-FOOD / FUNCTION-CALL-HELP-FROM-OTHERS) to the stack? My problem is that I don't understand how continuation-passing style (=no stack, only a GOTO/JUMP) could help me solving this problem.
(I can try to explain this again if the things above are incomprehensible.)
And huge thanks in advance for your help!
--
[1] "interpreting it requires no control stack or other unbounded temporary storage". Steele: Rabbit: a compiler for Scheme.
Yes, you've provided the precise motivation for continuation-passing style.
It looks like you've partially translated your code into continuation-passing-style, but not completely.
I would advise you to take a look at PLAI, but I can show you a bit of how your function would be transformed, assuming I can guess at clojure syntax, and mix in scheme's lambda.
(defn search-for-food [cont]
(sense-food-here? ; a conditional w/ 2 branches
(search-for-food
(lambda (r)
(drop-food r
(lambda (s)
(go-home s cont)))))
(search-for-food
(lambda (r)
(move r cont)))))
I'm a bit confused by the fact that you're searching for food whether or not you sense food here, and I find myself suspicious that either this is weird half-translated code, or just doesn't mean exactly what you think it means.
Hope this helps!
And really: go take a look at PLAI. The CPS transform is covered in good detail there, though there's a bunch of stuff for you to read first.
Your ant assembly language is not even Turing-complete. You said it has no memory, so how are you supposed to allocate the environments for your function calls? You can at most get it to accept regular languages and simulate finite automata: anything more complex requires memory. To be Turing-complete you'll need what amounts to a garbage-collected heap. To do everything you need to do to evaluate CPS terms you'll also need an indirect GOTO primitive. Function calls in CPS are basically (possibly indirect) GOTOs that provide parameter passing, and the parameters you pass require memory.
Clearly, your two basic options are to inline everything, with no "external" procedures (for extra credit look up the original meaning of "internal" and "external" here), or somehow "remember" where you need to go on "return" from a procedure "call" (where the return point does not necessarily need to fall in the physical locations immediately following the "calling" point). Basically, the return point identifier can be a code address, an index into a branch table, or even a character symbol -- it just needs to identify the return target relative to the called procedure.
The most obvious here would be to track, in your compiler, all of the return targets for a given call target, then, at the end of the called procedure, build a branch table (or branch ladder) to select from one of the several possible return targets. (In most cases there are only a handful of possible return targets, though for commonly used procedures there could be hundreds or thousands.) Then, at the call point, the caller needs to load a parameter with the index of its return point relative to the called procedure.
Obviously, if the callee in turn calls another procedure, the first return point identifier must be preserved somehow.
Continuation passing is, after all, just a more generalized form of a return address.
You might be interested in Andrew Appel's book Compiling with Continuations.

Is returning early from a function more elegant than an if statement?

Myself and a colleague have a dispute about which of the following is more elegant. I won't say who's who, so it is impartial. Which is more elegant?
public function set hitZone(target:DisplayObject):void
{
if(_hitZone != target)
{
_hitZone.removeEventListener(MouseEvent.ROLL_OVER, onBtOver);
_hitZone.removeEventListener(MouseEvent.ROLL_OUT, onBtOut);
_hitZone.removeEventListener(MouseEvent.MOUSE_DOWN, onBtDown);
_hitZone = target;
_hitZone.addEventListener(MouseEvent.ROLL_OVER, onBtOver, false, 0, true);
_hitZone.addEventListener(MouseEvent.ROLL_OUT, onBtOut, false, 0, true);
_hitZone.addEventListener(MouseEvent.MOUSE_DOWN, onBtDown, false, 0, true);
}
}
...or...
public function set hitZone(target:DisplayObject):void
{
if(_hitZone == target)return;
_hitZone.removeEventListener(MouseEvent.ROLL_OVER, onBtOver);
_hitZone.removeEventListener(MouseEvent.ROLL_OUT, onBtOut);
_hitZone.removeEventListener(MouseEvent.MOUSE_DOWN, onBtDown);
_hitZone = target;
_hitZone.addEventListener(MouseEvent.ROLL_OVER, onBtOver, false, 0, true);
_hitZone.addEventListener(MouseEvent.ROLL_OUT, onBtOut, false, 0, true);
_hitZone.addEventListener(MouseEvent.MOUSE_DOWN, onBtDown, false, 0, true);
}
In most cases, returning early reduces the complexity and makes the code more readable.
It's also one of the techniques applied in Spartan programming:
Minimal use of Control
Minimizing the use of conditionals by using specialized
constructs such ternarization,
inheritance, and classes such as Class
Defaults, Class Once and Class
Separator
Simplifying conditionals with early return.
Minimizing the use of looping constructs, by using action applicator
classes such as Class Separate and
Class FileSystemVisitor.
Simplifying logic of iteration with early exits (via return,
continue and break statements).
In your example, I would choose option 2, as it makes the code more readable. I use the same technique when checking function parameters.
This is one of those cases where it's ok to break the rules (i.e. best practices). In general you want to have as few return points in a function as possible. The practical reason for this is that it simplifies your reading of the code, since you can just always assume that each and every function will take its arguments, do its logic, and return its result. Putting in extra returns for various cases tends to complicate the logic and increase the amount of time necessary to read and fully grok the code. Once your code reaches the maintenance stage then multiple returns can have a huge impact on the productivity of new programmers as they try to decipher the logic (its especially bad when comments are sparse and the code unclear). The problem grows exponentially with respect to the length of the function.
So then why in this case does everyone prefer option 2? It's because you're are setting up a contract that the function enforces through validating incoming data, or other invariants that might need to be checked. The prettiest syntax for constructing the validation is the check each condition, returning immediately if the condition fails validity. That way you don't have to maintain some kind of isValid boolean through all of your checks.
To sum things up: we're really looking at how to write validation code and not general logic; option 2 is better for validation code.
As long as the early returns are organized as a block at the top of the function/method body, then I think they're much more readable than adding another layer of nesting.
I try to avoid early returns in the middle of the body. Sometimes they're the best way, but most of the time I think they complicate.
Also, as a general rule I try to minimize nesting control structures. Obviously you can take this one too far, so you have to use some discretion. Converting nested if's to a single switch/case is much clearer to me, even if the predicates repeat some sub-expressions (and assuming this isn't a performance critical loop in a language too dumb to do subexpression elimination). Particularly I dislike the combination of nested ifs in long function/method bodies, since if you jump into the middle of the code for some reason you end up scrolling up and down to mentally reconstruct the context of a given line.
In my experience, the issue with using early returns in a project is that if others on the project aren't used to them, they won't look for them. So early returns or not - if there are multiple programmers involved, make sure everyone's at least aware of their presence.
I personally write code to return as soon as it can, as delaying a return often introduces extra complexity eg trying to safely exit a bunch of nested loops and conditions.
So when I look at an unfamiliar function, the very first thing I do is look for all the returns. What really helps there is to set up your syntax colouring to give return a different colour from anything else. (I go for red.) That way, the returns become a useful tool for determining what the function does, rather than hidden stumbling blocks for the unwary.
Ah the guardian.
Imho, yes - the logic of it is clearer because the return is explicit and right next to the condition, and it can be nicely grouped with similar structures. This is even more applicable where "return" is replaced with "throw new Exception".
As said before, early return is more readable, specially if the body of a function is long, you may find that deleting a } by mistake in a 3 page function (wich in itself is not very elegant) and trying to compile it can take several minutes of non-automatable debugging.
It also makes the code more declarative, because that's the way you would describe it to another human, so probably a developer is close enough to one to understand it.
If the complexity of the function increases later, and you have good tests, you can simply wrap each alternative in a new function, and call them in case branches, that way you mantain the declarative style.
In this case (one test, no else clause) I like the test-and-return. It makes it clear that in that case, there's nothing to do, without having to read the rest of the function.
However, this is splitting the finest of hairs. I'm sure you must have bigger issues to worry about :)
option 2 is more readable, but the manageability of the code fails when a else may be required to be added.
So if you are sure, there is no else go for option 2, but if there could be scope for an else condition then i would prefer option 1
Option 1 is better, because you should have a minimal number of return points in procedure.
There are exceptions like
if (a) {
return x;
}
return y;
because of the way a language works, but in general it's better to have as few exit points as it is feasible.
I prefer to avoid an immediate return at the beginning of a function, and whenever possible put the qualifying logic to prevent entry to the method prior to it being called. Of course, this varies depending on the purpose of the method.
However, I do not mind returning in the middle of the method, provided the method is short and readable. In the event that the method is large, in my opinion, it is already not very readable, so it will either be refactored into multiple functions with inline returns, or I will explicitly break from the control structure with a single return at the end.
I am tempted to close it as exact duplicate, as I saw some similar threads already, including Invert “if” statement to reduce nesting which has good answers.
I will let it live for now... ^_^
To make that an answer, I am a believer that early return as guard clause is better than deeply nested ifs.
I have seen both types of codes and I prefer first one as it is looks easily readable and understandable for me but I have read many places that early exist is the better way to go.
There's at least one other alternative. Separate the details of the actual work from the decision about whether to perform the work. Something like the following:
public function setHitZone(target:DisplayObject):void
{
if(_hitZone != target)
setHitZoneUnconditionally(target);
}
public function setHitZoneUnconditionally(target:DisplayObject):void
{
_hitZone.removeEventListener(MouseEvent.ROLL_OVER, onBtOver);
_hitZone.removeEventListener(MouseEvent.ROLL_OUT, onBtOut);
_hitZone.removeEventListener(MouseEvent.MOUSE_DOWN, onBtDown);
_hitZone = target;
_hitZone.addEventListener(MouseEvent.ROLL_OVER, onBtOver, false, 0, true);
_hitZone.addEventListener(MouseEvent.ROLL_OUT, onBtOut, false, 0, true);
_hitZone.addEventListener(MouseEvent.MOUSE_DOWN, onBtDown, false, 0, true);
}
Any of these three (your two plus the third above) are reasonable for cases as small as this. However, it would be A Bad Thing to have a function hundreds of lines long with multiple "bail-out points" sprinkled throughout.
I've had this debate with my own code over the years. I started life favoring one return and slowly have lapsed.
In this case, I prefer option 2 (one return) simply because we're only talking about 7 lines of code wrapped by an if() with no other complexity. It's far more readable and function-like. It flows top to bottom. You know you start at the top and end at the bottom.
That being said, as others have said, if there were more guards at the beginning or more complexity or if the function grows, then I would prefer option 1: return immediately at the beginning for a simple validation.