How do you find all the free variables in a Clojure expression? - clojure

How do you find all the free variables in a Clojure expression?
By free variables, I mean all the symbols that are not defined in the local environment (including all local environments defined inside the expression), not defined in the global environment (all symbols in the current namespace, including all those imported from other namespaces), and not a Clojure primitive.
For example, there is one free variable in this expression:
(fn [ub]
(* (rand-int ub) scaling-factor))
and that is scaling-factor. fn, *, and rand-int are all defined in the global environment. ub is defined in the scope that it occurs within, so it's a bound variable, too (i.e. not free).
I suppose I could write this myself—it doesn't look too hard—but I'm hoping that there's some standard way to do it that I should use, or a standard way to access the Clojure compiler to do this (since the Clojure compiler surely has to do this, too). One potential pitfall for a naïve implementation is that all macros within the expression must be fully expanded, since macros can introduce new free variables.

You can analyze the form with tools.analyzer.jvm passing the specific callback to handle symbols that cannot be resolved.
Something like this:
(require '[clojure.tools.analyzer.jvm :as ana.jvm])
(def free-variables (atom #{}))
(defn save-and-replace-with-nil [_ s _]
(swap! free-variables conj s)
;; replacing unresolved symbol with `nil`
;; in order to keep AST valid
{:op :const
:env {}
:type :nil
:literal? true
:val nil
:form nil
:top-level true
:o-tag nil
:tag nil})
(ana.jvm/analyze
'(fn [ub]
(* (rand-int ub) scaling-factor))
(ana.jvm/empty-env)
{:passes-opts
(assoc ana.jvm/default-passes-opts
:validate/unresolvable-symbol-handler save-and-replace-with-nil)})
(println #free-variables) ;; => #{scaling-factor}
It will also handle macroexpansion correctly:
(defmacro produce-free-var []
`(do unresolved))
(ana.jvm/analyze
'(let [x :foo] (produce-free-var))
(ana.jvm/empty-env)
{:passes-opts
(assoc ana.jvm/default-passes-opts
:validate/unresolvable-symbol-handler save-and-replace-with-nil)})
(println #free-variables) ;; => #{scaling-factor unresolved}

Related

Using let style destructuring for def

Is there a reasonable way to have multiple def statements happen with destructing the same way that let does it? For Example:
(let [[rtgs pcts] (->> (sort-by second row)
(apply map vector))]
.....)
What I want is something like:
(defs [rtgs pcts] (->> (sort-by second row)
(apply map vector)))
This comes up a lot in the REPL, notebooks and when debugging. Seriously feels like a missing feature so I'd like guidance on one of:
This exists already and I'm missing it
This is a bad idea because... (variable capture?, un-idiomatic?, Rich said so?)
It's just un-needed and I must be suffering from withdrawals from an evil language. (same as: don't mess up our language with your macros)
A super short experiment give me something like:
(defmacro def2 [[name1 name2] form]
`(let [[ret1# ret2#] ~form]
(do (def ~name1 ret1#)
(def ~name2 ret2#))))
And this works as in:
(def2 [three five] ((juxt dec inc) 4))
three ;; => 3
five ;; => 5
Of course and "industrial strength" version of that macro might be:
checking that number of names matches the number of inputs. (return from form)
recursive call to handle more names (can I do that in a macro like this?)
While I agree with Josh that you probably shouldn't have this running in production, I don't see any harm in having it as a convenience at the repl (in fact I think I'll copy this into my debug-repl kitchen-sink library).
I enjoy writing macros (although they're usually not needed) so I whipped up an implementation. It accepts any binding form, like in let.
(I wrote this specs-first, but if you're on clojure < 1.9.0-alpha17, you can just remove the spec stuff and it'll work the same.)
(ns macro-fun
(:require
[clojure.spec.alpha :as s]
[clojure.core.specs.alpha :as core-specs]))
(s/fdef syms-in-binding
:args (s/cat :b ::core-specs/binding-form)
:ret (s/coll-of simple-symbol? :kind vector?))
(defn syms-in-binding
"Returns a vector of all symbols in a binding form."
[b]
(letfn [(step [acc coll]
(reduce (fn [acc x]
(cond (coll? x) (step acc x)
(symbol? x) (conj acc x)
:else acc))
acc, coll))]
(if (symbol? b) [b] (step [] b))))
(s/fdef defs
:args (s/cat :binding ::core-specs/binding-form, :body any?))
(defmacro defs
"Like def, but can take a binding form instead of a symbol to
destructure the results of the body.
Doesn't support docstrings or other metadata."
[binding body]
`(let [~binding ~body]
~#(for [sym (syms-in-binding binding)]
`(def ~sym ~sym))))
;; Usage
(defs {:keys [foo bar]} {:foo 42 :bar 36})
foo ;=> 42
bar ;=> 36
(defs [a b [c d]] [1 2 [3 4]])
[a b c d] ;=> [1 2 3 4]
(defs baz 42)
baz ;=> 42
About your REPL-driven development comment:
I don't have any experience with Ipython, but I'll give a brief explanation of my REPL workflow and you can maybe comment about any comparisons/contrasts with Ipython.
I never use my repl like a terminal, inputting a command and waiting for a reply. My editor supports (emacs, but any clojure editor should do) putting the cursor at the end of any s-expression and sending that to the repl, "printing" the result after the cursor.
I usually have a comment block in the file where I start working, just typing whatever and evaluating it. Then, when I'm reasonably happy with a result, I pull it out of the "repl-area" and into the "real-code".
(ns stuff.core)
;; Real code is here.
;; I make sure that this part always basically works,
;; ie. doesn't blow up when I evaluate the whole file
(defn foo-fn [x]
,,,)
(comment
;; Random experiments.
;; I usually delete this when I'm done with a coding session,
;; but I copy some forms into tests.
;; Sometimes I leave it for posterity though,
;; if I think it explains something well.
(def some-data [,,,])
;; Trying out foo-fn, maybe copy this into a test when I'm done.
(foo-fn some-data)
;; Half-finished other stuff.
(defn bar-fn [x] ,,,)
(keys 42) ; I wonder what happens if...
)
You can see an example of this in the clojure core source code.
The number of defs that any piece of clojure will have will vary per project, but I'd say that in general, defs are not often the result of some computation, let alone the result of a computation that needs to be destructured. More often defs are the starting point for some later computation that will depend on this value.
Usually functions are better for computing a value; and if the computation is expensive, then you can memoize the function. If you feel you really need this functionality, then by all means, use your macro -- that's one of the sellings points of clojure, namely, extensibility! But in general, if you feel you need this construct, consider the possibility that you're relying too much on global state.
Just to give some real examples, I just referenced my main project at work, which is probably 2K-3K lines of clojure, in about 20 namespaces. We have about 20 defs, most of which are marked private and among them, none are actually computing anything. We have things like:
(def path-prefix "/some-path")
(def zk-conn (atom nil))
(def success? #{200})
(def compile* (clojure.core.memoize/ttl compiler {} ...)))
(def ^:private nashorn-factory (NashornScriptEngineFactory.))
(def ^:private read-json (comp json/read-str ... ))
Defining functions (using comp and memoize), enumerations, state via atom -- but no real computation.
So I'd say, based on your bullet points above, this falls somewhere between 2 and 3: it's definitely not a common use case that's needed (you're the first person I've ever heard who wants this, so it's uncommon to me anyway); and the reason it's uncommon is because of what I said above, i.e., it may be a code smell that indicates reliance on too much global state, and hence, would not be very idiomatic.
One litmus test I have for much of my code is: if I pull this function out of this namespace and paste it into another, does it still work? Removing dependencies on external vars allows for easier testing and more modular code. Sometimes we need it though, so see what your requirements are and proceed accordingly. Best of luck!

Calling Clojure functions using var-quote syntax

Occasionally when looking at other people's Clojure code, I see a function defined via defn and then called using the var-quote syntax, e.g.:
user> (defn a [] 1)
#'user/a
user> (a) ; This is how you normally call a function
1
user> (#'a) ; This uses the var-quote syntax and produces the same result
1
For the life of me I can't figure out the difference between these two ways of calling a function. I can't find anything in the evaluation documentation to say what happens when the operator of a call is a var that might suggest why the second form would be preferred. They both seem to respond in the same to binding assignments and syntax-quoting.
So, can somebody please provide a code sample that will illustrate the difference between (a) and (#'a) above?
Edit: I know that var-quote can be used to get to a var that's shadowed by a let lexical binding, but that doesn't seem to be the case in the code that I'm looking at.
(#'a) always refers to the var a, while (a) can be shadowed by local bindings:
user> (defn a [] 1)
#'user/a
user> (let [a (fn [] "booh")] [(a) (#'a)])
["booh" 1]
But most actual uses of var-quote / function call are not calling the var-quote expression directly, but instead cache its value so that higher-order constructs refer to the current value of var a instead of its value when passed in:
(defn a [] 1)
(defn my-call [f] (fn [] (+ 1 (f))))
(def one (my-call a))
(def two (my-call #'a))
(defn a [] 2)
user> (one)
2
user> (two)
3
This is mostly useful for interactive development, where you're changing some function that gets wrapped in a bunch of other functions in other packages.
The second form allows you to circumvent the privacy restrictions that clojure puts in place.
So, for instance, if you develop a library with private functions, but want to test them from a separate namespace, you cannot refer to them directly. But you can get to them using the var quote syntax. It's very useful for this.
Privacy is clojure is, in essence, a form of automatic documentation, as opposed to the privacy you see in Java. You can get around it.
user> (defn- a [] 1)
#'user/a
user> (ns user2)
nil
user2> (user/a)
CompilerException java.lang.IllegalStateException: var: #'user/a is not public, compiling:(NO_SOURCE_PATH:1)
user2> (#'user/a)
1
user2>

How does ClojureScript compile closures?

When using ClojureScript I tried to define a function that is a closure over a variable like this:
(let [x 42]
(defn foo [n] (+ x n)))
That prints the following source at the Rhino REPL:
function foo(n){
return cljs.core._PLUS_.call(null,x__43,n);
}
The function works as I expect but when trying to get at the variable named x__43 I can't get it. Where did it go?
the x variable is defined outside the foo function, in the let binding. you can't "get it" because you're not in the scope of the let binding. that's more or less the whole point of using closures.
conceptually, let bindings are implemented as function calls:
(let [x 2] ...)
is equivalent to
((fn [x] ...) 2)
which is probably similar to let is implemented in ClojureScript - either as a macro transformation to fn or directly to (function(x){...})(2).
(let [x 42]
(defn foo [n] (+ x n)))
is currently compiled to
var x__1311 = 42;
cljs.user.foo = (function foo(n){
return (x__1311 + n);
});
The exact number attached to the x may of course vary from compilation to compilation and cljs.user would be replaced by the appropriate namespace name.
There is no attempt to conceal the generated variable from unrelated code in a JavaScript closure, so in principle it could still be modified if one went out of one's way to do so. Accidental collisions are extremely unlikely and simply will not happen with regular ClojureScript.
To discover things like the above, you can either call the compiler with {:optimizations :simple :pretty-print true} among the options or ask it to emit some JavaScript at the REPL (as provided by script/repl in the ClojureScript source tree or lein repl in a Leiningen project with ClojureScript declared as a dependency):
(require '[cljs.compiler :as comp])
(binding [comp/*cljs-ns* 'cljs.user]
(comp/emit
(comp/analyze {:ns {:name 'cljs.user} :context :statement :locals {}}
'(let [x 42] (defn foo [n] (+ x n))))))

Which Vars affect a Clojure function?

How do I programmatically figure out which Vars may affect the results of a function defined in Clojure?
Consider this definition of a Clojure function:
(def ^:dynamic *increment* 3)
(defn f [x]
(+ x *increment*))
This is a function of x, but also of *increment* (and also of clojure.core/+(1); but I'm less concerned with that). When writing tests for this function, I want to make sure that I control all relevant inputs, so I do something like this:
(assert (= (binding [*increment* 3] (f 1)) 4))
(assert (= (binding [*increment* -1] (f 1)) 0))
(Imagine that *increment* is a configuration value that someone might reasonably change; I don't want this function's tests to need changing when this happens.)
My question is: how do I write an assertion that the value of (f 1) can depend on *increment* but not on any other Var? Because I expect that one day someone will refactor some code and cause the function to be
(defn f [x]
(+ x *increment* *additional-increment*))
and neglect to update the test, and I would like to have the test fail even if *additional-increment* is zero.
This is of course a simplified example – in a large system, there can be lots of dynamic Vars, and they can get referenced through a long chain of function calls. The solution needs to work even if f calls g which calls h which references a Var. It would be great if it didn't claim that (with-out-str (prn "foo")) depends on *out*, but this is less important. If the code being analyzed calls eval or uses Java interop, of course all bets are off.
I can think of three categories of solutions:
Get the information from the compiler
I imagine the compiler does scan function definitions for the necessary information, because if I try to refer to a nonexistent Var, it throws:
user=> (defn g [x] (if true x (+ *foobar* x)))
CompilerException java.lang.RuntimeException: Unable to resolve symbol: *foobar* in this context, compiling:(NO_SOURCE_PATH:24)
Note that this happens at compile time, and regardless of whether the offending code will ever be executed. Thus the compiler should know what Vars are potentially referenced by the function, and I would like to have access to that information.
Parse the source code and walk the syntax tree, and record when a Var is referenced
Because code is data and all that. I suppose this means calling macroexpand and handling each Clojure primitive and every kind of syntax they take. This looks so much like a compilation phase that it would be great to be able to call parts of the compiler, or somehow add my own hooks to the compiler.
Instrument the Var mechanism, execute the test and see which Vars get accessed
Not as complete as the other methods (what if a Var is used in a branch of the code that my test fails to exercise?) but this would suffice. I imagine I would need to redefine def to produce something that acts like a Var but records its accesses somehow.
(1) Actually that particular function doesn't change if you rebind +; but in Clojure 1.2 you can bypass that optimization by making it (defn f [x] (+ x 0 *increment*)) and then you can have fun with (binding [+ -] (f 3)). In Clojure 1.3 attempting to rebind + throws an error.
Regarding your first point you could consider using the analyze library. With it you can quite easily figure out which dynamic vars are used in an expression:
user> (def ^:dynamic *increment* 3)
user> (def src '(defn f [x]
(+ x *increment*)))
user> (def env {:ns {:name 'user} :context :eval})
user> (->> (analyze-one env src)
expr-seq
(filter (op= :var))
(map :var)
(filter (comp :dynamic meta))
set)
#{#'user/*increment*}
I know that this doesn't answer your question, but wouldn't it be a lot less work to just provide two versions of a function where one version has no free variables, and the other version calls the first one with the appropriate top-level defines?
For example:
(def ^:dynamic *increment* 3)
(defn f
([x]
(f x *increment*))
([x y]
(+ x y)))
This way you can write all your tests against (f x y), which doesn't rely on any global state.

How To Make Determinations at Clojure Macro Expansion Time

I believe I understand that during macro expansion, a macro does not have access to the things a function has access to, because expansion happens before compile time.
But, I am having trouble understanding is how to perform checks at macro expansion time.
For example:
(defn gen-class [cl-nam]
(fn [cmd & args]
(condp = cmd
:name (name cl-nam))))
(defmacro defnclass [cl-nam]
`(def ~cl-nam (gen-class '~cl-nam)))
I would like to check to see that cl-nam is not a sequence. I would like to use count and find out of its length is > 1.
I understand I can unquote the println in the following macro, so that I can get an expansion-time message.
(defmacro defnclass_info [cl-nam]
`(do
~(println cl-nam)
(def ~cl-nam (gen-class '~cl-nam))))
But, I am not sure how to go about checking to see what was passed for cl-nam.
I'm reading a lot of Clojure macro descriptions from several books, and am stumped.
Thanks.
A macro is really just a function.
(defmacro defnclass-info
[cl-name]
(when (seq? cl-name)
(throw (Exception. "cl-name must not be a sequence")))
`(def ~cl-name (gen-class '~cl-name)))
Edit: One could also switch behaviour and recursively call the macro again for the whole list. What you do, depends on your requirements.
(defmacro defnclass-info
[cl-name]
(if (seq? cl-name)
`(do ~#(for [cn cl-name] `(defnclass-info ~cn)))
`(def ~cl-name (gen-class '~cl-name))))
The macro expansion is just what the "function" returns. So you have full flexibility on what you do with the macro arguments.