Why does Clojure's gensym increase by three on each call? - clojure

Fairly new to lisps, but in looking into sequential integer generating code, I noticed that repeated calls to (gensym) would increase the number provided after the prefix by 3. I'm curious why that is the case.
user=> (gensym)
G__662
user=> (gensym)
G__665
user=> (gensym)
G__668
user=> (gensym)
G__671
user=> (gensym)
G__674
user=> (gensym)
G__677
I've seen and understand the combined use of atom and inc, but I'm new to the gensym function.

There are a number of correct answers here. One is: it doesn't!
user> (take 5 (repeatedly gensym))
(G__2173 G__2174 G__2175 G__2176 G__2177)
Another is: gensym doesn't make any guarantees as to the form of the symbols it generates, so you really shouldn't care whether they're sequential or not (or even if they contain numbers at all). You certainly shouldn't hijack gensym to produce an integer sequence.
Lastly: why does it increase by three in your example? Because each time you evaluate a form in the repl, the compiler has to create some gensyms of its own. Apparently, for the form (gensym), the number it needs to create is two.
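For the sequential integers the question is really after, a minimal sketch using an atom and inc (the approach the asker already mentions) might look like the following. The name make-counter is a hypothetical helper for illustration, not something from clojure.core.

```clojure
(defn make-counter
  "Returns a zero-argument function that yields 1, 2, 3, ... on successive
  calls. Hypothetical helper, for illustration only."
  []
  (let [state (atom 0)]
    ;; swap! applies inc atomically and returns the new value
    (fn [] (swap! state inc))))

(def next-id (make-counter))

(next-id) ;=> 1
(next-id) ;=> 2
```

Unlike gensym, nothing else in the process draws from this counter, so the values stay consecutive.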

It doesn't!
=> (str (gensym) (gensym))
"G__4027G__4028"
Looking at the source of gensym we can see that it uses clojure.lang.RT/nextID.
(defn gensym
  ([] (gensym "G__"))
  ([prefix-string] (. clojure.lang.Symbol (intern (str prefix-string (str (. clojure.lang.RT (nextID))))))))
The nextID function is also used in the LispReader. So when you repeatedly evaluate (gensym), the reader is probably using two IDs.
I clearly have something else going on in my process too: if I wait any time between evaluations, more IDs are consumed and the gap between gensyms grows beyond 3.
https://github.com/clojure/clojure/search?q=nextid

Related

how to spec a lazy-seq generating function?

I wish to use spec in my pre and post conditions of a generator function. A simplified example of what I wish to do is described below:
(defn positive-numbers
([]
{:post [(s/valid? (s/+ int?) %)]}
(positive-numbers 1))
([n]
{:post [(s/valid? (s/+ int?) %)]}
(lazy-seq (cons n (positive-numbers (inc n))))))
(->> (positive-numbers) (take 5))
However, defining the generator function like that seems to cause a stack overflow, presumably because spec eagerly tries to evaluate the whole sequence.
Is there another way of using spec to describe the :post result of a generator function like the one above (without causing stack-overflow)?
The theoretically correct answer is that in general you cannot check whether a lazy sequence matches a spec without realizing all of it.
In the case of your specific example of (s/+ int?), given a lazy sequence, how would one establish merely by observing the sequence whether all its elements are integers? However many elements you examine, the next one could always be a keyword.
This is the sort of thing that a type system like, say, core.typed may be able to prove, but a runtime-predicate-based assertion won't be able to check.
Now, in addition to s/+ and s/*, spec (as of Clojure 1.9.0-alpha14) also has a combinator called s/every, whose docstring says this:
Note that 'every' does not do exhaustive checking, rather it samples *coll-check-limit* elements.
So we have e.g.
(s/valid? (s/* int?) (concat (range 1000) [:foo]))
;= false
but
(s/valid? (s/every int?) (concat (range 1000) [:foo]))
;= true
(with the default *coll-check-limit* value of 101).
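The sampling behaviour can be seen by rebinding the limit. This is a sketch that assumes the namespace is clojure.spec.alpha (its name since the 1.9 release; the alpha14 build mentioned above still called it clojure.spec) and that the library is on the classpath.

```clojure
(require '[clojure.spec.alpha :as s])

;; 1000 integers followed by one keyword
(def xs (concat (range 1000) [:foo]))

;; Default limit: only a prefix is examined, so :foo goes unnoticed.
(s/valid? (s/every int?) xs)
;= true

;; With a limit larger than the collection, every element is examined.
(binding [s/*coll-check-limit* 2000]
  (s/valid? (s/every int?) xs))
;= false
```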
This actually isn't an immediate fix to your example – plugging in s/every in place of s/+ won't work, because each recursive call will want to validate its own return value, which will involve realizing more of the sequence, which will involve more recursive calls etc. But you could factor out the sequence-building logic to a helper function with no postconditions and then have positive-numbers declare the postcondition and call that helper function:
(defn positive-numbers* [n]
(lazy-seq (cons n (positive-numbers* (inc n)))))
(defn positive-numbers [n]
{:post [(s/valid? (s/every int? :min-count 1) %)]}
(positive-numbers* n))
Note the caveats:
1. this will still realize a good chunk of your sequence, which may wreak havoc with your application's performance profile;
2. the only watertight guarantee here is that the prefix actually examined is as desired; if the seq has a weird item at position 123456, that will go unnoticed.
Because of (1), this is something that makes more sense as a test-only assertion. (2) may be acceptable – you'll still catch some silly typos and the documentation value of the spec is there anyway; if it isn't and you do want an absolutely watertight guarantee that your return type is as desired, then again, core.typed (perhaps used locally just for a handful of namespaces) may be the better bet.
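As a test-only assertion, the bounded check can also be written without spec at all. The sketch below uses a hypothetical helper, prefix-ok?, and an arbitrary limit of 100; like s/every it only vouches for the prefix it actually looks at.

```clojure
(defn positive-numbers* [n]
  (lazy-seq (cons n (positive-numbers* (inc n)))))

(defn prefix-ok?
  "Hypothetical test helper: true if the first `limit` elements of xs
  all satisfy pred. Realizes at most `limit` elements."
  [pred limit xs]
  (every? pred (take limit xs)))

(prefix-ok? int? 100 (positive-numbers* 1))
;= true
```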

How to pass a list to clojure's `->` macro?

I'm trying to find a way to thread a value through a list of functions.
Firstly, I had a usual ring-based code:
(defn make-handler [routes]
(-> routes
(wrap-json-body)
(wrap-cors)
;; and so on
))
But this was not optimal as I wanted to write a test to check the routes are actually wrapped with wrap-cors. I decided to extract the wrappers into a def. So the code became as follows:
(def middleware
(list ('wrap-json-body)
('wrap-cors)
;; and so on
))
(defn make-handler [routes]
(-> routes middleware))
This apparently doesn't work and is not supposed to as the -> macro doesn't take a list as the second argument. So I tried to use the apply function to resolve that:
(defn make-handler [routes]
(apply -> routes middleware))
Which eventually bailed out with:
CompilerException java.lang.RuntimeException: Can't take value of a macro: #'clojure.core/->
So the question arises: How does one pass a list of values to the -> macro (or, say, any other macro) as one would do with apply for a function?
This is an XY Problem.
The main point of -> is to make code easier to read. But if one writes a new macro solely in order to use -> (in code nobody will ever see because it exists only at macro-expansion), it seems to me that this is doing a lot of work for no benefit. Moreover, I believe it obscures, rather than clarifies, the code.
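So, in the spirit of never using a macro where functions will do, I suggest the following two solutions. They are equivalent provided the functions are applied in the same order: reduce applies them left to right (as -> does), while comp applies them right to left, hence the reverse:
Solution 1
(reduce #(%2 %) routes middleware)
Solution 2
((apply comp (reverse middleware)) routes)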
A Better Way
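The second solution is easily simplified by changing the definition of middleware from being a list of the functions to being the composition of the functions. Because comp applies the rightmost function first, the wrappers are listed in the reverse of their -> order:
(def middleware
  (comp ;; and so on (later wrappers come first)
        wrap-cors
        wrap-json-body))
(middleware routes)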
When I began learning Clojure, I ran across this pattern often enough that many of my early projects have an freduce defined in core:
(defn freduce
"Given an initial input and a collection of functions (f1,..,fn),
This is logically equivalent to ((comp fn ... f1) input)."
[in fs]
(reduce #(%2 %) in fs))
This is totally unnecessary, and some might prefer the direct use of reduce as being more clear. However, if you don't like staring at #(%2 %) in your application code, adding another utility word to your language is fine.
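The ordering matters when switching between reduce and comp. Here is a small sketch with two stand-in functions (inc and a doubling function, chosen only for illustration) in place of real middleware:

```clojure
(defn freduce [in fs]
  (reduce #(%2 %) in fs))

;; Stand-ins for middleware: increment first, then double.
(def steps [inc #(* 2 %)])

(freduce 1 steps)                 ;=> 4, i.e. (1 + 1) * 2
((apply comp (reverse steps)) 1)  ;=> 4, same order as freduce
((apply comp steps) 1)            ;=> 3, doubling runs first: (1 * 2) + 1
```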
You can make a macro for that:
;; Note that it is better to use a backquote to quote function names for a macro, as it fully qualifies them.
(def middleware
  `((wrap-json-body)
    (wrap-cors)
    ;; and so on
    ))
(defmacro with-middleware [routes]
  `(-> ~routes ~@middleware))
for example this:
(with-middleware [1 2 3])
would expand to this (with the symbols namespace-qualified by the backquote):
(-> [1 2 3] (wrap-json-body) (wrap-cors))

What are side-effects in predicates and why are they bad?

I'm wondering what is considered to be a side-effect in predicates for fns like remove or filter. There seems to be a range of possibilities. Clearly, if the predicate writes to a file, this is a side-effect. But consider a situation like this:
(def *big-var-that-might-be-garbage-collected* ...)
(let [my-ref *big-var-that-might-be-garbage-collected*]
(defn my-pred
[x]
(some-operation-on my-ref x)))
Even if some-operation-on is merely a query that does not change state, the fact that my-pred retains a reference to *big... changes the state of the system in that the big var cannot be garbage collected. Is this also considered to be side-effect?
In my case, I'd like to write to a logging system in a predicate. Is this a side effect?
And why are side-effects in predicates discouraged exactly? Is it because filter and remove and their friends work lazily so that you cannot determine when the predicates are called (and - hence - when the side-effects happen)?
GC is not typically considered when evaluating if a function is pure or not, although many actions that make a function impure can have a GC effect.
Logging is a side effect, as is changing any state in the program or the world. A pure function takes data and returns data, without modifying anything else.
https://softwareengineering.stackexchange.com/questions/15269/why-are-side-effects-considered-evil-in-functional-programming covers why side effects are avoided in functional languages.
The problem is determining when, or even whether, the side-effects will occur on any given call to the function.
If you only care that the same inputs return the same answer, you are fine. Side-effects are dependent on how the function is executed.
For example,
(first (filter odd? (range 20)))
; 1
But if we arrange for odd? to print its argument as it goes:
(first (filter #(do (print %) (odd? %)) (range 20)))
It will print 012345678910111213141516171819 before returning 1!
The reason is that filter, where it can, deals with its sequence argument in chunks of 32 elements.
If we take the limit off the range:
(first (filter #(do (print %) (odd? %)) (range)))
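... we get a full-size chunk of 32 printed: 012345678910111213141516171819202122232425262728293031 (this holds on Clojure versions where the no-argument (range) is chunked; from 1.7 on it is built on iterate, which is not chunked, so only 01 is printed).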
Just printing the argument is confusing. If the side effects are significant, things could go seriously awry.
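To make the effect measurable rather than visual, the predicate can count its own calls. This is a sketch; count-pred-calls is a hypothetical helper, and the list is used only because its seq is not chunked.

```clojure
(defn count-pred-calls
  "Hypothetical helper: takes the first odd element of coll via filter and
  reports how many times the predicate was actually called."
  [coll]
  (let [calls  (atom 0)
        pred   (fn [x] (swap! calls inc) (odd? x))
        result (first (filter pred coll))]
    {:result result :calls @calls}))

;; (range 20) is chunked, so the whole chunk is filtered at once.
(count-pred-calls (range 20))
;=> {:result 1, :calls 20}

;; A list is not chunked, so filter stops as soon as it finds a match.
(count-pred-calls (apply list (range 20)))
;=> {:result 1, :calls 2}
```

The answer is the same in both cases; only the number of side effects differs, which is exactly why they are hard to reason about inside lazy predicates.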

How to convert lazy sequence to non-lazy in Clojure

I tried the following in Clojure, expecting to have the class of a non-lazy sequence returned:
(.getClass (doall (take 3 (repeatedly rand))))
However, this still returns clojure.lang.LazySeq. My guess is that doall does evaluate the entire sequence, but returns the original sequence as it's still useful for memoization.
So what is the idiomatic means of creating a non-lazy sequence from a lazy one?
doall is all you need. Just because the seq has type LazySeq doesn't mean it has pending evaluation. Lazy seqs cache their results, so all you need to do is walk the lazy seq once (as doall does) in order to force it all, and thus render it non-lazy. seq does not force the entire collection to be evaluated.
This is to some degree a question of taxonomy. A lazy sequence is just one type of sequence, as is a list, vector or map. So the answer is of course "it depends on what type of non-lazy sequence you want to get".
Take your pick from:
an ex-lazy (fully evaluated) lazy sequence (doall ... )
a list for sequential access (apply list (my-lazy-seq)) OR (into () ...) (note that into () reverses the order)
a vector for later random access (vec (my-lazy-seq))
a map or a set if you have some special purpose.
You can have whatever type of sequence best suits your needs.
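The options above can be compared side by side. A small sketch, where lazy-xs is just an illustrative name:

```clojure
(def lazy-xs (map inc (range 3)))  ; a lazy seq of 1 2 3

(vec lazy-xs)         ;=> [1 2 3]
(apply list lazy-xs)  ;=> (1 2 3)
(into () lazy-xs)     ;=> (3 2 1), because conj onto a list prepends
(set lazy-xs)         ;=> #{1 2 3}
```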
This Rich guy seems to know his Clojure and is absolutely right.
But I think this code snippet, using your example, might be a useful complement to this question:
=> (realized? (take 3 (repeatedly rand)))
false
=> (realized? (doall (take 3 (repeatedly rand))))
true
Indeed, the type has not changed, but the realization has.
I stumbled on this blog post about doall not being recursive. For that, I found that the first comment in the post did the trick. Something along the lines of:
(use 'clojure.walk)
(postwalk identity nested-lazy-thing)
I found this useful in a unit test where I wanted to force evaluation of some nested applications of map to force an error condition.
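The difference between doall and the postwalk trick can be observed by counting how often the innermost function runs. This is a sketch; forcing-demo is a hypothetical name.

```clojure
(require '[clojure.walk :as walk])

(defn forcing-demo
  "Returns [calls-after-doall calls-after-postwalk] for a lazy seq of
  lazy seqs whose inner mapping function counts its invocations."
  []
  (let [calls  (atom 0)
        nested (map (fn [row] (map (fn [x] (swap! calls inc) x) row))
                    [[1 2] [3 4]])
        _      (doall nested)          ; forces only the outer seq
        after-doall @calls
        _      (walk/postwalk identity nested)]  ; walks into every level
    [after-doall @calls]))

(forcing-demo)
;=> [0 4]
```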
(.getClass (into '() (take 3 (repeatedly rand))))

safely parsing maps in clojure

I'm looking for an easy and safe way to parse a map, and only a map, from a string supplied by an untrusted source. The map contains keywords and numbers. What are the security concerns of using read to do this?
read is by default totally unsafe, it allows arbitrary code execution. Try (read-string "#=(println \"hello\")") as an example.
You can make it safer by binding *read-eval* to false. This will cause an exception to be thrown if the #= notation is used. For example:
(binding [*read-eval* false] (read-string "#=(println \"hello\")"))
Finally, depending on how you are using it there is a potential denial of service attack by supplying a large number of keywords (:foo, :bar). Keywords are interned and never freed so if enough are used the process will run out of memory. There's some discussion about that on the clojure-dev list.
If you want to be safe I think you basically need to parse it by hand without doing an eval. Here is an example of one way to do it:
(require '[clojure.string :as string])
(apply hash-map
       (map #(%1 %2)
            (cycle [#(keyword (apply str (drop 1 %)))
                    #(Integer/parseInt %)])
            (string/split ":a 23 :b 32 :c 32" #" ")))
Depending on the types of numbers you want to support and how much error checking you want to do, you will want to modify the two functions being cycled over, which convert every other value to a keyword or a number.
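An alternative not mentioned above is the clojure.edn reader that ships with Clojure, which reads data only and has no #= evaluation. A minimal sketch, assuming the input is written as a map literal such as "{:a 23 :b 32}"; parse-keyword-number-map is a hypothetical helper name:

```clojure
(require '[clojure.edn :as edn])

(defn parse-keyword-number-map
  "Hypothetical helper: reads s as EDN and returns it only if it is a map
  from keywords to numbers; throws otherwise."
  [s]
  (let [v (edn/read-string s)]
    (if (and (map? v)
             (every? keyword? (keys v))
             (every? number? (vals v)))
      v
      (throw (ex-info "Expected a map of keywords to numbers" {:input s})))))

(parse-keyword-number-map "{:a 23 :b 32 :c 32}")
;=> {:a 23, :b 32, :c 32}
```

Input such as "#=(println \"hello\")" makes edn/read-string throw instead of evaluating anything. The keyword-interning concern raised earlier still applies, so limiting the input size may be worth considering.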