What are side-effects in predicates and why are they bad? - clojure

I'm wondering what is considered to be a side-effect in predicates for fns like remove or filter. There seems to be a range of possibilities. Clearly, if the predicate writes to a file, this is a side-effect. But consider a situation like this:
(def *big-var-that-might-be-garbage-collected* ...)
(let [my-ref *big-var-that-might-be-garbage-collected*]
(defn my-pred
[x]
(some-operation-on my-ref x)))
Even if some-operation-on is merely a query that does not change state, the fact that my-pred retains a reference to *big... changes the state of the system in that the big var cannot be garbage collected. Is this also considered to be side-effect?
In my case, I'd like to write to a logging system in a predicate. Is this a side effect?
And why are side-effects in predicates discouraged exactly? Is it because filter and remove and their friends work lazily so that you cannot determine when the predicates are called (and - hence - when the side-effects happen)?

GC is not typically considered when evaluating if a function is pure or not, although many actions that make a function impure can have a GC effect.
Logging is a side effect, as is changing any state in the program or the world. A pure function takes data and returns data, without modifying anything else.
https://softwareengineering.stackexchange.com/questions/15269/why-are-side-effects-considered-evil-in-functional-programming covers why side effects are avoided in functional languages.

I found this link helpful
The problem is determining when, or even whether, the side-effects will occur on any given call to the function.
If you only care that the same inputs return the same answer, you are fine. Side-effects are dependent on how the function is executed.

For example,
(first (filter odd? (range 20)))
; 1
But if we arrange for odd? to print its argument as it goes:
(first (filter #(do (print %) (odd? %)) (range 20)))
It will print 012345678910111213141516171819 before returning 1!
The reason is that filter, where it can, deals with its sequence argument in chunks of 32 elements.
If we take the limit off the range:
(first (filter #(do (print %) (odd? %)) (range)))
... we get a full-size chunk printed: 012345678910111213141516171819012345678910111213141516171819202122232425262728293031
Just printing the argument is confusing. If the side effects are significant, things could go seriously awry.

Related

how to spec a lazy-seq generating function?

I wish to use spec in my pre and post conditions of a generator function. A simplified example of what I wish to do is described below:
(defn positive-numbers
([]
{:post [(s/valid? (s/+ int?) %)]}
(positive-numbers 1))
([n]
{:post [(s/valid? (s/+ int?) %)]}
(lazy-seq (cons n (positive-numbers (inc n))))))
(->> (positive-numbers) (take 5))
However, defining the generator function like that seems to cause stack-overflow, the cause being that spec will eagerly try to evaluate the whole thing, -or something like that....
Is there another way of using spec to describe the :post result of a generator function like the one above (without causing stack-overflow)?
The theoretically correct answer is that in general you cannot check whether a lazy sequence matches a spec without realizing all of it.
In the case of your specific example of (s/+ int?), given a lazy sequence, how would one establish merely by observing the sequence whether all its elements are integers? However many elements you examine, the next one could always be a keyword.
This is the sort of thing that a type system like, say, core.typed may be able to prove, but a runtime-predicate-based assertion won't be able to check.
Now, in addition to s/+ and s/*, spec (as of Clojure 1.9.0-alpha14) also has a a combinator called s/every, whose docstring says this:
Note that 'every' does not do exhaustive checking, rather it samples *coll-check-limit* elements.
So we have e.g.
(s/valid? (s/* int?) (concat (range 1000) [:foo]))
;= false
but
(s/valid? (s/every int?) (concat (range 1000) [:foo]))
;= true
(with the default *coll-check-limit* value of 101).
This actually isn't an immediate fix to your example – plugging in s/every in place of s/+ won't work, because each recursive call will want to validate its own return value, which will involve realizing more of the sequence, which will involve more recursive calls etc. But you could factor out the sequence-building logic to a helper function with no postconditions and then have positive-numbers declare the postcondition and call that helper function:
(defn positive-numbers* [n]
(lazy-seq (cons n (positive-numbers* (inc n)))))
(defn positive-numbers [n]
{:post [(s/valid? (s/every int? :min-count 1) %)]}
(positive-numbers* n))
Note the caveats:
this will still realize a good chunk of your sequence, which may wreak havoc with your application's performance profile;
the only watertight guarantee here is that the prefix actually examined is as desired, if the seq has a weird item at position 123456, that will go unnoticed.
Because of (1), this is something that makes more sense as a test-only assertion. (2) may be acceptable – you'll still catch some silly typos and the documentation value of the spec is there anyway; if it isn't and you do want an absolutely watertight guarantee that your return type is as desired, then again, core.typed (perhaps used locally just for a handful of namespaces) may be the better bet.

Side effect optimized out

I am new to clojure and at some moment I faced with the problem.
I have such code in my program:
(let [ ... ]
(map (fn [[v f]] (do-side-effect v f)) {:v1 f1, :v2 f2})
(do-the-job ...))
This do-side-effect can be, for example, println of another side effect function like intern. The problem is that side effect doesn't happen.
But if i change the line to
(println (map #(fn [[v f]] (do-side-effect v f)) {:v1 f1, :v2 f2}))
Then everything is ok.
So the last idea i came to is that clojure
just optimize out the map because
it think that it's result is useless because I don't use it.
In case if this actually happens, how can I show clojure that this form
can have side effects to prevent compiler from optimizing it out?
In case if it's a bug, how can I find where the bug is?
map is lazy. It is not meant to be used directly for side effects, and it only produces values when they are consumed.
You can use dorun to force the values to be realized, even if you are not consuming them, or use doseq instead of map, doseq is intended to be used for side effects, and unlike map won't spend time constructing objects you won't ever access.

Idiomatic no-op/"pass"

What's the (most) idiomatic Clojure representation of no-op? I.e.,
(def r (ref {}))
...
(let [der #r]
(match [(:a der) (:b der)]
[nil nil] (do (fill-in-a) (fill-in-b))
[_ nil] (fill-in-b)
[nil _] (fill-in-a)
[_ _] ????))
Python has pass. What should I be using in Clojure?
ETA: I ask mostly because I've run into places (cond, e.g.) where not supplying anything causes an error. I realize that "most" of the time, an equivalent of pass isn't needed, but when it is, I'd like to know what's the most Clojuric.
I see the keyword :default used in cases like this fairly commonly.
It has the nice property of being recognizable in the output and or logs. This way when you see a log line like: "process completed :default" it's obvious that nothing actually ran. This takes advantage of the fact that keywords are truthy in Clojure so the default will be counted as a success.
There are no "statements" in Clojure, but there are an infinite number of ways to "do nothing". An empty do block (do), literally indicates that one is "doing nothing" and evaluates to nil. Also, I agree with the comment that the question itself indicates that you are not using Clojure in an idiomatic way, regardless of this specific stylistic question.
The most analogous thing that I can think of in Clojure to a "statement that does nothing" from imperative programming would be a function that does nothing. There are a couple of built-ins that can help you here: identity is a single-arg function that simply returns its argument, and constantly is a higher-order function that accepts a value, and returns a function that will accept any number of arguments and return that value. Both are useful as placeholders in situations where you need to pass a function but don't want that function to actually do much of anything. A simple example:
(defn twizzle [x]
(let [f (cond (even? x) (partial * 4)
(= 0 (rem x 3)) (partial + 2)
:else identity)]
(f (inc x))))
Rewriting this function to "do nothing" in the default case, while possible, would require an awkward rewrite without the use of identity.

Why does Clojure's gensym increase by three on each call?

Fairly new to lisps, but in looking into sequential integer generating code, I noticed that repeated calls to (gensym) would increase the number provided after the prefix by 3. I'm curious why that is the case.
user=> (gensym)
G__662
user=> (gensym)
G__665
user=> (gensym)
G__668
user=> (gensym)
G__671
user=> (gensym)
G__674
user=> (gensym)
G__677
I've seen and understand the combined use of atom and inc, but I'm new to the gensym function.
There are a number of correct answers here. One is: it doesn't!
user> (take 5 (repeatedly gensym))
(G__2173 G__2174 G__2175 G__2176 G__2177)
Another is: gensym doesn't make any guarantees as to the form of the symbols it generates, so you really shouldn't care whether they're sequential or not (or even if they contain numbers at all). You certainly shouldn't hijack gensym to produce an integer sequence.
Lastly: why does it increase by three in your example? Because each time you evaluate a form in the repl, the compiler has to create some gensyms of its own. Apparently, for the form (gensym), the number it needs to create is two.
It doesn't!
=> (str (gensym) (gensym))
"G__4027G__4028"
Looking at the source of gensym we can see that it uses clojure.lang.RT/nextID.
(defn gensym
([prefix-string] (. clojure.lang.Symbol (intern (str prefix-string (str (. clojure.lang.RT (nextID))))))))
The nextID function is also used in the LispReader. So when you repeatedly evaluate (gensym), the reader is probably using two IDs.
I clearly have something else going on in my process too, as if I wait any time between evaluations, more IDs are consumed and the gensym gaps further than just 3.
https://github.com/clojure/clojure/search?q=nextid

Reducing a sequence into a shorter sequence while calling a function on each adjacent element

I've got a function that looks at two of these objects, does some mystery logic, and returns either one of them, or both (as a sequence).
I've got a sequence of these objects [o1 o2 o3 o4 ...], and I want to return a result of processing it like this:
call the mystery function on o1 and o2
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o3
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o4
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and oN
....
Here's what I've got so far:
; the % here is the input sequence
#(reduce update-algorithm [(first %)] (rest %))
(defn update-algorithm
[input-vector o2]
(apply conj (pop input-vector)
(mystery-function (peek input-vector) o2)))
What's an idiomatic way of writing this? I don't like the way that this looks. I think the apply conj is a little hard to read and so is the [(first %)] (rest %) on the first line.
into would be a better choice than apply conj.
I think [(first %)] (rest %) is just fine though. Probably the shortest way to write this and it makes it completely clear what the seed of the reduction and the sequence being reduced are.
Also, reduce is a perfect match to the task at hand, not only in the sense that it works, but also in the sense that the task is a reduction / fold. Similarly pop and peek do exactly the thing specified in the sense that it is their purpose to "keep the butlast" and "take the last" of what's been accumulated (in a vector). With the into change, the code basically tells the same story the spec does, and in fewer words to boot.
So, nope, no way to improve this, sorry. ;-)