Side effect optimized out - clojure

I am new to Clojure and at some point I ran into a problem. I have this code in my program:
(let [ ... ]
  (map (fn [[v f]] (do-side-effect v f)) {:v1 f1, :v2 f2})
  (do-the-job ...))
This do-side-effect can be, for example, println or another side-effecting function like intern. The problem is that the side effect doesn't happen.
But if I change the line to
(println (map (fn [[v f]] (do-side-effect v f)) {:v1 f1, :v2 f2}))
then everything is OK.
So the last idea I came to is that Clojure just optimizes the map away because it thinks the result is useless, since I don't use it. If that is actually what happens, how can I show Clojure that this form can have side effects, to prevent the compiler from optimizing it out? And if it's a bug, how can I find where the bug is?

map is lazy. It is not meant to be used directly for side effects: it only produces values when they are consumed.
You can use dorun to force the values to be realized even if you are not consuming them, or use doseq instead of map. doseq is intended for side effects, and unlike map it won't spend time constructing result objects you will never access.
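For example, a minimal sketch (println stands in for do-side-effect):

(map (fn [[v f]] (println v f)) {:v1 'f1, :v2 'f2})         ; lazy: nothing prints if the result is unused
(dorun (map (fn [[v f]] (println v f)) {:v1 'f1, :v2 'f2})) ; forces the seq, returns nil
(doseq [[v f] {:v1 'f1, :v2 'f2}]                           ; idiomatic when you only want the effects
  (println v f))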

Related

What are side-effects in predicates and why are they bad?

I'm wondering what is considered to be a side-effect in predicates for fns like remove or filter. There seems to be a range of possibilities. Clearly, if the predicate writes to a file, this is a side-effect. But consider a situation like this:
(def *big-var-that-might-be-garbage-collected* ...)
(let [my-ref *big-var-that-might-be-garbage-collected*]
  (defn my-pred [x]
    (some-operation-on my-ref x)))
Even if some-operation-on is merely a query that does not change state, the fact that my-pred retains a reference to *big... changes the state of the system, in that the big var cannot be garbage collected. Is this also considered to be a side effect?
In my case, I'd like to write to a logging system in a predicate. Is this a side effect?
And why are side-effects in predicates discouraged exactly? Is it because filter and remove and their friends work lazily so that you cannot determine when the predicates are called (and - hence - when the side-effects happen)?
GC is not typically considered when evaluating if a function is pure or not, although many actions that make a function impure can have a GC effect.
Logging is a side effect, as is changing any state in the program or the world. A pure function takes data and returns data, without modifying anything else.
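For example, an illustrative sketch (not from the original answer):

;; Pure: the same input always yields the same output, and nothing else happens.
(defn classify [x]
  (if (odd? x) :odd :even))

;; Impure: logging writes to the outside world, even though the return value is unchanged.
(defn classify-with-log [x]
  (println "classifying" x) ; side effect
  (if (odd? x) :odd :even))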
https://softwareengineering.stackexchange.com/questions/15269/why-are-side-effects-considered-evil-in-functional-programming covers why side effects are avoided in functional languages; I found that link helpful.
The problem is determining when, or even whether, the side-effects will occur on any given call to the function.
If you only care that the same inputs return the same answer, you are fine. Side-effects are dependent on how the function is executed.
For example,
(first (filter odd? (range 20)))
; 1
But if we arrange for odd? to print its argument as it goes:
(first (filter #(do (print %) (odd? %)) (range 20)))
It will print 012345678910111213141516171819 before returning 1!
The reason is that filter, where it can, deals with its sequence argument in chunks of 32 elements.
If we take the limit off the range:
(first (filter #(do (print %) (odd? %)) (range)))
... we get a full-size chunk printed: 012345678910111213141516171819202122232425262728293031
Just printing the argument is confusing. If the side effects are significant, things could go seriously awry.
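A common workaround (a sketch using the well-known "unchunk" idiom, which is not part of this answer) is to re-wrap the input so filter only ever sees one element at a time:

(defn unchunk [s]
  (lazy-seq
    (when-let [[x & more] (seq s)]
      (cons x (unchunk more)))))

(first (filter #(do (print %) (odd? %)) (unchunk (range 20))))
;; prints 01, then returns 1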

Can't use a for loop in a go block of core.async?

I'm new to the clojure core.async library, and I'm trying to understand it through experimentation.
But when I tried:
(let [i (async/chan)] (async/go (doall (for [r [1 2 3]] (async/>! i r)))))
it gives me a very strange exception:
CompilerException java.lang.IllegalArgumentException: No method in multimethod '-item-to-ssa' for dispatch value: :fn
and I tried another code:
(let [i (async/chan)] (async/go (doseq [r [1 2 3]] (async/>! i r))))
it gives no compiler exception at all.
I'm totally confused. What happened?
So the Clojure go-block stops translation at function boundaries, for many reasons, but the biggest is simplicity. This is most commonly seen when constructing a lazy seq:
(go (lazy-seq (<! c)))
Gets compiled into something like this:
(go (clojure.lang.LazySeq. (fn [] (<! c))))
Now let's think about this real quick... what should this return? Presumably you wanted a lazy seq containing the value taken from c, but <! needs to translate the remaining code of the function into a callback, while LazySeq expects its function to be synchronous. There really isn't a way around this limitation.
So, back to your question: if you macroexpand for, you'll see that it doesn't actually loop; instead it expands into a bunch of code that eventually calls lazy-seq, so parking ops don't work inside its body. doseq (and dotimes), however, are backed by loop/recur, so those work perfectly fine.
There are a few other places where this might trip you up with-bindings being one example. Basically if a macro sticks your core.async parking operations into a nested function, you'll get this error.
My suggestion then is to keep the body of your go blocks as simple as possible. Write pure functions, and then treat the body of go blocks as the places to do IO.
------------ EDIT -------------
By "stops translation at function boundaries", I mean this: the go block takes its body and translates it into a state machine. Each call to <!, >!, or alts! (and a few others) is considered a state-machine transition where the execution of the block can pause. At each of those points the machine is turned into a callback and attached to the channel. When this macro reaches a fn form it stops translating, so you can only make calls to <! directly inside a go block, not inside a function nested within the go block.
This is part of the magic of core.async. Without the go macro, core.async code would look a lot like callback hell in other languages.
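For example, a minimal sketch of that shape (channel and value names are made up):

(require '[clojure.core.async :as async :refer [go chan >! <!!]])

(let [c (chan 3)]
  ;; doseq is backed by loop/recur, so >! stays visible to the go macro:
  (go (doseq [r [1 2 3]]
        (>! c r)))
  ;; read the three values back on the calling thread:
  (<!! (async/into [] (async/take 3 c))))
;=> [1 2 3]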

Clojure confusion - behavior of map, doseq in a multiprocess environment

In trying to replicate some websockets examples I've run into some behavior I don't understand and can't seem to find documentation for. Simplified, here's an example I'm running in lein that's supposed to run a function for every element in a shared map once per second:
(def clients (atom {"a" "b" "c" "d"}))
(def ticker-agent (agent nil))

(defn execute [a]
  (println "execute")
  (let [keys (keys @clients)]
    (println "keys= " keys)
    (doseq [x keys] (println x)))
    ;(map (fn [k] (println k)) keys)) ;; replace doseq with this?
  (Thread/sleep 1000)
  (send *agent* execute))

(defn -main [& args]
  (send ticker-agent execute))
If I run this with map I get
execute
keys= (a c)
execute
keys= (a c)
...
First confusing issue: I understand that I'm likely using map incorrectly because its return value is unused, but does that mean the inner println is optimized away? Especially given that if I run this in a repl:
(map #(println %) '(1 2 3))
it works fine?
Second question - if I run this with doseq instead of map, I can run into conditions where the execution agent stops (which I'd append here, but I'm having difficulty isolating/recreating it). Clearly there's something I'm missing, possibly relating to locking on the map's keyset? I was able to do this even after moving the shared map out of an atom. Is there default synchronization on the Clojure map?
map is lazy. This means that it does not calculate any result until the result is accessed from the data structure it returns, so it will not run anything if its result is never used.
When you use map from the repl, the print stage of the repl accesses the data, which causes any side effects in your mapped function to be invoked. Inside a function, if the return value is not inspected, any side effects in the mapping function will not occur.
You can use doall to force full evaluation of a lazy sequence. You can use dorun if you don't need the result value but want to ensure all side effects are invoked. Also you can use mapv which is not lazy (because vectors are never lazy), and gives you an associative data structure, which is often useful (better random access performance, optimized for appending rather than prepending).
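Concretely, a small sketch of the three options (using the clients atom from the question):

(doall (map println (keys @clients))) ; forces the whole seq, keeps and returns it
(dorun (map println (keys @clients))) ; forces the whole seq, returns nil
(mapv println (keys @clients))        ; eager, returns a vector of nils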
Edit: Regarding the second part of your question (moving this here from a comment).
No, there is nothing about doseq that would hang your execution. Try checking the agent-error status of your agent to see if there is some exception, because agents stop executing and stop accepting new tasks by default if they hit an error condition. You can also use set-error-mode! and set-error-handler! to customize the agent's error handling behavior.
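For example, a sketch against the ticker-agent from the question:

(agent-error ticker-agent)       ; nil, or the Throwable that stopped the agent
(restart-agent ticker-agent nil) ; clear a failure and resume processing

;; or configure the behavior up front:
(set-error-handler! ticker-agent
  (fn [a ex] (println "agent failed:" (.getMessage ex))))
(set-error-mode! ticker-agent :continue) ; keep accepting sends after an error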

Reducing a sequence into a shorter sequence while calling a function on each adjacent element

I've got a function that looks at two of these objects, does some mystery logic, and returns either one of them, or both (as a sequence).
I've got a sequence of these objects [o1 o2 o3 o4 ...], and I want to return a result of processing it like this:
call the mystery function on o1 and o2
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o3
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o4
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and oN
....
Here's what I've got so far:
; the % here is the input sequence
#(reduce update-algorithm [(first %)] (rest %))

(defn update-algorithm [input-vector o2]
  (apply conj (pop input-vector)
         (mystery-function (peek input-vector) o2)))
What's an idiomatic way of writing this? I don't like the way that this looks. I think the apply conj is a little hard to read and so is the [(first %)] (rest %) on the first line.
into would be a better choice than apply conj.
I think [(first %)] (rest %) is just fine though. Probably the shortest way to write this and it makes it completely clear what the seed of the reduction and the sequence being reduced are.
Also, reduce is a perfect match to the task at hand, not only in the sense that it works, but also in the sense that the task is a reduction / fold. Similarly pop and peek do exactly the thing specified in the sense that it is their purpose to "keep the butlast" and "take the last" of what's been accumulated (in a vector). With the into change, the code basically tells the same story the spec does, and in fewer words to boot.
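For reference, a sketch of the same code with that one change applied:

(defn update-algorithm [input-vector o2]
  (into (pop input-vector)
        (mystery-function (peek input-vector) o2)))

#(reduce update-algorithm [(first %)] (rest %))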
So, nope, no way to improve this, sorry. ;-)

Using lazy-seq without blowing the stack: is it possible to combine laziness with tail recursion?

To learn Clojure, I'm solving the problems at 4clojure. I'm currently cutting my teeth on question 164, where you are to enumerate (part of) the language a DFA accepts. An interesting condition is that the language may be infinite, so the solution has to be lazy (in that case, the test cases apply (take 2000 ...) to the solution).
I have a solution that works on my machine, but when I submit it on the website, it blows the stack (if I increase the amount of acceptable strings to be determined from 2000 to 20000, I also blow the stack locally, so it's a deficiency of my solution).
My solution[1] is:
(fn [dfa]
  (let [start-state   (dfa :start)
        accept-states (dfa :accepts)
        transitions   (dfa :transitions)]
    (letfn [(accept-state? [state]
              (contains? accept-states state))
            (follow-transitions-from [state prefix]
              (lazy-seq
                (mapcat (fn [pair]
                          (enumerate-language (val pair) (str prefix (key pair))))
                        (transitions state))))
            (enumerate-language [state prefix]
              (if (accept-state? state)
                (cons prefix (follow-transitions-from state prefix))
                (follow-transitions-from state prefix)))]
      (enumerate-language start-state ""))))
it accepts the DFA
'{:states #{q0 q1 q2 q3}
  :alphabet #{a b c}
  :start q0
  :accepts #{q1 q2 q3}
  :transitions {q0 {a q1}
                q1 {b q2}
                q2 {c q3}}}
and returns the language that DFA accepts (#{a ab abc}). However, when determining the first 2000 accepted strings of DFA
(take 2000 (f '{:states #{q0 q1}
                :alphabet #{0 1}
                :start q0
                :accepts #{q0}
                :transitions {q0 {0 q0, 1 q1}
                              q1 {0 q1, 1 q0}}}))
it blows the stack. Obviously I should restructure the solution to be tail recursive, but I don't see how that is possible. In particular, I don't see how it is even possible to combine laziness with tail recursion (via either recur or trampoline). The lazy-seq macro wraps its body in a closure, so using recur inside lazy-seq would use that closure as the recursion point. When using lazy-seq inside recur, the lazy-seq is always evaluated, because recur evaluates its arguments.
When using trampoline, I don't see how I can iteratively construct a list whose elements can be lazily evaluated. As I have used it and seen it used, trampoline can only return a value when it finally finishes (i.e. when one of the trampolined functions returns a non-function value).
Other solutions are considered out of scope
I consider a different kind of solution to this 4Clojure problem out of scope of this question. I'm currently working on a solution using iterate, where each step only calculates the strings the 'next step' (following transitions from the current state) accepts, so it doesn't recurse at all. You then only keep track of the current states and the strings that got you into each state (which are the prefixes for the next states). What's proving difficult in that case is detecting when a DFA that accepts a finite language will no longer return any results. I haven't yet devised a proper stop criterion for the take-while surrounding the iterate, but I'm pretty sure I'll manage to get that solution to work. For this question, I'm interested in the fundamental question: can laziness and tail recursion be combined, or is that fundamentally impossible?
[1] Note that there are some restrictions on the site, like not being able to use def and defn, which may explain some peculiarities of my code.
When using lazy-seq just make a regular function call instead of using recur. The laziness avoids the recursive stack consumption for which recur is otherwise used.
For example, a simplified version of repeat:
(defn repeat [x]
  (lazy-seq (cons x (repeat x))))
The problem is that you are building something that looks like:
(mapcat f (mapcat f (mapcat f ...)))
Which is fine in principle, but the elements on the far right of this list don't get realized for a long time, and by the time you do realize them, they have a huge stack of lazy sequences that need to be forced in order to get a single element.
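A small sketch of that layering effect (the depth is illustrative, not from the DFA code):

(def layered
  (reduce (fn [s _] (map inc s)) ; each step adds one more lazy layer
          (range 10)
          (range 100000)))

(first layered) ; forcing one element must unwind every layer -> StackOverflowError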
If you don't mind a spoiler, you can see my solution at https://gist.github.com/3124087. I'm doing two things differently than you are, and both are important:
Traversing the tree breadth-first. You don't want to get "stuck" in a loop from q0 to q0 if that's a non-accepting state. It looks like that's not a problem for the particular test case you're failing because of the order the transitions are passed to you, but the next test case after this does have that characteristic.
Using doall to force a sequence that I'm building lazily. Because I know many concats will build a very large stack, and I also know that the sequence will never be infinite, I force the whole thing as I build it, to prevent the layering of lazy sequences that causes the stack overflow.
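A sketch of that second point (breadth-first and step are hypothetical names, not the gist's code): realize each generation with doall as you produce it, so pending lazy layers never pile up:

(defn breadth-first [step frontier]
  (lazy-seq
    (when (seq frontier)
      (concat frontier
              ;; doall realizes the next generation right away,
              ;; so no tower of pending mapcats accumulates:
              (breadth-first step (doall (mapcat step frontier)))))))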
Edit: In general you cannot combine lazy sequences with tail recursion. You can have one function that uses both of them, perhaps recurring when there's more work to be done before adding a single element, and lazy-recurring when there is a new element, but most of the time they have opposite goals and attempting to combine them incautiously will lead only to pain, and no particular improvements.