Reducing a sequence into a shorter sequence while calling a function on each adjacent element - clojure

I've got a function that looks at two of these objects, does some mystery logic, and returns either one of them, or both (as a sequence).
I've got a sequence of these objects [o1 o2 o3 o4 ...], and I want to return a result of processing it like this:
call the mystery function on o1 and o2
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o3
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o4
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and oN
....
Here's what I've got so far:
; the % here is the input sequence
#(reduce update-algorithm [(first %)] (rest %))
(defn update-algorithm
[input-vector o2]
(apply conj (pop input-vector)
(mystery-function (peek input-vector) o2)))
What's an idiomatic way of writing this? I don't like the way that this looks. I think the apply conj is a little hard to read and so is the [(first %)] (rest %) on the first line.

into would be a better choice than apply conj.
I think [(first %)] (rest %) is just fine though. Probably the shortest way to write this and it makes it completely clear what the seed of the reduction and the sequence being reduced are.
Also, reduce is a perfect match to the task at hand, not only in the sense that it works, but also in the sense that the task is a reduction / fold. Similarly pop and peek do exactly the thing specified in the sense that it is their purpose to "keep the butlast" and "take the last" of what's been accumulated (in a vector). With the into change, the code basically tells the same story the spec does, and in fewer words to boot.
So, nope, no way to improve this, sorry. ;-)

Related

Clojure - how macroexpansion works inside of the "some" function

Just when I thought I had a pretty good handle on macros, I came across the source for some which looked a bit odd to me at first glance.
(defn some
[pred coll]
(when (seq coll)
(or (pred (first coll)) (recur pred (next coll)))))
My first instinct was that seems like it would be stack consuming, but then I remembered: "No, dummy, or is a macro so it would simply expand into a ton of nested ifs".
However mulling it over a bit more I ended up thinking myself in a corner. At expansion time the function source would look like this:
(defn some
[pred coll]
(when (seq coll)
(let [or__4469__auto__ (pred (first coll))]
(if or__4469__auto__
or__4469__auto__
(recur pred (next coll))))))
Now what's got me confused is that final recur call. I've always thought that macroexpansion occurs prior to runtime, yet here you have to actually call the already expanded code at runtime in order for the second macroexp .... wait a second, I think i just figured it out.
There is no second macroexpansion, there are no nested if blocks, only the one if block. The call to recur just keeps rebinding pred and coll but the same single block above keeps testing for truth until it finds it, or the collection runs out and nil is returned.
Can someone confirm if this is a correct interpretation? I had initially confused myself thinking that there would be an interleaving of macroexpansion and runtime wherein at runtime the call to recur would somehow result in a new macro call, which didn't make sense since macroexpansion must occur prior to runtime. Now I think I see where my confusion was, there is only ever one macro expansion and the resulting code is used over and over in a loop.
To start with, note that any function can serve as an implicit loop expression. Also, recur works just like a recursive function call, except it does not use up the stack because of a compiler trick (that is why loop & recur are "special forms" - they don't follow the rules of normal functions).
Also, remember that when is a macro that expands into an if expression.
Having said all that, you did reach the correct conclusion.
There are two modes of recursion going on here:
The or macro is implicitly recursive, provoked by the sequence of argument
forms into generating a tree of if forms.
The some function is explicitly recursive, provoked into telling the single
sequence of its final argument. The fact that this recursion is
recurable is irrelevant.
Every argument to the or macro beyond the first generates a nested if form. For example, ...
=> (clojure.walk/macroexpand-all '(or a b c))
(let* [or__5501__auto__ a]
(if or__5501__auto__ or__5501__auto__
(let* [or__5501__auto__ b]
(if or__5501__auto__ or__5501__auto__ c))))
You have two arguments to or, so one if form. As Alan Thompson's excellent answer points out, the surrounding when unwraps into another if form.
You can have as many nested if forms as you like, the leaves of the if tree, all of them, are in tail position. Hence all immediate recursive calls there are recurable. If there was no such tail recursion, the recur call would fail to compile.

What are side-effects in predicates and why are they bad?

I'm wondering what is considered to be a side-effect in predicates for fns like remove or filter. There seems to be a range of possibilities. Clearly, if the predicate writes to a file, this is a side-effect. But consider a situation like this:
(def *big-var-that-might-be-garbage-collected* ...)
(let [my-ref *big-var-that-might-be-garbage-collected*]
(defn my-pred
[x]
(some-operation-on my-ref x)))
Even if some-operation-on is merely a query that does not change state, the fact that my-pred retains a reference to *big... changes the state of the system in that the big var cannot be garbage collected. Is this also considered to be side-effect?
In my case, I'd like to write to a logging system in a predicate. Is this a side effect?
And why are side-effects in predicates discouraged exactly? Is it because filter and remove and their friends work lazily so that you cannot determine when the predicates are called (and - hence - when the side-effects happen)?
GC is not typically considered when evaluating if a function is pure or not, although many actions that make a function impure can have a GC effect.
Logging is a side effect, as is changing any state in the program or the world. A pure function takes data and returns data, without modifying anything else.
https://softwareengineering.stackexchange.com/questions/15269/why-are-side-effects-considered-evil-in-functional-programming covers why side effects are avoided in functional languages.
I found this link helpful
The problem is determining when, or even whether, the side-effects will occur on any given call to the function.
If you only care that the same inputs return the same answer, you are fine. Side-effects are dependent on how the function is executed.
For example,
(first (filter odd? (range 20)))
; 1
But if we arrange for odd? to print its argument as it goes:
(first (filter #(do (print %) (odd? %)) (range 20)))
It will print 012345678910111213141516171819 before returning 1!
The reason is that filter, where it can, deals with its sequence argument in chunks of 32 elements.
If we take the limit off the range:
(first (filter #(do (print %) (odd? %)) (range)))
... we get a full-size chunk printed: 012345678910111213141516171819012345678910111213141516171819202122232425262728293031
Just printing the argument is confusing. If the side effects are significant, things could go seriously awry.

Clojure lazy-seq over Java iterative code

I'm trying to use create a Clojure seq from some iterative Java library code that I inherited. Basically what the Java code does is read records from a file using a parser, sends those records to a processor and returns an ArrayList of result. In Java this is done by calling parser.readData(), then parser.getRecord() to get a record then passing that record into processor.processRecord(). Each call to parser.readData() returns a single record or null if there are no more records. Pretty common pattern in Java.
So I created this next-record function in Clojure that will get the next record from a parser.
(defn next-record
"Get the next record from the parser and process it."
[parser processor]
(let [datamap (.readData parser)
row (.getRecord parser datamap)]
(if (nil? row)
nil
(.processRecord processor row 100))))
The idea then is to call this function and accumulate the records into a Clojure seq (preferably a lazy seq). So here is my first attempt which works great as long as there aren't too many records:
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (datamap-seq parser processor)))))
I can create a parser and processor, and do something like (take 5 (datamap-seq parser processor)) which gives me a lazy seq. And as expected getting the (first) of that seq only realizes one element, doing count realizes all of them, etc. Just the behavior I would expect from a lazy seq.
Of course when there are a lot of records I end up with a StackOverflowException. So my next attempt was to use loop-recur to do the same thing.
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(loop [records (seq '())]
(if-let [record (next-record parser processor)]
(recur (cons record records))
records))))
Now using this the same way and defing it using (def results (datamap-seq parser processor)) gives me a lazy seq and doesn't realize any elements. However, as soon as I do anything else like (first results) it forces the realization of the entire seq.
Can anyone help me understand where I'm going wrong in the second function using loop-recur that causes it to realize the entire thing?
UPDATE:
I've looked a little closer at the stack trace from the exception and the stack overflow exception is being thrown from one of the Java classes. BUT it only happens when I have the datamap-seq function like this (the one I posted above actually does work):
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (remove empty? (datamap-seq parser processor))))))
I don't really understand why that remove causes problems, but when I take it out of this funciton it all works right (I'm doing the removal of empty lists somewhere else now).
loop/recur loops within the loop expression until the recursion runs out. adding a lazy-seq around it won't prevent that.
Your first attempt with lazy-seq / cons should already work as you want, without stack overflows. I can't spot right now what the problem with it is, though it might be in the java part of the code.
I'll post here addition to Joost's answer. This code:
(defn integers [start]
(lazy-seq
(cons
start
(integers (inc start)))))
will not throw StackOverflowExceptoin if I do something like this:
(take 5 (drop 1000000 (integers)))
EDIT:
Of course better way to do it would be to (iterate inc 0). :)
EDIT2:
I'll try to explain a little how lazy-seq works. lazy-seq is a macro that returns seq-like object. Combined with cons that doesn't realize its second argument until it is requested you get laziness.
Now take a look at how LazySeq class is implemented. LazySeq.sval triggers computation of the next value which returns another instance of "frozen" lazy sequence. Method LazySeq.seq even better shows mechanics behind the concept. Notice that to fully realize sequence it uses while loop. It in itself means that stack trace use is limited to short function calls that return another instances of LazySeq.
I hope this makes any sense. I described what I could deduce from the source code. Please let me know if I made any mistakes.

Using lazy-seq without blowing the stack: is it possible to combine laziness with tail recursion?

To learn Clojure, I'm solving the problems at 4clojure. I'm currently cutting my teeth on question 164, where you are to enumerate (part of) the language a DFA accepts. An interesting condition is that the language may be infinite, so the solution has to be lazy (in that case, the test cases for the solution (take 2000 ....
I have a solution that works on my machine, but when I submit it on the website, it blows the stack (if I increase the amount of acceptable strings to be determined from 2000 to 20000, I also blow the stack locally, so it's a deficiency of my solution).
My solution[1] is:
(fn [dfa]
(let [start-state (dfa :start)
accept-states (dfa :accepts)
transitions (dfa :transitions)]
(letfn [
(accept-state? [state] (contains? accept-states state))
(follow-transitions-from [state prefix]
(lazy-seq (mapcat
(fn [pair] (enumerate-language (val pair) (str prefix (key pair))))
(transitions state))))
(enumerate-language [state prefix]
(if (accept-state? state)
(cons prefix (follow-transitions-from state prefix))
(follow-transitions-from state prefix)))
]
(enumerate-language start-state ""))
)
)
it accepts the DFA
'{:states #{q0 q1 q2 q3}
:alphabet #{a b c}
:start q0
:accepts #{q1 q2 q3}
:transitions {q0 {a q1}
q1 {b q2}
q2 {c q3}}}
and returns the language that DFA accepts (#{a ab abc}). However, when determining the first 2000 accepted strings of DFA
(take 2000 (f '{:states #{q0 q1}
:alphabet #{0 1}
:start q0
:accepts #{q0}
:transitions {q0 {0 q0, 1 q1}
q1 {0 q1, 1 q0}}}))
it blows the stack. Obviously I should restructure the solution to be tail recursive, but I don't see how that is possible. In particular, I don't see how it is even possible to combine laziness with tail-recursiveness (via either recur or trampoline). The lazy-seq function creates a closure, so using recur inside lazy-seq would use the closure as the recursion point. When using lazy-seq inside recur, the lazy-seq is always evaluated, because recur issues a function call that needs to evaluate its arguments.
When using trampoline,I don't see how I can iteratively construct a list whose elements can be lazily evaluated. As I have used it and see it used, trampoline can only return a value when it finally finishes (i.e. one of the trampolining functions does not return a function).
Other solutions are considered out of scope
I consider a different kind of solution to this 4Clojure problem out of scope of this question. I'm currently working on a solution using iterate, where each step only calculates the strings the 'next step' (following transitions from the current statew) accepts, so it doesn't recurse at all. You then only keep track of current states and the strings that got you into that state (which are the prefixes for the next states). What's proving difficult in that case is detecting when a DFA that accepts a finite language will no longer return any results. I haven't yet devised a proper stop-criterion for the take-while surrounding the iterate, but I'm pretty sure I'll manage to get this solution to work. For this question, I'm interested in the fundamental question: can laziness and tail-recursiveness be combined or is that fundamentally impossible?
[1] Note that there are some restrictions on the site, like not being able to use def and defn, which may explain some peculiarities of my code.
When using lazy-seq just make a regular function call instead of using recur. The laziness avoids the recursive stack consumption for which recur is otherwise used.
For example, a simplified version of repeat:
(defn repeat [x]
(lazy-seq (cons x (repeat x))))
The problem is that you are building something that looks like:
(mapcat f (mapcat f (mapcat f ...)))
Which is fine in principle, but the elements on the far right of this list don't get realized for a long time, and by the time you do realize them, they have a huge stack of lazy sequences that need to be forced in order to get a single element.
If you don't mind a spoiler, you can see my solution at https://gist.github.com/3124087. I'm doing two things differently than you are, and both are important:
Traversing the tree breadth-first. You don't want to get "stuck" in a loop from q0 to q0 if that's a non-accepting state. It looks like that's not a problem for the particular test case you're failing because of the order the transitions are passed to you, but the next test case after this does have that characteristic.
Using doall to force a sequence that I'm building lazily. Because I know many concats will build a very large stack, and I also know that the sequence will never be infinite, I force the whole thing as I build it, to prevent the layering of lazy sequences that causes the stack overflow.
Edit: In general you cannot combine lazy sequences with tail recursion. You can have one function that uses both of them, perhaps recurring when there's more work to be done before adding a single element, and lazy-recurring when there is a new element, but most of the time they have opposite goals and attempting to combine them incautiously will lead only to pain, and no particular improvements.

Overflow while using recur in clojure

I have a simple prime number calculator in clojure (an inefficient algorithm, but I'm just trying to understand the behavior of recur for now). The code is:
(defn divisible [x,y] (= 0 (mod x y)))
(defn naive-primes [primes candidates]
(if (seq candidates)
(recur (conj primes (first candidates))
(remove (fn [x] (divisible x (first candidates))) candidates))
primes)
)
This works as long as I am not trying to find too many numbers. For example
(print (sort (naive-primes [] (range 2 2000))))
works. For anything requiring more recursion, I get an overflow error.
(print (sort (naive-primes [] (range 2 20000))))
will not work. In general, whether I use recur or call naive-primes again without the attempt at TCO doesn't appear to make any difference. Why am I getting errors for large recursions while using recur?
recur always uses tail recursion, regardless of whether you are recurring to a loop or a function head. The issue is the calls to remove. remove calls first to get the element from the underlying seq and checks to see if that element is valid. If the underlying seq was created by a call to remove, you get another call to first. If you call remove 20000 times on the same seq, calling first requires calling first 20000 times, and none of the calls can be tail recursive. Hence, the stack overflow error.
Changing (remove ...) to (doall (remove ...)) fixes the problem, since it prevents the infinite stacking of remove calls (each one gets fully applied immediately and returns a concrete seq, not a lazy seq). I think this method only ever keeps one candidates list in memory at one time, though I am not positive about this. If so, it isn't too space inefficient, and a bit of testing shows that it isn't actually much slower.