How can I help Clojure understand that 0 is the smallest natural number? - clojure

It's easy to define a lazy sequence of natural numbers in Clojure: (def N (iterate inc 0)). Unsurprisingly, if we ask Clojure to find the minimum of N using (apply min N), it gets stuck in an infinite regress.
Is there a way to "build in" the fact that (= 0 (apply min N)) into the data structure of N? Implicitly, we know this, since the increment function inc is strictly increasing. The min function doesn't know how to exploit this knowledge, and instead tries to brute force its way to the answer.
I don't know how to encode this programmatically. I would like a way to construct lazy sequences with additional structure (like constraints and relations). I would also like a way to exploit these constraints in order to solve optimization problems (like finding the minimum or infimum of the sequence).
Is there a way to do this in native Clojure? What about with Datomic?

You can use metadata for the specific example you have.
(defn my-range
  ([] (my-range 0))
  ([n] (with-meta
         (cons n (lazy-seq (my-range (inc n))))
         {:onlyincreases true})))

(defn my-min [x]
  (if (:onlyincreases (meta x))
    (first x)
    (apply min x))) ; apply is needed for the general case; (min x) would just return x
(my-min (my-range)) ;; => 0
(my-min (next (my-range))) ;; => 1
(my-min (nnext (my-range))) ;; => 2
If you need something more generalized, you might have to look into creating your own type.
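For illustration, a minimal sketch of what such a type might look like, pairing a lazy sequence with a lower bound it is known never to go below (all the names here, Bounded, BoundedSeq, bounded-iterate, and safe-min, are made up for this example):
(defprotocol Bounded
  (lower-bound [this] "A value known to be <= every element of the sequence."))

(deftype BoundedSeq [coll bound]
  Bounded
  (lower-bound [_] bound)
  clojure.lang.Seqable
  (seq [_] (seq coll)))

(defn bounded-iterate
  "Like iterate, but records that f is non-decreasing, so the seed is a lower bound."
  [f seed]
  (->BoundedSeq (iterate f seed) seed))

(defn safe-min [x]
  (if (satisfies? Bounded x)
    (lower-bound x)
    (apply min x)))

(safe-min (bounded-iterate inc 0)) ;; => 0
(take 5 (bounded-iterate inc 0))   ;; => (0 1 2 3 4)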

Related

testing if something is an empty list

Which way should I prefer to test if an object is an empty list in Clojure? Note that I want to test just this and not if it is empty as a sequence. If it is a "lazy entity" (LazySeq, Iterate, ...) I don't want it to get realized.
Below I give some possible tests for x.
;0
(= clojure.lang.PersistentList$EmptyList (class x))
;1
(and (list? x) (empty? x))
;2
(and (list? x) (zero? (count x)))
;3
(identical? () x)
Test 0 is a little low level and relies on "implementation details". My first version of it was (instance? clojure.lang.PersistentList$EmptyList x), which gives an IllegalAccessError. Why is that so? Shouldn't such a test be possible?
Tests 1 and 2 are higher level and more general, since list? checks if something implements IPersistentList. I guess they are slightly less efficient too. Notice that the order of the two sub-tests is important as we rely on short-circuiting.
Test 3 works under the assumption that every empty list is the same object. The tests I have done confirm this assumption but is it guaranteed to hold? Even if it is so, is it a good practice to rely on this fact?
All this may seem trivial but I was a bit puzzled not finding a completely straightforward solution (or even a built-in function) for such a simple task.
update
Perhaps I did not formulate the question very well. In retrospect, I realized that what I wanted to test was if something is a non-lazy empty sequence. The most crucial requirement for my use case is that, if it is a lazy sequence, it does not get realized, i.e. no thunk gets forced.
Using the term "list" was a little confusing. After all what is a list? If it is something concrete like PersistentList, then it is non-lazy. If it is something abstract like IPersistentList (which is what list? tests and probably the correct answer), then non-laziness is not exactly guaranteed. It just so happens that Clojure's current lazy sequence types do not implement this interface.
So first of all I need a way to test if something is a lazy sequence. The best solution I can think of right now is to use IPending to test for laziness in general:
(def lazy? (partial instance? clojure.lang.IPending))
Although there are some lazy sequence types (e.g. chunked sequences like Range and LongRange) that do not implement IPending, it seems reasonable to expect that lazy sequences implement it in general. LazySeq does so and this is what really matters in my specific use case.
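A few quick REPL checks to illustrate (the range result assumes a recent Clojure, where a bounded range is a chunked LongRange):
(lazy? (lazy-seq nil))    ;; => true   (LazySeq implements IPending)
(lazy? (map inc [1 2 3])) ;; => true   (map returns a LazySeq)
(lazy? [1 2 3])           ;; => false
(lazy? '())               ;; => false
(lazy? (range 5))         ;; => false  (chunked LongRange, no IPending)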
Now, relying on short-circuiting to prevent realization by empty? (and to prevent giving it an unacceptable argument), we have:
(defn empty-eager-seq? [x] (and (not (lazy? x)) (seq? x) (empty? x)))
Or, if we know we are dealing with sequences like in my case, we can use the less restrictive:
(defn empty-eager? [x] (and (not (lazy? x)) (empty? x)))
Of course we can write safe tests for more general types like:
(defn empty-eager-coll? [x] (and (not (lazy? x)) (coll? x) (empty? x)))
(defn empty-eager-seqable? [x] (and (not (lazy? x)) (seqable? x) (empty? x)))
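To make it concrete, here is a small REPL experiment showing why the (not (lazy? x)) guard has to come first (the println only serves to make the realization visible):
(empty? (lazy-seq (println "realized!") nil))
;; prints "realized!"  => true
(empty-eager-seq? (lazy-seq (println "realized!") nil))
;; prints nothing      => false  (short-circuits on (not (lazy? x)))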
That being said, the recommended test 1 also works for my case, thanks to short-circuiting and the fact that LazySeq does not implement IPersistentList. Given this and that the question's formulation was suboptimal, I will accept Lee's succinct answer and thank Alan Thompson for his time and for the helpful mini-discussion we had with an upvote.
Option 0 should be avoided since it relies on a class within clojure.lang that is not part of the public API for the package: From the javadoc for clojure.lang:
The only class considered part of the public API is IFn. All other
classes should be considered implementation details.
Option 1 uses functions from the public API and avoids iterating the entire input sequence if it is non-empty.
Option 2 iterates the entire input sequence to obtain the count which is potentially expensive.
Option 3 does not appear to be guaranteed and can be circumvented with reflection:
(identical? '() (.newInstance (first (.getDeclaredConstructors (class '()))) (into-array [{}])))
=> false
Given these I'd prefer option 1.
Just use choice (1):
(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(defn empty-list? [arg]
  (and (list? arg)
       (not (seq arg))))

(dotest
  (isnt (empty-list? (range)))
  (isnt (empty-list? [1 2 3]))
  (isnt (empty-list? (list 1 2 3)))
  (is (empty-list? (list)))
  (isnt (empty-list? []))
  (isnt (empty-list? {}))
  (isnt (empty-list? #{})))
with result:
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
Testing tst.demo.core
Ran 2 tests containing 7 assertions.
0 failures, 0 errors.
As you can see by the first test with (range), the infinite lazy seq didn't get realized by empty?.
Update
Choice 0 depends on implementation details (unlikely to change, but why bother?). Also, it is noisier to read.
Choice 2 will blow up for infinite lazy seq's.
Choice 3 is not guaranteed to work. You could have more than one list with zero elements.
Update #2
OK, you are correct re (2). We get:
(type (range)) => clojure.lang.Iterate
Notice that it is not a LazySeq, which is what both you and I expected it to be.
So you are relying on a (non-obvious) detail to prevent getting to count, which will blow up for an infinite lazy seq. Too subtle for my taste. My motto: keep it as obvious as possible.
Re choice (3), again it relies on the implementation detail of (the current release of) Clojure. I could almost make it fail except that clojure.lang.PersistentList$EmptyList is a package-protected inner class, so I would have to really try hard (subvert Java inheritance) to make a duplicate instance of the class, which would then fail.
However, I can come close:
(import 'java.util.LinkedList) ; needed for the contrived example below

(defn el3? [arg] (identical? () arg))

(dotest
  (spyx (type (range)))
  (isnt (el3? (range)))
  (isnt (el3? [1 3 3]))
  (isnt (el3? (list 1 3 3)))
  (is (el3? (list)))
  (isnt (el3? []))
  (isnt (el3? {}))
  (isnt (el3? #{}))
  (is (el3? ()))
  (is (el3? '()))
  (is (el3? (list)))
  (is (el3? (spyxx (rest [1]))))
  (let [jull (LinkedList.)]
    (spyx jull)
    (spyx (type jull))
    (spyx (el3? jull)))) ; ***** contrived, but it fails *****
with result
jull => ()
(type jull) => java.util.LinkedList
(el3? jull) => false
So, I again make a plea to keep it obvious and simple.
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies.
---C.A.R. Hoare

What scope should calls to lazy-seq have?

I'm writing a lazy implementation of the Recamán's Sequence, and ran into some confusion regarding where calls to lazy-seq should happen.
This first version I came up with this morning was:
(defn lazy-recamans-sequence []
  (let [f (fn rec [n seen last-s]
            (let [back  (- last-s n)
                  new-s (if (and (pos? back) (not (seen back)))
                          back
                          (+ last-s n))]
              (lazy-seq ; Here
                (cons new-s (rec (inc n) (conj seen new-s) new-s)))))]
    (f 0 #{} 0)))
Then I realized that my placement of lazy-seq was kind of arbitrary, and that it could be placed higher to wrap more of the computations:
(defn lazy-recamans-sequence2 []
  (let [f (fn rec [n seen last-s]
            (lazy-seq ; Here
              (let [back  (- last-s n)
                    new-s (if (and (pos? back) (not (seen back)))
                            back
                            (+ last-s n))]
                (cons new-s (rec (inc n) (conj seen new-s) new-s)))))]
    (f 0 #{} 0)))
Then I looked back on a review that someone gave me last night:
(defn recaman []
  (letfn [(tail [previous n seen]
            (let [nx (if (and (> previous n) (not (seen (- previous n))))
                       (- previous n)
                       (+ previous n))]
              ; Here, inside "cons"
              (cons nx (lazy-seq (tail nx (inc n) (conj seen nx))))))]
    (tail 0 0 #{})))
And they have theirs inside of the call to cons!
Thinking this over, it seems like it wouldn't make a difference. With a broader scope (like the second version), more code is inside the explicit function that's passed to LazySeq. With a narrower scope however, the function itself may be smaller, but since the passed function involves a recursive call, it will be executing the same code anyway.
They seem to perform nearly identically and give the same answers. Is there any reason to prefer placing lazy-seq in one place over another? Is this simply a stylistic choice, or can this have actual repercussions?
In the first two examples the lazy-seq wraps the cons call. This means that when you call the function, it returns a lazy sequence immediately, without realizing the cons cell.
In the first example the let expression is still outside of lazy-seq, so the value of the first item is calculated immediately, but the returned sequence is still lazy and not realized.
The second example is similar to the first. The lazy-seq wraps the cons cell and also the let block. This means that the function will return immediately and the value of the first item is calculated only when the caller starts to consume the lazy sequence.
In the third example the value of the first item in the list is calculated immediately and only the tail of the returned sequence is lazy.
Is there any reason to prefer placing lazy-seq in one place over another?
It depends on what you want to achieve. Do you want to return a sequence immediately without calculating any values? In this case make the scope of lazy-seq as broad as possible. Otherwise try to restrict the scope of lazy-seq to calculate only the tail part of the sequence.
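To make the difference visible, here is a small sketch (not from the question) where a println stands in for an expensive head computation:
(defn head-eager []
  (let [x (do (println "computing head") 1)] ; evaluated as soon as the fn is called
    (lazy-seq (cons x nil))))

(defn head-lazy []
  (lazy-seq
    (let [x (do (println "computing head") 1)] ; evaluated only when the seq is consumed
      (cons x nil))))

(def a (head-eager)) ;; prints "computing head" right away
(def b (head-lazy))  ;; prints nothing yet
(first b)            ;; prints "computing head" now, returns 1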
When I was first learning Clojure, I was a bit confused by the many possible choices of lazy-seq constructs, the lack of clarity in terms of which construct to choose, and the somewhat vague explanation for how lazy-seq creates laziness in the first place (it is implemented as a Java class of ~240 lines).
To reduce repetition and keep things as simple as possible, I created the lazy-cons macro. It is used like so:
(defn lazy-countdown [n]
  (when (<= 0 n)
    (lazy-cons n (lazy-countdown (dec n)))))

(deftest t-all
  (is= (lazy-countdown  5) [5 4 3 2 1 0])
  (is= (lazy-countdown  1) [1 0])
  (is= (lazy-countdown  0) [0])
  (is= (lazy-countdown -1) nil))
This version does realize the initial value n immediately.
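For reference, such a macro amounts to very little; a minimal sketch (the actual tupelo.core implementation may differ in details) is:
(defmacro lazy-cons
  "Prepend a value onto a lazily-evaluated tail expression."
  [value tail-expr]
  `(lazy-seq (cons ~value ~tail-expr)))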
I never worry about chunking (typically batches of 32) or trying to precisely control the number of elements realized in a lazy sequence. IMHO, if you need fine-grained control such as this, it is better to use an explicit loop than to make assumptions on the timing of realizations in a lazy sequence.

What is the difference between the Clojure function (nth [coll index]) and the composition (last (take index coll))

I'm trying to work through Stuart Halloway's book Programming Clojure. This whole functional stuff is very new to me.
I understand how
(defn fibo []
  (map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))
generates the Fibonacci sequence lazily. I do not understand why
(last (take 1000000 (fibo)))
works, while
(nth (fibo) 1000000)
throws an OutOfMemoryError. Could someone please explain how these two expressions differ? Is (nth) somehow holding on to the head of the sequence?
Thanks!
I think you are talking about an issue that was discussed in the Google group, for which Rich Hickey provided a patch that solved the problem. The book, which was published later, didn't cover this topic.
In Clojure 1.3 your nth example works with minor improvements to the fibo function. Due to changes in 1.3, we must explicitly flag the literals with M to use arbitrary precision, or the code fails with an integer overflow (throwIntOverflow).
(defn fibo []
  (map first (iterate (fn [[a b]] [b (+ a b)]) [0M 1M])))
And with these changes
(nth (fibo) 1000000)
succeeds (if you have enough memory).
What Clojure version are you using? Try (clojure-version) on a repl. I get identical results for both expressions in 1.3.0, namely an integer overflow.
For
(defn fibo []
  (map first (iterate (fn [[a b]] [b (+ a b)]) [(bigint 0) 1])))
I get correct results for both expressions (a really big integer...).
I think that you may be hitting a specific memory limit for your machine, and not a real difference in function.
Looking at the source code for nth in https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java it does not look like either nth or take are retaining the head.
However, nth uses zero-based indexing, rather than a count by item number. Your code with nth selects the 1000001st element of the sequence (the one at index 1000000). Your code with take is returning the final element in a 1000000-element sequence, which is the item with index 999999. Given how fast fib grows, that one extra item could be the straw that broke the camel's back.
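The off-by-one is easier to see on a small sequence:
(nth (range 10 20) 3)         ;; => 13  (zero-based: the element at index 3)
(last (take 3 (range 10 20))) ;; => 12  (the last of the first 3 elements, index 2)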
Also, I was checking the 1.3.0 source. Perhaps earlier versions had different implementations. To get your fibo to work properly in 1.3.0 you need to use the arithmetic functions that will promote numbers to bignums:
(defn fibo []
  (map first (iterate (fn [[a b]] [b (+' a b)]) [0 1])))

Should not a tail-recursive function also be faster?

I have the following Clojure code to calculate a number with a certain "factorable" property. (what exactly the code does is secondary).
(defn factor-9
  ([]
   (let [digits (take 9 (iterate #(inc %) 1))
         nums   (map (fn [x] (Integer. (apply str x))) (permutations digits))]
     (some (fn [x] (and (factor-9 x) x)) nums)))
  ([n]
   (or
     (= 1 (count (str n)))
     (and (divisible-by-length n) (factor-9 (quot n 10))))))
Now, I'm into TCO and realize that Clojure can only provide tail-recursion if explicitly told so using the recur keyword. So I've rewritten the code to do that (replacing factor-9 with recur being the only difference):
(defn factor-9
  ([]
   (let [digits (take 9 (iterate #(inc %) 1))
         nums   (map (fn [x] (Integer. (apply str x))) (permutations digits))]
     (some (fn [x] (and (factor-9 x) x)) nums)))
  ([n]
   (or
     (= 1 (count (str n)))
     (and (divisible-by-length n) (recur (quot n 10))))))
To my knowledge, TCO has a double benefit. The first one is that it does not use the stack as heavily as a non tail-recursive call and thus does not blow it on larger recursions. The second, I think is that consequently it's faster since it can be converted to a loop.
Now, I've made a very rough benchmark and have not seen any difference between the two implementations, though. Am I wrong in my second assumption, or does this have something to do with running on the JVM (which does not have automatic TCO) and recur using a trick to achieve it?
Thank you.
The use of recur does speed things up, but only by about 3 nanoseconds (really) over a recursive call. When things get that small they can be hidden in the noise of the rest of the test. I wrote four tests (link below) that are able to illustrate the difference in performance.
I'd also suggest using something like criterium when benchmarking. (Stack Overflow won't let me post with more than 1 link since I've got no reputation to speak of, so you'll have to google it, maybe "clojure criterium")
For formatting reasons, I've put the tests and results in this gist.
Briefly, in relative terms, if the recursive test is 1, then the looping test is about 0.45 and the TCO tests about 0.87; the absolute difference between the recursive and TCO tests is around 3 ns.
Of course, all the caveats about benchmarking apply.
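For what it's worth, a minimal criterium invocation looks roughly like this (assuming the library is on the classpath):
(require '[criterium.core :as crit])

;; quick-bench runs the expression repeatedly with JIT warm-up and
;; reports the mean execution time with error estimates
(crit/quick-bench (factor-9))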
When optimizing any code, it's good to start from potential or actual bottlenecks and optimize that first.
It seems to me that this particular piece of code is eating most of your CPU time:
(map (fn [x] (Integer. (apply str x))) (permutations digits))
And that doesn't depend on TCO in any way - it is executed the same way either way. So the tail call in this particular example will let you avoid using up the stack, but to achieve better performance, try optimizing this part.
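For instance, one way to avoid the string round-trip entirely (just a sketch, not benchmarked here) is to fold the digit sequence into a number arithmetically:
(defn digits->long
  "Fold a seq of decimal digits into the number they spell, e.g. [1 2 3] => 123."
  [ds]
  (reduce (fn [acc d] (+ (* 10 acc) d)) 0 ds))

;; would replace the string round-trip inside factor-9's let binding:
;; (map digits->long (permutations digits))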
Just a gentle reminder that Clojure has no TCO.
After evaluating (factor-9 (quot n 10)), an and and an or still have to be evaluated before the function can return, so it is not tail-recursive.
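If you want the recursive step unambiguously in tail position, one possible rewrite of the single-argument arity (a sketch that reuses the question's divisible-by-length helper) is:
(defn factor-9-tail
  "Hypothetical rewrite of the 1-arity of factor-9 with an explicit tail call."
  [n]
  (cond
    (= 1 (count (str n)))   true
    (divisible-by-length n) (recur (quot n 10))
    :else                   false))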

Problem with Clojure function

Everyone, I started working on Project Euler in Clojure yesterday, and I have a problem with one of my solutions that I cannot figure out.
I have this function:
(defn find-max-palindrom-in-range [beg end]
  (reduce max
          (loop [n beg result []]
            (if (>= n end)
              result
              (recur (inc n)
                     (concat result
                             (filter #(is-palindrom? %)
                                     (map #(* n %) (range beg end)))))))))
I try to run it like this:
(find-max-palindrom-in-range 100 1000)
and I get this exception:
java.lang.Integer cannot be cast to clojure.lang.IFn
[Thrown class java.lang.ClassCastException]
which I presume means that at some place I'm trying to evaluate an Integer as a function. I however cannot find this place and what puzzles me more is that everything works if I simply evaluate it like this:
(reduce max
        (loop [n 100 result []]
          (if (>= n 1000)
            result
            (recur (inc n)
                   (concat result
                           (filter #(is-palindrom? %)
                                   (map #(* n %) (range 100 1000))))))))
(I've just stripped down the function definition and replaced the parameters with constants)
Thanks in advance for your help and sorry that I probably bother you with idiotic mistake on my part. Btw I'm using Clojure 1.1 and the newest SLIME from ELPA.
Edit: Here is the code to is-palindrom?. I've implemented it as a text property of the number, not a numeric one.
(defn is-palindrom? [n]
  (loop [num (String/valueOf n)]
    (cond
      (not (= (first num) (last num))) false
      (<= (.length num) 1) true
      :else (recur (.substring num 1 (dec (.length num)))))))
The code works at my REPL (1.1). I'd suggest that you paste it back at yours and try again -- perhaps you simply mistyped something?
Having said that, you could use this as an opportunity to make the code simpler and more obviously correct. Some low-hanging fruit (don't read if you think it could take away from your Project Euler fun, though with your logic already written down I think it shouldn't):
You don't need to wrap is-palindrome? in an anonymous function to pass it to filter. Just write (filter is-palindrome? ...) instead.
That loop in is-palindrome? is pretty complex. Moreover, it's not particularly efficient (first and last both make a seq out of the string first, then last needs to traverse all of it). It would be simpler and faster to (require '[clojure.contrib.str-utils2 :as str]) and use (= num (str/reverse num)).
Since I mentioned efficiency, using concat in this manner is a tad dangerous -- it creates a lazy seq, which might blow up if you pile up too many levels of laziness (this will not matter in the context of Euler 4, but it's good to keep it in mind). If you really need to extend vectors to the right, prefer into.
To further simplify things, you could consider breaking them apart into a function to filter a given sequence so that only palindromes remain and a separate function to return all products of two three-digit numbers. The latter can be accomplished with e.g.
(for [f (range 100 1000)
      s (range 100 1000)
      :when (<= f s)] ; avoid duplication of effort
  (* f s))
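Putting those pieces together, the whole function could collapse to something like this (a sketch reusing is-palindrom? from the question):
(defn find-max-palindrom-in-range [beg end]
  (reduce max
          (filter is-palindrom?
                  (for [f (range beg end)
                        s (range beg end)
                        :when (<= f s)]
                    (* f s)))))

(find-max-palindrom-in-range 100 1000)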