I tried the following in Clojure, expecting to have the class of a non-lazy sequence returned:
(.getClass (doall (take 3 (repeatedly rand))))
However, this still returns clojure.lang.LazySeq. My guess is that doall does evaluate the entire sequence, but returns the original sequence as it's still useful for memoization.
So what is the idiomatic means of creating a non-lazy sequence from a lazy one?
doall is all you need. Just because the seq has type LazySeq doesn't mean it has pending evaluation. Lazy seqs cache their results, so all you need to do is walk the lazy seq once (as doall does) in order to force it all and thus render it non-lazy. (seq, by contrast, does not force the entire collection to be evaluated.)
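For example (a minimal REPL sketch; the println is only there to make realization visible):
(def xs (map #(do (println "realizing" %) %) (range 3)))
(doall xs)   ; prints "realizing 0" through "realizing 2" and returns (0 1 2)
(class xs)   ; clojure.lang.LazySeq, i.e. the type is unchanged
(first xs)   ; prints nothing: the results were cached by the first walk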
This is to some degree a question of taxonomy: a lazy sequence is just one type of sequence, as is a list, a vector, or a map. So the answer is, of course, "it depends on what type of non-lazy sequence you want to get":
Take your pick from:
an ex-lazy (fully evaluated) lazy sequence (doall ... )
a list for sequential access (apply list (my-lazy-seq)) OR (into () ...)
a vector for later random access (vec (my-lazy-seq))
a map or a set if you have some special purpose.
You can have whatever type of sequence best suits your needs.
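For instance, a quick REPL check of what each option yields (results shown as comments):
(def s (take 3 (repeatedly rand)))
(class (doall s))      ; clojure.lang.LazySeq, but now fully realized
(class (apply list s)) ; clojure.lang.PersistentList
(class (into () s))    ; clojure.lang.PersistentList (note: order is reversed)
(class (vec s))        ; clojure.lang.PersistentVector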
This Rich guy seems to know his Clojure and is absolutely right.
But I think this code snippet, using your example, might be a useful complement to this question:
=> (realized? (take 3 (repeatedly rand)))
false
=> (realized? (doall (take 3 (repeatedly rand))))
true
Indeed, the type has not changed, but its realization status has.
I stumbled on this blog post about doall not being recursive. For that, I found that the first comment on the post did the trick. Something along the lines of:
(use 'clojure.walk)
(postwalk identity nested-lazy-thing)
I found this useful in a unit test where I wanted to force evaluation of some nested applications of map to force an error condition.
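A small sketch of the difference, with println only there to make realization visible (the do ... nil wrappers keep the REPL printer from realizing things itself):
(require '[clojure.walk :refer [postwalk]])

(def nested (map (fn [i] (map #(do (println "inner" i %) %) (range 2)))
                 (range 2)))

(do (doall nested) nil)              ; forces only the outer seq; prints nothing
(do (postwalk identity nested) nil)  ; walks every level; prints the four "inner" lines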
(.getClass (into '() (take 3 (repeatedly rand))))
Related
I wish to use spec in my pre and post conditions of a generator function. A simplified example of what I wish to do is described below:
(defn positive-numbers
  ([]
   {:post [(s/valid? (s/+ int?) %)]}
   (positive-numbers 1))
  ([n]
   {:post [(s/valid? (s/+ int?) %)]}
   (lazy-seq (cons n (positive-numbers (inc n))))))
(->> (positive-numbers) (take 5))
However, defining the generator function like that seems to cause a stack overflow, the cause being that spec will eagerly try to evaluate the whole thing, or something like that...
Is there another way of using spec to describe the :post result of a generator function like the one above (without causing stack-overflow)?
The theoretically correct answer is that in general you cannot check whether a lazy sequence matches a spec without realizing all of it.
In the case of your specific example of (s/+ int?), given a lazy sequence, how would one establish merely by observing the sequence whether all its elements are integers? However many elements you examine, the next one could always be a keyword.
This is the sort of thing that a type system like, say, core.typed may be able to prove, but a runtime-predicate-based assertion won't be able to check.
Now, in addition to s/+ and s/*, spec (as of Clojure 1.9.0-alpha14) also has a combinator called s/every, whose docstring says this:
Note that 'every' does not do exhaustive checking, rather it samples *coll-check-limit* elements.
So we have e.g.
(s/valid? (s/* int?) (concat (range 1000) [:foo]))
;= false
but
(s/valid? (s/every int?) (concat (range 1000) [:foo]))
;= true
(with the default *coll-check-limit* value of 101).
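If I remember correctly, the sampling limit is an ordinary dynamic var, so it can be rebound for a stricter (and correspondingly more eager) check; something like:
(binding [s/*coll-check-limit* 2000]
  (s/valid? (s/every int?) (concat (range 1000) [:foo])))
;= false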
This actually isn't an immediate fix to your example – plugging in s/every in place of s/+ won't work, because each recursive call will want to validate its own return value, which will involve realizing more of the sequence, which will involve more recursive calls etc. But you could factor out the sequence-building logic to a helper function with no postconditions and then have positive-numbers declare the postcondition and call that helper function:
(defn positive-numbers* [n]
  (lazy-seq (cons n (positive-numbers* (inc n)))))

(defn positive-numbers [n]
  {:post [(s/valid? (s/every int? :min-count 1) %)]}
  (positive-numbers* n))
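With the postcondition moved onto the wrapper, a quick sanity check (sketch) might look like:
(take 5 (positive-numbers 1))
;= (1 2 3 4 5)
The :post check will still realize up to *coll-check-limit* elements of the helper's output, but there is no longer a postcondition firing on every recursive call.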
Note the caveats:
(1) this will still realize a good chunk of your sequence, which may wreak havoc with your application's performance profile;
(2) the only watertight guarantee here is that the prefix actually examined is as desired; if the seq has a weird item at position 123456, that will go unnoticed.
Because of (1), this is something that makes more sense as a test-only assertion. (2) may be acceptable – you'll still catch some silly typos and the documentation value of the spec is there anyway; if it isn't and you do want an absolutely watertight guarantee that your return type is as desired, then again, core.typed (perhaps used locally just for a handful of namespaces) may be the better bet.
Here's a use of the standard 'contains?' function in Clojure:
(contains? {:state "active", :course_n "law", :course_i "C0"} :state)
and it returns the expected
true
I used the following
Clojure: Idiomatic way to call contains? on a lazy sequence
as a guide for building a lazy-contains? as this is what I need for my present use-case.
The problem I'm facing is that for a map these alternatives do not return the same answer, giving either a false or a nil response. I've tried looking at the source for contains?, and it's slow going trying to understand what's happening so I can correct lazy-contains? appropriately (for the record, Clojure is essentially my first programming language, and my exposure to Java is very limited).
Any thoughts or ideas on how I might approach this? I tried every variant on the linked question I could.
Thanks in advance.
Edited to remove the error pointed out by #amalloy.
I think your problem is with the way that maps present themselves as sequences.
Given
(def data {:state "active", :course_n "law", :course_i "C0"})
then
(seq data)
;([:state "active"] [:course_i "C0"] [:course_n "law"])
... a sequence of key-value pairs.
So if we define (following #chouser)
(defn lazy-contains? [coll x]
  (some #(= x %) coll))
... then
(lazy-contains? data :state)
;nil
... a false result, whereas ...
(lazy-contains? data [:state "active"])
;true
This is what #Ankur was getting at, showing you a function treating a map as a sequence consistent with contains? on the map itself.
The standard contains? works with keyed/indexed collections - maps or sets or vectors - where it tests for the presence of a key. Our lazy-contains? works with anything seqable, including all the standard collections, testing for the presence of a value.
Given the way that keyed/indexed collections present as sequences, these are bound to be inconsistent.
You can try the below implementation (for maps only):
(defn lazy-contains? [col key]
  (some (fn [[k v]] (= k key)) col))
Remember, contains? checks for the existence of a key in a collection; in maps the key is obvious, and in other supported collections (like vectors) the key is the index.
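For example, with the map-only version above (results shown as comments):
(lazy-contains? {:state "active", :course_n "law", :course_i "C0"} :state)
;true
(lazy-contains? {:state "active", :course_n "law", :course_i "C0"} :missing)
;nil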
A "lazy" implementation of contains? is undesirable where checking for presence
of a key in a hash-map or of a value in a set
(contains? #{:foo} :foo}) => true
(contains? {:foo :bar} :foo) => true
of an index of a vector array or string.
(contains? [:foo] 0) => true
(contains? (int-array 7) 6) => true
(contains? "foo" 2) => true
Quoting from the contains? docstring:
'contains?' operates constant or logarithmic time; it will not
perform a linear search for a value.
some is a tool for linear searching. When searching for an element in a set or vector, in the worst case it can be slower than contains? by a factor of the sequence's length, and it will take more time than contains? in almost every case.
contains? can't be implemented "lazily", as it does not produce a sequence. However, some stops consuming a lazy sequence as soon as it has determined a return value.
(some zero? (range))
;; true
Notice that maps and sets are never sequential or lazy.
Fairly new to lisps, but in looking into sequential integer generating code, I noticed that repeated calls to (gensym) would increase the number provided after the prefix by 3. I'm curious why that is the case.
user=> (gensym)
G__662
user=> (gensym)
G__665
user=> (gensym)
G__668
user=> (gensym)
G__671
user=> (gensym)
G__674
user=> (gensym)
G__677
I've seen and understand the combined use of atom and inc, but I'm new to the gensym function.
There are a number of correct answers here. One is: it doesn't!
user> (take 5 (repeatedly gensym))
(G__2173 G__2174 G__2175 G__2176 G__2177)
Another is: gensym doesn't make any guarantees as to the form of the symbols it generates, so you really shouldn't care whether they're sequential or not (or even if they contain numbers at all). You certainly shouldn't hijack gensym to produce an integer sequence.
Lastly: why does it increase by three in your example? Because each time you evaluate a form in the repl, the compiler has to create some gensyms of its own. Apparently, for the form (gensym), the number it needs to create is two.
It doesn't!
=> (str (gensym) (gensym))
"G__4027G__4028"
Looking at the source of gensym we can see that it uses clojure.lang.RT/nextID.
(defn gensym
  ([] (gensym "G__"))
  ([prefix-string] (. clojure.lang.Symbol (intern (str prefix-string (str (. clojure.lang.RT (nextID))))))))
The nextID function is also used in the LispReader. So when you repeatedly evaluate (gensym), the reader is probably using two IDs.
I clearly have something else going on in my process too: if I wait any time between evaluations, more IDs are consumed and the gensym gap grows beyond just 3.
https://github.com/clojure/clojure/search?q=nextid
Let's say I have a LazySeq
(def s (take 10 (iterate + 0)))
Does (count s) realize the sequence?
If you are asking about lazy sequences, Yes.
user> (def s (map #(do (println "doing work") %) (range 4)))
#'user/s
user> (count s)
doing work
doing work
doing work
doing work
4
Some of the data structures can give you answers in constant time, though lazy sequences do not have a stored count, and counting always realizes them.
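One way to check whether a collection knows its count without walking it is counted? (a quick sketch):
(counted? [1 2 3])          ; true: vectors count in constant time
(counted? (take 3 (range))) ; false: a LazySeq counts by walking itself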
For a LazySeq yes, you can see its count method here. It walks every element from head to tail.
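Roughly speaking, and only as an illustrative sketch (the real method is Java), it does something like this:
(defn seq-count [s]
  (loop [s (seq s), n 0]
    (if s
      (recur (next s) (inc n))
      n)))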
Depends on the definition of lazy sequence. It's possible to implement ones that know their length without realizing their elements. See this question for an example, but in 99% of the cases they're just LazySeqs so Michiel's answer should cover that.
In your example case it's easy to test, as:
(realized? s)
returns true after calling (count s), so s isn't 'clever' enough to know its length without realizing its content.
I was under the impression that the lazy seqs were always chunked.
=> (take 1 (map #(do (print \.) %) (range)))
(................................0)
As expected 32 dots are printed because the lazy seq returned by range is chunked into 32 element chunks. However, when instead of range I try this with my own function get-rss-feeds, the lazy seq is no longer chunked:
=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")
Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds is not chunked. Indeed:
=> (chunked-seq? (seq (range)))
true
=> (chunked-seq? (seq (get-rss-feeds r)))
false
Here is the source for get-rss-feeds:
(defn get-rss-feeds
  "returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
  [hr]
  (map #(:href (:attrs %))
       (filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link]))))
So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?
Here's why I need to know.
I have the following code: (get-rss-entry (get-rss-feeds h-res) url)
The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.
The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an http request across the network to fetch a new rss feed. To minimize the number of http requests it's important to examine the sequence one-by-one and stop as soon as there is a match.
Here is the code:
(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))
entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.
I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.
Am I using lazy seq non-idiomatically? Would loop/recur be a better option?
You are right to be concerned. Your get-rss-entry will indeed call entry-with-url more than strictly necessary if the feeds parameter is a collection that returns chunked seqs. For example if feeds is a vector, map will operate on whole chunks at a time.
This problem is addressed directly in Fogus' Joy of Clojure, with the function seq1 defined in chapter 12:
(defn seq1 [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (seq1 (rest s))))))
You could use this right where you know you want the most laziness possible, right before you call entry-with-url:
(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) (seq1 feeds)))))
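A quick REPL sketch of seq1's effect, using a printing fn to make realization visible (a vector source produces chunked seqs):
(take 1 (map #(do (print \.) %) (vec (range 100))))
;; prints 32 dots at the REPL: map realizes a whole 32-element chunk of the vector

(take 1 (map #(do (print \.) %) (seq1 (vec (range 100)))))
;; prints a single dot: seq1 hands map one element at a time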
Lazy seqs are not always chunked - it depends on how they are produced.
For example, the lazy seq produced by this function is not chunked:
(defn integers-from [n]
  (lazy-seq (cons n (do (print \.) (integers-from (inc n))))))
(take 3 (integers-from 3))
=> (..3 .4 5)
But many other Clojure built-in functions do produce chunked seqs for performance reasons (e.g. range).
Depending on the vagaries of chunking seems unwise, as you mention above. Explicitly "un-chunking" in cases where you really need the sequence not to be chunked is also wise, because then if your code later changes in a way that chunkifies it, things won't break. On another note, if you need actions to be sequential, agents are a great tool: you could send the download functions to an agent, and then they will be run one at a time, and only once, regardless of how you evaluate the function. At some point you may want to pmap your sequence, and then even un-chunking will not work, though using an atom will continue to work correctly.
I have discussed this recently in Can I un-chunk lazy sequences to realize one element at a time? and the conclusion is that if you need to control when items are produced/consumed, you should not use lazy sequences.
For processing you can use transducers, where you control when the next item is processed.
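For example, a take transducer short-circuits the whole pipeline once it has enough items, so earlier steps never see more input than needed; a minimal sketch (the printing map step is only there to make this visible):
(into []
      (comp (map #(do (print \.) %))
            (take 1))
      (range 100))
;; prints a single dot and returns [0]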
For producing the elements, the ideal approach is to reify ISeq. A practical approach is to use lazy-seq with a single cons call in it whose rest is a recursive call. But notice that this relies on an implementation detail of lazy-seq.