I encountered an error while using peek function with a seq object. This was a bit surprising for me because I was expecting that peek would work with any seq object.
(def r0
(re-seq #"\w+" "foo bar"))
(identity r0)
;; ("foo" "bar")
(peek r0)
; (err) Execution error (ClassCastException) at (REPL:1).
(peek '("foo" "bar"))
;; "foo"
(= r0 '("foo" "bar"))
;; true
(type r0)
;; clojure.lang.Cons
(type '("foo"))
;; clojure.lang.PersistentList
The documentation of peek says that peek accepts a collection object. But sequences are collections too, right? But it is not supported by peek. So, is this a documentation error then? Or am I missing something else?
peek works with "persistent types implementing clojure.lang.IPersistentStack (like clojure.lang.Persistent*)", according to tomby42, 8.2 years ago in the doc notes.
Clojure's doc string for function peek in Clojure 1.10.1 (current as of this writing) is: "For a list or queue, same as first, for a vector, same as, but much
more efficient than, last. If the collection is empty, returns nil."
In some discussion of peek's documentation on Clojurians Slack, it sounds like perhaps the word "collection" in peek's doc string is referring to one of the earlier-mentioned collection types of list, queue, or vector, not arbitrary Clojure collections. That is, it should be interpreted the same as if the word "collection" was replaced with the phrase "list, queue, or vector".
Related
I'm learning Clojure and I found something surprised me, as mentioned in title. As doc said, the clojure.set/union function
Return a set that is the union of the input sets
However I tried to input other type sequences and it gives me some result instead of telling me the input type is wrong. For example
user=> (clojure.set/union '(1 2 3) '(2 3 4))
(4 3 2 1 2 3)
Here I expected Clojure to warn me my inputs are not sets, but it returns another list with duplicate inside, which is also opposite to what is stated in document("Return a set").
I'm wondering why this function is designed like this and what benefits it provides than giving a type error. Thanks in advance!
While I would prefer to see more type checking, Clojure often adopts a "garbage in, garbage out" type of philosophy. In this example, it extends to assuming that you provide 2 sets to the union function.
Looking at the source:
(defn union
"Return a set that is the union of the input sets"
[s1 s2]
(if (< (count s1) (count s2))
(reduce conj s2 s1)
(reduce conj s1 s2)))
you can see it simply appends the shorter input onto the longer input using conj. For a sequential list or vector, this adds the 2nd list of items (one at a time) to the front of the 1st list, which your example shows.
I am learning Clojure, and I saw this bit of code online:
(count (filter #{42} coll))
And it does, as stated, count occurrences of the number 42 in coll. Is #{42} a function? The Clojure documentation on filter says that it should be, since the snippet works as advertised. I just have no idea how it works. If someone could clarify this for me, that would be great. My own solution to this same thing would have been:
(count (filter #(= %1 42) coll))
How come my filtering function has parenthesis and the snippet I found online has curly braces around the filtering function (#(...) vs. #{...})?
=> #{42}
#{42}
Defines a set...
=> (type #{42})
clojure.lang.PersistentHashSet
=> (supers (type #{42}))
#{clojure.lang.IHashEq java.lang.Object clojure.lang.IFn ...}
Interestingly the set implements IFn so you can treat it like a function. The behaviour of the function is "if this item exists in the set, return it".
=> (#{2 3} 3)
3
=> (#{2 3} 4)
nil
Other collections such as map and vector stand in as functions in a similar fashion, retrieving by key or index as appropriate.
=> ({:x 23 :y 26} :y)
26
=> ([5 7 9] 1)
7
Sweet, no? :-)
Yes, #{42} is a function,
because it's a set, and sets, amongst other capabilities, are
functions: they implement the clojure.lang.IFn interface.
Applied to any value in the set, they return it; applied to anything
else, they return nil.
So #{42} tests whether its argument is 42 (only nil and false are false, remember).
The Clojure way is to make everything a function that might usefully be one:
Sets work as a test for membership.
Maps work as key lookup.
Vectors work as index lookup.
Keywords work as lookup in the map argument.
This
often saves you a get,
allows you, as in the question, to pass naked data structures to higher order functions
such as filter and map, and
in the case of keywords, allows you to move transparently between maps and records
for holding your data.
I was under the impression that the lazy seqs were always chunked.
=> (take 1 (map #(do (print \.) %) (range)))
(................................0)
As expected 32 dots are printed because the lazy seq returned by range is chunked into 32 element chunks. However, when instead of range I try this with my own function get-rss-feeds, the lazy seq is no longer chunked:
=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")
Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds is not chunked. Indeed:
=> (chunked-seq? (seq (range)))
true
=> (chunked-seq? (seq (get-rss-feeds r)))
false
Here is the source for get-rss-feeds:
(defn get-rss-feeds
"returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
[hr]
(map #(:href (:attrs %))
(filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link])))
So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?
Here's why I need to know.
I have to following code: (get-rss-entry (get-rss-feeds h-res) url)
The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.
The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an http request across the network to fetch a new rss feed. To minimize the number of http requests it's important to examine the sequence one-by-one and stop as soon as there is a match.
Here is the code:
(defn get-rss-entry
[feeds url]
(ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))
entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.
I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.
Am I using lazy seq non-idiomatically? Would loop/recur be a better option?
You are right to be concerned. Your get-rss-entry will indeed call entry-with-url more than strictly necessary if the feeds parameter is a collection that returns chunked seqs. For example if feeds is a vector, map will operate on whole chunks at a time.
This problem is addressed directly in Fogus' Joy of Clojure, with the function seq1 defined in chapter 12:
(defn seq1 [s]
(lazy-seq
(when-let [[x] (seq s)]
(cons x (seq1 (rest s))))))
You could use this right where you know you want the most laziness possible, right before you call entry-with-url:
(defn get-rss-entry
[feeds url]
(ffirst (drop-while empty? (map #(entry-with-url % url) (seq1 feeds)))))
Lazy seqs are not always chunked - it depends on how they are produced.
For example, the lazy seq produced by this function is not chunked:
(defn integers-from [n]
(lazy-seq (cons n (do (print \.) (integers-from (inc n))))))
(take 3 (integers-from 3))
=> (..3 .4 5)
But many other clojure built-in functions do produce chunked seqs for performance reasons (e.g. range)
Depending on the vagueness of Chunking seems unwise as you mention above. Explicitly "un chunking" in cases where you really need it not to be chunked is also wise because then if at some other point your code changes in a way that chunkifies it things wont break. On another note, if you need actions to be sequential, agents are a great tool you could send the download functions to an agent then they will be run one at a time and only once regardless of how you evaluate the function. At some point you may want to pmap your sequence and then even un-chunking will not work though using an atom will continue to work correctly.
I have discussed this recently in Can I un-chunk lazy sequences to realize one element at a time? and the conclusion is that if you need to control when items are produced/consumed, you should not use lazy sequences.
For processing you can use transducers, where you control when the next item is processed.
For producing the elements, the ideal approach is to reify ISeq. A practical approach is to use lazy-seq with a single cons call in it whose rest is a recursive call. But notice that this relies on an implementation detail of lazy-seq.
From a comment on another question, someone is saying that Clojure idiom prefers to return nil rather than an empty list like in Scheme. Why is that?
Like,
(when (seq lat) ...)
instead of
(if (empty? lat)
'() ...)
I can think of a few reasons:
Logical distinction. In Clojure nil means nothing / absence of value. Whereas '() "the empty list is a value - it just happens to be a value that is an empty list. It's quite often conceptually and logically useful to distinguish between the two.
Fit with JVM - the JVM object model supports null references. And quite a lot of Java APIs return null to mean "nothing" or "value not found". So to ensure easy JVM interoperability, it makes sense for Clojure to use nil in a similar way.
Laziness - the logic here is quite complicated, but my understanding is that using nil for "no list" works better with Clojure's lazy sequences. As Clojure is a lazy functional programming language by default, it makes sense for this usage to be standard. See http://clojure.org/lazy for some extra explanation.
"Falsiness" - It's convenient to use nil to mean "nothing" and also to mean "false" when writing conditional code that examines collections - so you can write code like (if (some-map :some-key) ....) to test if a hashmap contains a value for a given key.
Performance - It's more efficient to test for nil than to examine a list to see if it empty... hence adopting this idiom as standard can lead to higher performance idiomatic code
Note that there are still some functions in Clojure that do return an empty list. An example is rest:
(rest [1])
=> ()
This question on rest vs. next goes into some detail of why this is.....
Also note that the union of collection types and nil form a monoid, with concatenation the monoid plus and nil the monoid zero. So nil keeps the empty list semantics under concatenation while also representing a false or "missing" value.
Python is another language where common monoid identities represent false values: 0, empty list, empty tuple.
From The Joy of Clojure
Because empty collections act like true in Boolean contexts, you need an idiom for testing whether there's anything in a collection to process. Thankfully, Clojure provides such a technique:
(seq [1 2 3])
;=> (1 2 3)
(seq [])
;=> nil
In other Lisps, like Common Lisp, the empty list is used to mean nil. This is known as nil punning and is only viable when the empty list is falsey. Returning nil here is clojure's way of reintroducing nil punning.
Since I wrote the comment I will write a answer. (The answer of skuro provides all information but maybe a too much)
First of all I think that more importend things should be in first.
seq is just what everybody uses most of the time but empty? is fine to its just (not (seq lat))
In Clojure '() is true, so normaly you want to return something thats false if the sequence is finished.
if you have only one importend branch in your if an the other returnes false/'() or something like that why should you write down that branch. when has only one branch this is spezially good if you want to have sideeffects. You don't have to use do.
See this example:
(if false
'()
(do (println 1)
(println 2)
(println 3)))
you can write
(when true
(println 1)
(println 2)
(println 3))
Not that diffrent but i think its better to read.
P.S.
Not that there are functions called if-not and when-not they are often better then (if (not true) ...)
I tried the following in Clojure, expecting to have the class of a non-lazy sequence returned:
(.getClass (doall (take 3 (repeatedly rand))))
However, this still returns clojure.lang.LazySeq. My guess is that doall does evaluate the entire sequence, but returns the original sequence as it's still useful for memoization.
So what is the idiomatic means of creating a non-lazy sequence from a lazy one?
doall is all you need. Just because the seq has type LazySeq doesn't mean it has pending evaluation. Lazy seqs cache their results, so all you need to do is walk the lazy seq once (as doall does) in order to force it all, and thus render it non-lazy. seq does not force the entire collection to be evaluated.
This is to some degree a question of taxonomy. a lazy sequence is just one type of sequence as is a list, vector or map. So the answer is of course "it depends on what type of non lazy sequence you want to get:
Take your pick from:
an ex-lazy (fully evaluated) lazy sequence (doall ... )
a list for sequential access (apply list (my-lazy-seq)) OR (into () ...)
a vector for later random access (vec (my-lazy-seq))
a map or a set if you have some special purpose.
You can have whatever type of sequence most suites your needs.
This Rich guy seems to know his clojure and is absolutely right.
Buth I think this code-snippet, using your example, might be a useful complement to this question :
=> (realized? (take 3 (repeatedly rand)))
false
=> (realized? (doall (take 3 (repeatedly rand))))
true
Indeed type has not changed but realization has
I stumbled on this this blog post about doall not being recursive. For that I found the first comment in the post did the trick. Something along the lines of:
(use 'clojure.walk)
(postwalk identity nested-lazy-thing)
I found this useful in a unit test where I wanted to force evaluation of some nested applications of map to force an error condition.
(.getClass (into '() (take 3 (repeatedly rand))))