Inconsistency with Clojure's sequences?

Inconsistency with Clojure's sequences? - clojure

Clojure:
1:13 user=> (first (conj '(1 2 3) 4))
4
1:14 user=> (first (conj [1 2 3] 4))
1
; . . .
1:17 user=> (first (conj (seq [1 2 3]) 4))
4
I understand what is going on, but should this work differently?

Documentation for conj (from clojure.org):
conj[oin]. Returns a new collection with the xs
'added'. (conj nil item) returns (item). The 'addition' may
happen at different 'places' depending on the concrete type.
It's more efficient to "add" elements to the end of a vector, while it's more efficient to do so at the beginning of lists. conj uses whatever is the most efficient for the data structure you give it.
In the examples you give, '(1 2 3) and (seq [1 2 3]) both implement ISeq (see documentation for seq?), while [1 2 3] doesn't.
Clojure's conj ultimately calls the cons method (not to be confused with the cons function - this method is internal clojure code) on the underlying data structure; for vectors (PersistentVector), cons adds elements to the end, while for lists they're added to the front (the cons method for PersistentLists returns a new list with the new element as its head, and the existing list as its tail).

If you look at Clojure Data Structures
you'll see that conj works differently with lists and vectors.
conj puts the added item at the front of the list and at the end of a vector.
I also suggest looking at Clojure API conj
which has some nice examples. ClojureDocs overall has some very nice examples for most Clojure commands.

Related

Is list or vector faster for accessing two most recent entries (in Clojure)?

This is sort of a follow-up for this question.
Say we need to access the two most recently inserted elements in a collection. Would a list or a vector be faster for that operation?
Are these the fastest options?
; for list
(def l '(3 2 1))
(peek l)
=> 3
(peek (pop l))
=> 2
; for vector
(def v [1 2 3])
(peek v)
=> 3
(peek (pop v))
=> 2
Or would it be faster to do something like:
(v (- (count v) 1))
=> 3
(v (- (count v) 2))
=> 2
I would greatly appreciate an explanation as to how this situation is correctly analyzed. This is for my first game and Clojure program, which is why I'm concerned about performance. :) The reason I'm not packing the two values together is to avoid unpacking/mapping them as the whole collection has a use as well.
Thank you!

If I'd be bound to use either list or vector, I would go with vector because subvec is very fast.
(let [a (vec (take 100000 (range)))]
(time (subvec a (- (count a) 2))))
"Elapsed time: 0.045805 msecs"
=> [99998 99999]
But it is not clear what are you trying to do. Maybe there is a better data structure for your case.

Clojure's vector uses the out-of-tree tail optimization, which basically means until it fills up, the latest node in the structure sits in up in the object header instead of being put down in the tree. So you should be seeing constant time lookup for the n % 32 highest-indexed elements in a tree of n elements.
For a list in the strictest sence, getting to the second element would always involve following the chain through element 1 (two steps), but if any of the operations you do make an implicit call to seq, a cache gets introduced and it becomes somewhat less clear.

Lazy partition-by

I have a source of items and want to separately process runs of them having the same value of a key function. In Python this would look like
for key_val, part in itertools.groupby(src, key_fn):
process(key_val, part)
This solution is completely lazy, i.e. if process doesn't try to store contents of entire part, the code would run in O(1) memory.
Clojure solution
(doseq [part (partition-by key-fn src)]
(process part))
is less lazy: it realizes each part completely. The problem is, src might have very long runs of items with the same key-fn value and realizing them might lead to OOM.
I've found this discussion where it's claimed that the following function (slightly modified for naming consistency inside post) is lazy enough
(defn lazy-partition-by [key-fn coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
fv (key-fn fst)
part (lazy-seq (cons fst (take-while #(= fv (key-fn %)) (next s))))]
(cons part (lazy-partition-by key-fn (drop-while #(= fv (key-fn %)) s)))))))
However, I don't understand why it doesn't suffer from OOM: both parts of the cons cell hold a reference to s, so while process consumes part, s is being realized but not garbage collected. It would become eligible for GC only when drop-while traverses part.
So, my questions are:
Am I correct about lazy-partition-by not being lazy enough?
Is there an implementation of partition-by with guaranteed memory requirements, provided I don't hold any references to the previous part by the time I start realizing the next one?
EDIT:
Here's a lazy enough implementation in Haskell:
lazyPartitionBy :: Eq b => (a -> b) -> [a] -> [[a]]
lazyPartitionBy _ [] = []
lazyPartitionBy keyFn xl#(x:_) = let
fv = keyFn x
(part, rest) = span ((== fv) . keyFn) xl
in part : lazyPartitionBy keyFn rest
As can be seen from span implementation, part and rest implicitly share state. I wonder if this method could be translated into Clojure.

The rule of thumb that I use in these scenarios (ie, those in which you want a single input sequence to produce multiple output sequences) is that, of the following three desirable properties, you can generally have only two:
Efficiency (traverse the input sequence only once, thus not hold its head)
Laziness (produce elements only on demand)
No shared mutable state
The version in clojure.core chooses (1,3), but gives up on (2) by producing an entire partition all at once. Python and Haskell both choose (1,2), although it's not immediately obvious: doesn't Haskell have no mutable state at all? Well, its lazy evaluation of everything (not just sequences) means that all expressions are thunks, which start out as blank slates and only get written to when their value is needed; the implementation of span, as you say, shares the same thunk of span p xs' in both of its output sequences, so that whichever one needs it first "sends" it to the result of the other sequence, effecting the action at a distance that's necessary to preserve the other nice properties.
The alternative Clojure implementation you linked to chooses (2,3), as you noted.
The problem is that for partition-by, declining either (1) or (2) means that you're holding the head of some sequence: either the input or one of the outputs. So if you want a solution where it's possible to handle arbitrarily large partitions of an arbitrarily large input, you need to choose (1,2). There are a few ways you could do this in Clojure:
Take the Python approach: return something more like an iterator than a seq - seqs make stronger guarantees about non-mutation, and promise that you can safely traverse them multiple times, etc etc. If instead of a seq of seqs you return an iterator of iterators, then consuming items from any one iterator can freely mutate or invalidate the other(s). This guarantees consumption happens in order and that memory can be freed up.
Take the Haskell approach: manually thunk everything, with lots of calls to delay, and require the client to call force as often as necessary to get data out. This will be a lot uglier in Clojure, and will greatly increase your stack depth (using this on a non-trivial input will probably blow the stack), but it is theoretically possible.
Write something more Clojure-flavored (but still quite unusual) by having a few mutable data objects that are coordinated between the output seqs, each updated as needed when something is requested from any of them.
I'm pretty sure any of these three approaches are possible, but to be honest they're all pretty hard and not at all natural. Clojure's sequence abstraction just doesn't make it easy to produce a data structure that's what you'd like. My advice is that if you need something like this and the partitions may be too large to fit comfortably, you just accept a slightly different format and do a little more bookkeeping yourself: dodge the (1,2,3) dilemma by not producing multiple output sequences at all!
So instead of ((2 4 6 8) (1 3 5) (10 12) (7)) being your output format for something like (partition-by even? [2 4 6 8 1 3 5 10 12 7]), you could accept a slightly uglier format: ([::key true] 2 4 6 8 [::key false] 1 3 5 [::key true] 10 12 [::key false] 7). This is neither hard to produce nor hard to consume, although it is a bit lengthy and tedious to write out.
Here is one reasonable implementation of the producing function:
(defn lazy-partition-by [f coll]
(lazy-seq
(when (seq coll)
(let [x (first coll)
k (f x)]
(list* [::key k] x
((fn part [k xs]
(lazy-seq
(when (seq xs)
(let [x (first xs)
k' (f x)]
(if (= k k')
(cons x (part k (rest xs)))
(list* [::key k'] x (part k' (rest xs))))))))
k (rest coll)))))))
And here's how to consume it, first defining a generic reduce-grouped that hides the details of the grouping format, and then an example function count-partition-sizes to output the key and size of each partition without keeping any sequences in memory:
(defn reduce-grouped [f init groups]
(loop [k nil, acc init, coll groups]
(if (empty? coll)
acc
(if (and (coll? (first coll)) (= ::key (ffirst coll)))
(recur (second (first coll)) acc (rest coll))
(recur k (f k acc (first coll)) (rest coll))))))
(defn count-partition-sizes [f coll]
(reduce-grouped (fn [k acc _]
(if (and (seq acc) (= k (first (peek acc))))
(conj (pop acc) (update-in (peek acc) [1] inc))
(conj acc [k 1])))
[] (lazy-partition-by f coll)))
user> (lazy-partition-by even? [2 4 6 8 1 3 5 10 12 7])
([:user/key true] 2 4 6 8 [:user/key false] 1 3 5 [:user/key true] 10 12 [:user/key false] 7)
user> (count-partition-sizes even? [2 4 6 8 1 3 5 10 12 7])
[[true 4] [false 3] [true 2] [false 1]]
Edit: Looking at it again, I'm not really convinced my reduce-grouped is much more useful than (reduce f init (map g xs)), since it doesn't really give you any clear indication of when the key changes. So if you do need to know when a group changes, you'll want a smarter abstraction, or to use my original lazy-partition-by with nothing "clever" wrapping it.

Although this question evokes very interesting contemplation about language design, the practical problem is you want to process on partitions in constant memory. And the practical problem is resolvable with a little inversion.
Rather than processing over the result of a function that returns a sequence of partitions, pass the processing function into the function that produces the partitions. Then, you can control state in a contained manner.
First we'll provide a way to fuse together the consumption of the sequence with the state of the tail.
(defn fuse [coll wick]
(lazy-seq
(when-let [s (seq coll)]
(swap! wick rest)
(cons (first s) (fuse (rest s) wick)))))
Then a modified version of partition-by
(defn process-partition-by [processfn keyfn coll]
(lazy-seq
(when (seq coll)
(let [tail (atom (cons nil coll))
s (fuse coll tail)
fst (first s)
fv (keyfn fst)
pred #(= fv (keyfn %))
part (take-while pred s)
more (lazy-seq (drop-while pred #tail))]
(cons (processfn part)
(process-partition-by processfn keyfn more))))))
Note: For O(1) memory consumption processfn must be an eager consumer! So while (process-partition-by identity key-fn coll) is the same as (partition-by key-fn coll), because identity does not consume the partition, the memory consumption is not constant.
Tests
(defn heavy-seq []
;adjust payload for your JVM so only a few fit in memory
(let [payload (fn [] (long-array 20000000))]
(map #(vector % (payload)) (iterate inc 0))))
(defn my-process [s] (reduce + (map first s)))
(defn test1 []
(doseq [part (partition-by #(quot (first %) 10) (take 50 (heavy-seq)))]
(my-process part)))
(defn test2 []
(process-partition-by
my-process #(quot (first %) 20) (take 200 (heavy-seq))))
so.core=> (test1)
OutOfMemoryError Java heap space [trace missing]
so.core=> (test2)
(190 590 990 1390 1790 2190 2590 2990 3390 3790)

Am I correct about lazy-partition-by not being lazy enough?
Well, there's a difference between laziness and memory usage. A sequence can be lazy and still require lots of memory - see for instance the implementation of clojure.core/distinct, which uses a set to remember all the previously observed values in the sequence. But yes, your analysis of the memory requirements of lazy-partition-by is correct - the function call to compute the head of the second partition will retain the head of the first partition, which means that realizing the first partition causes it to be retained in-memory. This can be verified with the following code:
user> (doseq [part (lazy-partition-by :a
(repeatedly
(fn [] {:a 1 :b (long-array 10000000)})))]
(dorun part))
; => OutOfMemoryError Java heap space
Since neither doseq nor dorun retains the head, this would simply run forever if lazy-partition-by were O(1) in memory.
Is there an implementation of partition-by with guaranteed memory requirements, provided I don't hold any references to the previous part by the time I start realizing the next one?
It would be very difficult, if not impossible, to write such an implementation in a purely functional manner that would work for the general case. Consider that a general lazy-partition-by implementation cannot make any assumptions about when (or if) a partition is realized. The only guaranteed correct way of finding the start of the second partition, short of introducing some nasty bit of statefulness to keep track of how much of the first partition has been realized, is to remember where the first partition began and scan forward when requested.
For the special case where you're processing records one at a time for side effects and want them grouped by key (as is implied by your use of doseq above), you might consider something along the lines of a loop/recur which maintains a state and re-sets it when the key changes.

What's the one-level sequence flattening function in Clojure?

What's the one-level sequence flattening function in Clojure? I am using apply concat for now, but I wonder if there is a built-in function for that, either in standard library or clojure-contrib.

My general first choice is apply concat. Also, don't overlook (for [subcoll coll, item subcoll] item) -- depending on the broader context, this may result in clearer code.

There's no standard function. apply concat is a good solution in many cases. Or you can equivalently use mapcat seq.
The problem with apply concat is that it fails when there is anything other than a collection/sequential is at the first level:
(apply concat [1 [2 3] [4 [5]]])
=> IllegalArgumentException Don't know how to create ISeq from: java.lang.Long...
Hence you may want to do something like:
(defn flatten-one-level [coll]
(mapcat #(if (sequential? %) % [%]) coll))
(flatten-one-level [1 [2 3] [4 [5]]])
=> (1 2 3 4 [5])
As a more general point, the lack of a built-in function should not usually stop you from defining your own :-)

i use apply concat too - i don't think there's anything else in the core.
flatten is multiple levels (and is defined via a tree-walk, not in terms of repeated single level expansion)
see also Clojure: Semi-Flattening a nested Sequence which has a flatten-1 from clojure mvc (and which is much more complex than i expected).
update to clarify laziness:
user=> (take 3 (apply concat (for [i (range 1e6)] (do (print i) [i]))))
012345678910111213141516171819202122232425262728293031(0 1 2)
you can see that it evaluates the argument 32 times - this is chunking for efficiency, and is otherwise lazy (it doesn't evaluate the whole list). for a discussion of chunking see comments at end of http://isti.bitbucket.org/2012/04/01/pipes-clojure-choco-1.html

How to break out from nested doseqs

I have a question regarding nested doseq loops. In the start function, once I find an answer I set the atom to true, so that the outer loop validation with :while fails. However it seems that it doesn't break it, and the loops keep on going. What's wrong with it?
I am also quite confused with the usage of atoms, refs, agents (Why do they have different names for the update functions when then the mechanism is almost the same?) etc.
Is it okay to use an atom in this situation as a flag? Obviously I need a a variable like object to store a state.
(def pentagonal-list (map (fn [a] (/ (* a (dec (* 3 a))) 2)) (iterate inc 1)))
(def found (atom false))
(defn pentagonal? [a]
(let [y (/ (inc (Math/sqrt (inc (* 24 a)))) 6)
x (mod (* 10 y) 10)]
(if (zero? x)
true
false)))
(defn both-pent? [a b]
(let [sum (+ b a)
diff (- a b)]
(if (and (pentagonal? sum) (pentagonal? diff))
true
false)))
(defn start []
(doseq [x pentagonal-list :while (false? #found)]
(doseq [y pentagonal-list :while (<= y x)]
(if (both-pent? x y)
(do
(reset! found true)
(println (- x y)))))))

Even once the atom is set to true, your version can't stop running until the inner doseq finishes (until y > x). It will terminate the outer loop once the inner loop finishes though. It does terminate eventually when I run it. Not sure what you're seeing.
You don't need two doseqs to do this. One doseq can handle two seqs at once.
user> (doseq [x (range 0 2) y (range 3 6)] (prn [x y]))
[0 3]
[0 4]
[0 5]
[1 3]
[1 4]
[1 5]
(The same is true of for.) There is no mechanism for "breaking out" of nested doseqs that I know of, except throw/catch, but that's rather un-idiomatic. You don't need atoms or doseq for this at all though.
(def answers (filter (fn [[x y]] (both-pent? x y))
(for [x pentagonal-list
y pentagonal-list :while (<= y x)]
[x y])))
Your code is very imperative in style. "Loop over these lists, then test the values, then print something, then stop looping." Using atoms for control like this is not very idiomatic in Clojure.
A more functional way is to take a seq (pentagonal-list) and wrap it in functions that turn it into other seqs until you get a seq that gives you what you want. First I use for to turn two copies of this seqs into one seq of pairs where y <= x. Then I use filter to turn that seq into one that filters out the values we don't care about.
filter and for are lazy, so this will stop running once it finds the first valid value, if one is all you want. This returns the two numbers you want, and then you can subtract them.
(apply - (first answers))
Or you can further wrap the function in another map to calculate the differences for you.
(def answers2 (map #(apply - %) answers))
(first answers2)
There are some advantages to programming functionally in this way. The seq is cached (if you hold onto the head like I do here), so once a value is calculated, it remembers it and you can access it instantly from then on. Your version can't be run again without resetting the atom, and then would have to re-calculate everything. With my version you can (take 5 answers) to get the first 5 results, or map over the result to do other things if you want. You can doseq over it and print the values. Etc. etc.
I'm sure there are other (maybe better) ways to do this still without using an atom. You should usually avoid mutating references unless it's 100% necessary in Clojure.
The function names for altering atoms/agents/refs are different probably because the mechanics are not the same. Refs are synchronous and coordinated via transactions. Agents are asynchronous. Atoms are synchronous and uncoordinated. They all kind of "change a reference", and probably some kind of super-function or macro could wrap them all in one name, but that would obscure the fact that they're doing drastically different things under the hood. Explaining the differences fully is probably beyond the scope of an SO post to explain, but http://clojure.org fully explains all of the nuances of the differences.

Common programming mistakes for Clojure developers to avoid [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
What are some common mistakes made by Clojure developers, and how can we avoid them?
For example; newcomers to Clojure think that the contains? function works the same as java.util.Collection#contains. However, contains? will only work similarly when used with indexed collections like maps and sets and you're looking for a given key:
(contains? {:a 1 :b 2} :b)
;=> true
(contains? {:a 1 :b 2} 2)
;=> false
(contains? #{:a 1 :b 2} :b)
;=> true
When used with numerically indexed collections (vectors, arrays) contains? only checks that the given element is within the valid range of indexes (zero-based):
(contains? [1 2 3 4] 4)
;=> false
(contains? [1 2 3 4] 0)
;=> true
If given a list, contains? will never return true.

Literal Octals
At one point I was reading in a matrix which used leading zeros to maintain proper rows and columns. Mathematically this is correct, since leading zero obviously don't alter the underlying value. Attempts to define a var with this matrix, however, would fail mysteriously with:
java.lang.NumberFormatException: Invalid number: 08
which totally baffled me. The reason is that Clojure treats literal integer values with leading zeros as octals, and there is no number 08 in octal.
I should also mention that Clojure supports traditional Java hexadecimal values via the 0x prefix. You can also use any base between 2 and 36 by using the "base+r+value" notation, such as 2r101010 or 36r16 which are 42 base ten.
Trying to return literals in an anonymous function literal
This works:
user> (defn foo [key val]
{key val})
#'user/foo
user> (foo :a 1)
{:a 1}
so I believed this would also work:
(#({%1 %2}) :a 1)
but it fails with:
java.lang.IllegalArgumentException: Wrong number of args passed to: PersistentArrayMap
because the #() reader macro gets expanded to
(fn [%1 %2] ({%1 %2}))
with the map literal wrapped in parenthesis. Since it's the first element, it's treated as a function (which a literal map actually is), but no required arguments (such as a key) are provided. In summary, the anonymous function literal does not expand to
(fn [%1 %2] {%1 %2}) ; notice the lack of parenthesis
and so you can't have any literal value ([], :a, 4, %) as the body of the anonymous function.
Two solutions have been given in the comments. Brian Carper suggests using sequence implementation constructors (array-map, hash-set, vector) like so:
(#(array-map %1 %2) :a 1)
while Dan shows that you can use the identity function to unwrap the outer parenthesis:
(#(identity {%1 %2}) :a 1)
Brian's suggestion actually brings me to my next mistake...
Thinking that hash-map or array-map determine the unchanging concrete map implementation
Consider the following:
user> (class (hash-map))
clojure.lang.PersistentArrayMap
user> (class (hash-map :a 1))
clojure.lang.PersistentHashMap
user> (class (assoc (apply array-map (range 2000)) :a :1))
clojure.lang.PersistentHashMap
While you generally won't have to worry about the concrete implementation of a Clojure map, you should know that functions which grow a map - like assoc or conj - can take a PersistentArrayMap and return a PersistentHashMap, which performs faster for larger maps.
Using a function as the recursion point rather than a loop to provide initial bindings
When I started out, I wrote a lot of functions like this:
; Project Euler #3
(defn p3
([] (p3 775147 600851475143 3))
([i n times]
(if (and (divides? i n) (fast-prime? i times)) i
(recur (dec i) n times))))
When in fact loop would have been more concise and idiomatic for this particular function:
; Elapsed time: 387 msecs
(defn p3 [] {:post [(= % 6857)]}
(loop [i 775147 n 600851475143 times 3]
(if (and (divides? i n) (fast-prime? i times)) i
(recur (dec i) n times))))
Notice that I replaced the empty argument, "default constructor" function body (p3 775147 600851475143 3) with a loop + initial binding. The recur now rebinds the loop bindings (instead of the fn parameters) and jumps back to the recursion point (loop, instead of fn).
Referencing "phantom" vars
I'm speaking about the type of var you might define using the REPL - during your exploratory programming - then unknowingly reference in your source. Everything works fine until you reload the namespace (perhaps by closing your editor) and later discover a bunch of unbound symbols referenced throughout your code. This also happens frequently when you're refactoring, moving a var from one namespace to another.
Treating the for list comprehension like an imperative for loop
Essentially you're creating a lazy list based on existing lists rather than simply performing a controlled loop. Clojure's doseq is actually more analogous to imperative foreach looping constructs.
One example of how they're different is the ability to filter which elements they iterate over using arbitrary predicates:
user> (for [n '(1 2 3 4) :when (even? n)] n)
(2 4)
user> (for [n '(4 3 2 1) :while (even? n)] n)
(4)
Another way they're different is that they can operate on infinite lazy sequences:
user> (take 5 (for [x (iterate inc 0) :when (> (* x x) 3)] (* 2 x)))
(4 6 8 10 12)
They also can handle more than one binding expression, iterating over the rightmost expression first and working its way left:
user> (for [x '(1 2 3) y '(\a \b \c)] (str x y))
("1a" "1b" "1c" "2a" "2b" "2c" "3a" "3b" "3c")
There's also no break or continue to exit prematurely.
Overuse of structs
I come from an OOPish background so when I started Clojure my brain was still thinking in terms of objects. I found myself modeling everything as a struct because its grouping of "members", however loose, made me feel comfortable. In reality, structs should mostly be considered an optimization; Clojure will share the keys and some lookup information to conserve memory. You can further optimize them by defining accessors to speed up the key lookup process.
Overall you don't gain anything from using a struct over a map except for performance, so the added complexity might not be worth it.
Using unsugared BigDecimal constructors
I needed a lot of BigDecimals and was writing ugly code like this:
(let [foo (BigDecimal. "1") bar (BigDecimal. "42.42") baz (BigDecimal. "24.24")]
when in fact Clojure supports BigDecimal literals by appending M to the number:
(= (BigDecimal. "42.42") 42.42M) ; true
Using the sugared version cuts out a lot of the bloat. In the comments, twils mentioned that you can also use the bigdec and bigint functions to be more explicit, yet remain concise.
Using the Java package naming conversions for namespaces
This isn't actually a mistake per se, but rather something that goes against the idiomatic structure and naming of a typical Clojure project. My first substantial Clojure project had namespace declarations - and corresponding folder structures - like this:
(ns com.14clouds.myapp.repository)
which bloated up my fully-qualified function references:
(com.14clouds.myapp.repository/load-by-name "foo")
To complicate things even more, I used a standard Maven directory structure:
|-- src/
| |-- main/
| | |-- java/
| | |-- clojure/
| | |-- resources/
| |-- test/
...
which is more complex than the "standard" Clojure structure of:
|-- src/
|-- test/
|-- resources/
which is the default of Leiningen projects and Clojure itself.
Maps utilize Java's equals() rather than Clojure's = for key matching
Originally reported by chouser on IRC, this usage of Java's equals() leads to some unintuitive results:
user> (= (int 1) (long 1))
true
user> ({(int 1) :found} (int 1) :not-found)
:found
user> ({(int 1) :found} (long 1) :not-found)
:not-found
Since both Integer and Long instances of 1 are printed the same by default, it can be difficult to detect why your map isn't returning any values. This is especially true when you pass your key through a function which, perhaps unbeknownst to you, returns a long.
It should be noted that using Java's equals() instead of Clojure's = is essential for maps to conform to the java.util.Map interface.
I'm using Programming Clojure by Stuart Halloway, Practical Clojure by Luke VanderHart, and the help of countless Clojure hackers on IRC and the mailing list to help along my answers.

Forgetting to force evaluation of lazy seqs
Lazy seqs aren't evaluated unless you ask them to be evaluated. You might expect this to print something, but it doesn't.
user=> (defn foo [] (map println [:foo :bar]) nil)
#'user/foo
user=> (foo)
nil
The map is never evaluated, it's silently discarded, because it's lazy. You have to use one of doseq, dorun, doall etc. to force evaluation of lazy sequences for side-effects.
user=> (defn foo [] (doseq [x [:foo :bar]] (println x)) nil)
#'user/foo
user=> (foo)
:foo
:bar
nil
user=> (defn foo [] (dorun (map println [:foo :bar])) nil)
#'user/foo
user=> (foo)
:foo
:bar
nil
Using a bare map at the REPL kind of looks like it works, but it only works because the REPL forces evaluation of lazy seqs itself. This can make the bug even harder to notice, because your code works at the REPL and doesn't work from a source file or inside a function.
user=> (map println [:foo :bar])
(:foo
:bar
nil nil)

I'm a Clojure noob. More advanced users may have more interesting problems.
trying to print infinite lazy sequences.
I knew what I was doing with my lazy sequences, but for debugging purposes I inserted some print/prn/pr calls, temporarily having forgotten what it is I was printing. Funny, why's my PC all hung up?
trying to program Clojure imperatively.
There is some temptation to create a whole lot of refs or atoms and write code that constantly mucks with their state. This can be done, but it's not a good fit. It may also have poor performance, and rarely benefit from multiple cores.
trying to program Clojure 100% functionally.
A flip side to this: Some algorithms really do want a bit of mutable state. Religiously avoiding mutable state at all costs may result in slow or awkward algorithms. It takes judgement and a bit of experience to make the decision.
trying to do too much in Java.
Because it's so easy to reach out to Java, it's sometimes tempting to use Clojure as a scripting language wrapper around Java. Certainly you'll need to do exactly this when using Java library functionality, but there's little sense in (e.g.) maintaining data structures in Java, or using Java data types such as collections for which there are good equivalents in Clojure.

Lots of things already mentioned. I'll just add one more.
Clojure if treats Java Boolean objects always as true even if it's value is false. So if you have a java land function that returns a java Boolean value, make sure you do not check it directly
(if java-bool "Yes" "No")
but rather
(if (boolean java-bool) "Yes" "No").
I got burned by this with clojure.contrib.sql library that returns database boolean fields as java Boolean objects.

Keeping your head in loops.
You risk running out of memory if you loop over the elements of a potentially very large, or infinite, lazy sequence while keeping a reference to the first element.
Forgetting there's no TCO.
Regular tail-calls consume stack space, and they will overflow if you're not careful. Clojure has 'recur and 'trampoline to handle many of the cases where optimized tail-calls would be used in other languages, but these techniques have to be intentionally applied.
Not-quite-lazy sequences.
You may build a lazy sequence with 'lazy-seq or 'lazy-cons (or by building upon higher level lazy APIs), but if you wrap it in 'vec or pass it through some other function that realizes the sequence, then it will no longer be lazy. Both the stack and the heap can be overflown by this.
Putting mutable things in refs.
You can technically do it, but only the object reference in the ref itself is governed by the STM - not the referred object and its fields (unless they are immutable and point to other refs). So whenever possible, prefer to only immutable objects in refs. Same thing goes for atoms.

using loop ... recur to process sequences when map will do.
(defn work [data]
(do-stuff (first data))
(recur (rest data)))
vs.
(map do-stuff data)
The map function (in the latest branch) uses chunked sequences and many other optimizations. Also, because this function is frequently run, the Hotspot JIT usually has it optimized and ready to go with out any "warm up time".

Collection types have different behaviors for some operations:
user=> (conj '(1 2 3) 4)
(4 1 2 3) ;; new element at the front
user=> (conj [1 2 3] 4)
[1 2 3 4] ;; new element at the back
user=> (into '(3 4) (list 5 6 7))
(7 6 5 3 4)
user=> (into [3 4] (list 5 6 7))
[3 4 5 6 7]
Working with strings can be confusing (I still don't quite get them). Specifically, strings are not the same as sequences of characters, even though sequence functions work on them:
user=> (filter #(> (int %) 96) "abcdABCDefghEFGH")
(\a \b \c \d \e \f \g \h)
To get a string back out, you'd need to do:
user=> (apply str (filter #(> (int %) 96) "abcdABCDefghEFGH"))
"abcdefgh"

too many parantheses, especially with void java method call inside which results in NPE:
public void foo() {}
((.foo))
results in NPE from outer parantheses because inner parantheses evaluate to nil.
public int bar() { return 5; }
((.bar))
results in the easier to debug:
java.lang.Integer cannot be cast to clojure.lang.IFn
[Thrown class java.lang.ClassCastException]

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Inconsistency with Clojure's sequences? - clojure

Clojure: 1:13 user=> (first (conj '(1 2 3) 4)) 4 1:14 user=> (first (conj [1 2 3] 4)) 1 ; . . . 1:17 user=> (first (conj (seq [1 2 3]) 4)) 4 I understand what is going on, but should this work differently?

Related

Is list or vector faster for accessing two most recent entries (in Clojure)?

Lazy partition-by

What's the one-level sequence flattening function in Clojure?

How to break out from nested doseqs

Common programming mistakes for Clojure developers to avoid [closed]

Categories

Resources