Thanks a lot for all the beautiful answers! Cannot mark just one as correct
Note: Already a wiki
I am new to functional programming and while I can read simple functions in Functional programming, for e.g. computing the factorial of a number, I am finding it hard to read big functions.
Part of the reason is I think because of my inability to figure out the smaller blocks of code within a function definition and also partly because it is becoming difficult for me to match ( ) in code.
It would be great if someone could walk me through reading some code and give me some tips on how to quickly decipher some code.
Note: I can understand this code if I stare at it for 10 minutes, but I doubt if this same code had been written in Java, it would take me 10 minutes. So, I think to feel comfortable in Lisp style code, I must do it faster
Note: I know this is a subjective question. And I am not seeking any provably correct answer here. Just comments on how you go about reading this code, would be welcome and highly helpful
(defn concat
([] (lazy-seq nil))
([x] (lazy-seq x))
([x y]
(lazy-seq
(let [s (seq x)]
(if s
(if (chunked-seq? s)
(chunk-cons (chunk-first s) (concat (chunk-rest s) y))
(cons (first s) (concat (rest s) y)))
y))))
([x y & zs]
(let [cat (fn cat [xys zs]
(lazy-seq
(let [xys (seq xys)]
(if xys
(if (chunked-seq? xys)
(chunk-cons (chunk-first xys)
(cat (chunk-rest xys) zs))
(cons (first xys) (cat (rest xys) zs)))
(when zs
(cat (first zs) (next zs)))))))]
(cat (concat x y) zs))))
I think concat is a bad example to try to understand. It's a core function and it's more low-level than code you would normally write yourself, because it strives to be efficient.
Another thing to keep in mind is that Clojure code is extremely dense compared to Java code. A little Clojure code does a lot of work. The same code in Java would not be 23 lines. It would likely be multiple classes and interfaces, a great many methods, lots of local temporary throw-away variables and awkward looping constructs and generally all kinds of boilerplate.
Some general tips though...
Try to ignore the parens most of the time. Use the indentation instead (as Nathan Sanders suggests). e.g.
(if s
(if (chunked-seq? s)
(chunk-cons (chunk-first s) (concat (chunk-rest s) y))
(cons (first s) (concat (rest s) y)))
y))))
When I look at that my brain sees:
if foo
then if bar
then baz
else quux
else blarf
If you put your cursor on a paren and your text editor doesn't syntax-highlight the matching one, I suggest you find a new editor.
Sometimes it helps to read code inside-out. Clojure code tends to be deeply nested.
(let [xs (range 10)]
(reverse (map #(/ % 17) (filter (complement even?) xs))))
Bad: "So we start with numbers from 1 to 10. Then we're reversing the order of the mapping of the filtering of the complement of the wait I forgot what I'm talking about."
Good: "OK, so we're taking some xs. (complement even?) means the opposite of even, so "odd". So we're filtering some collection so only the odd numbers are left. Then we're dividing them all by 17. Then we're reversing the order of them. And the xs in question are 1 to 10, gotcha."
Sometimes it helps to do this explicitly. Take the intermediate results, throw them in a let and give them a name so you understand. The REPL is made for playing around like this. Execute the intermediate results and see what each step gives you.
(let [xs (range 10)
odd? (complement even?)
odd-xs (filter odd? xs)
odd-xs-over-17 (map #(/ % 17) odd-xs)
reversed-xs (reverse odd-xs-over-17)]
reversed-xs)
Soon you will be able to do this sort of thing mentally without effort.
Make liberal use of (doc). The usefulness of having documentation available right at the REPL can't be overstated. If you use clojure.contrib.repl-utils and have your .clj files on the classpath, you can do (source some-function) and see all the source code for it. You can do (show some-java-class) and see a description of all the methods in it. And so on.
Being able to read something quickly only comes with experience. Lisp is no harder to read than any other language. It just so happens that most languages look like C, and most programmers spend most of their time reading that, so it seems like C syntax is easier to read. Practice practice practice.
Lisp code, in particular, is even harder to read than other functional languages because of the regular syntax. Wojciech gives a good answer for improving your semantic understanding. Here is some help on syntax.
First, when reading code, don't worry about parentheses. Worry about indentation. The general rule is that things at the same indent level are related. So:
(if (chunked-seq? s)
(chunk-cons (chunk-first s) (concat (chunk-rest s) y))
(cons (first s) (concat (rest s) y)))
Second, if you can't fit everything on one line, indent the next line a small amount. This is almost always two spaces:
(defn concat
([] (lazy-seq nil)) ; these two fit
([x] (lazy-seq x)) ; so no wrapping
([x y] ; but here
(lazy-seq ; (lazy-seq indents two spaces
(let [s (seq x)] ; as does (let [s (seq x)]
Third, if multiple arguments to a function can't fit on a single line, line up the second, third, etc arguments underneath the first's starting parenthesis. Many macros have a similar rule with variations to allow the important parts to appear first.
; fits on one line
(chunk-cons (chunk-first s) (concat (chunk-rest s) y))
; has to wrap: line up (cat ...) underneath first ( of (chunk-first xys)
(chunk-cons (chunk-first xys)
(cat (chunk-rest xys) zs))
; if you write a C-for macro, put the first three arguments on one line
; then the rest indented two spaces
(c-for (i 0) (< i 100) (add1 i)
(side-effects!)
(side-effects!)
(get-your (side-effects!) here))
These rules help you find blocks within the code: if you see
(chunk-cons (chunk-first s)
Don't count parentheses! Check the next line:
(chunk-cons (chunk-first s)
(concat (chunk-rest s) y))
You know that the first line is not a complete expression because the next line is indented beneath it.
If you see the defn concat from above, you know you have three blocks, because there are three things on the same level. But everything below the third line is indented beneath it, so the rest belongs to that third block.
Here is a style guide for Scheme. I don't know Clojure, but most of the rules should be the same since none of the other Lisps vary much.
First remember that functional program consists of expressions, not statements. For example, form (if condition expr1 expr2) takes its 1st arg as a condition to test for the boolean falue, evaluates it, and if it eval'ed to true then it evaluates and returns expr1, otherwise evaluates and returns expr2. When every form returns an expression some of usual syntax constructs like THEN or ELSE keywords may just disappear. Note that here if itself evaluates to an expression as well.
Now about the evaluation: In Clojure (and other Lisps) most forms you encounter are function calls of the form (f a1 a2 ...), where all arguments to f are evaluated before actual function call; but forms can be also macros or special forms which don't evaluate some (or all) of its arguments. If in doubt, consult the documentation (doc f) or just check in REPL:
user=> apply
#<core$apply__3243 clojure.core$apply__3243#19bb5c09> a function
user=> doseq
java.lang.Exception: Can't take value of a macro: #'clojure.core/doseq a macro.
These two rules:
we have expressions, not statements
evaluation of a subform may occur or not, depending of how outer form behaves
should ease your groking of Lisp programs, esp. if they have nice indentation like the example you gave.
Hope this helps.
Related
I find myself writing a lot of clojure in this manner:
(defn my-fun [input]
(let [result1 (some-complicated-procedure input)
result2 (some-other-procedure result1)]
(do-something-with-results result1 result2)))
This let statement seems very... imperative. Which I don't like. In principal, I could be writing the same function like this:
(defn my-fun [input]
(do-something-with-results (some-complicated-procedure input)
(some-other-procedure (some-complicated-procedure input)))))
The problem with this is that it involves recomputation of some-complicated-procedure, which may be arbitrarily expensive. Also you can imagine that some-complicated-procedure is actually a series of nested function calls, and then I either have to write a whole new function, or risk that changes in the first invocation don't get applied to the second:
E.g. this works, but I have to have an extra shallow, top-level function that makes it hard to do a mental stack trace:
(defn some-complicated-procedure [input] (lots (of (nested (operations input)))))
(defn my-fun [input]
(do-something-with-results (some-complicated-procedure input)
(some-other-procedure (some-complicated-procedure input)))))
E.g. this is dangerous because refactoring is hard:
(defn my-fun [input]
(do-something-with-results (lots (of (nested (operations (mistake input))))) ; oops made a change here that wasn't applied to the other nested calls
(some-other-procedure (lots (of (nested (operations input))))))))
Given these tradeoffs, I feel like I don't have any alternatives to writing long, imperative let statements, but when I do, I cant shake the feeling that I'm not writing idiomatic clojure. Is there a way I can address the computation and code cleanliness problems raised above and write idiomatic clojure? Are imperitive-ish let statements idiomatic?
The kind of let statements you describe might remind you of imperative code, but there is nothing imperative about them. Haskell has similar statements for binding names to values within bodies, too.
If your situation really needs a bigger hammer, there are some bigger hammers that you can either use or take for inspiration. The following two libraries offer some kind of binding form (akin to let) with a localized memoization of results, so as to perform only the necessary steps and reuse their results if needed again: Plumatic Plumbing, specifically the Graph part; and Zach Tellman's Manifold, whose let-flow form furthermore orchestrates asynchronous steps to wait for the necessary inputs to become available, and to run in parallel when possible. Even if you decide to maintain your present course, their docs make good reading, and the code of Manifold itself is educational.
I recently had this same question when I looked at this code I wrote
(let [user-symbols (map :symbol states)
duplicates (for [[id freq] (frequencies user-symbols) :when (> freq 1)] id)]
(do-something-with duplicates))
You'll note that map and for are lazy and will not be executed until do-something-with is executed. It's also possible that not all (or even not any) of the states will be mapped or the frequencies calculated. It depends on what do-something-with actually requests of the sequence returned by for. This is very much functional and idiomatic functional programming.
i guess the simplest approach to keep it functional would be to have a pass-through state to accumulate the intermediate results. something like this:
(defn with-state [res-key f state]
(assoc state res-key (f state)))
user> (with-state :res (comp inc :init) {:init 10})
;;=> {:init 10, :res 11}
so you can move on to something like this:
(->> {:init 100}
(with-state :inc'd (comp inc :init))
(with-state :inc-doubled (comp (partial * 2) :inc'd))
(with-state :inc-doubled-squared (comp #(* % %) :inc-doubled))
(with-state :summarized (fn [st] (apply + (vals st)))))
;;=> {:init 100,
;; :inc'd 101,
;; :inc-doubled 202,
;; :inc-doubled-squared 40804,
;; :summarized 41207}
The let form is a perfectly functional construct and can be seen as syntactic sugar for calls to anonymous functions. We can easily write a recursive macro to implement our own version of let:
(defmacro my-let [bindings body]
(if (empty? bindings)
body
`((fn [~(first bindings)]
(my-let ~(rest (rest bindings)) ~body))
~(second bindings))))
Here is an example of calling it:
(my-let [a 3
b (+ a 1)]
(* a b))
;; => 12
And here is a macroexpand-all called on the above expression, that reveal how we implement my-let using anonymous functions:
(clojure.walk/macroexpand-all '(my-let [a 3
b (+ a 1)]
(* a b)))
;; => ((fn* ([a] ((fn* ([b] (* a b))) (+ a 1)))) 3)
Note that the expansion doesn't rely on let and that the bound symbols become parameter names in the anonymous functions.
As others write, let is actually perfectly functional, but at times it can feel imperative. It's better to become fully comfortable with it.
You might, however, want to kick the tires of my little library tl;dr that lets you write code like for example
(compute
(+ a b c)
where
a (f b)
c (+ 100 b))
Should cons be inside (lazy-seq ...)
(def lseq-in (lazy-seq (cons 1 (more-one))))
or out?
(def lseq-out (cons 1 (lazy-seq (more-one))))
I noticed
(realized? lseq-in)
;;; ⇒ false
(realized? lseq-out)
;;; ⇒ <err>
;;; ClassCastException clojure.lang.Cons cannot be cast to clojure.lang.IPending clojure.core/realized? (core.clj:6773)
All the examples on the clojuredocs.org use "out".
What are the tradeoffs involved?
You definitely want (lazy-seq (cons ...)) as your default, deviating only if you have a clear reason for it. clojuredocs.org is fine, but the examples are all community-provided and I would not call them "the docs". Of course, a consequence of how it's built is that the examples tend to get written by people who just learned how to use the construct in question and want to help out, so many of them are poor. I would refer instead to the code in clojure.core, or other known-good code.
Why should this be the default? Consider these two implementations of map:
(defn map1 [f coll]
(when-let [s (seq coll)]
(cons (f (first s))
(lazy-seq (map1 f (rest coll))))))
(defn map2 [f coll]
(lazy-seq
(when-let [s (seq coll)]
(cons (f (first s))
(map2 f (rest coll))))))
If you call (map1 prn xs), then an element of xs will be realized and printed immediately, even if you never intentionally realize an element of the resulting mapped sequence. map2, on the other hand, immediately returns a lazy sequence, delaying all its work until an element is requested.
With cons inside lazy-seq, the evaluation of the expression for the first element of your seq gets deferred; with cons on the outside, it's done right away and only the construction of the "rest" part of the seq is deferred. (So (rest lseq-out) will be a lazy seq.)
Thus, if computing the first element is expensive and it might not be needed at all, putting cons inside lazy-seq makes more sense. If the initial element is supplied to the lazy seq producer as an argument, it may make more sense to use cons on the outside (this is the case with clojure.core/iterate). Otherwise it doesn't make that much of a difference. (The overhead of creating a lazy seq object at the start is negligible.)
Clojure itself uses both approaches (although in the majority of cases lazy-seq wraps the whole seq-producing expression, which may not necessarily start with cons).
I find I very rarely use let in Clojure. For some reason I took a dislike to it when I started learning and have avoided using it ever since. It feels like the flow has stopped when let comes along. I was wondering, do you think we could do without it altogether ?
You can replace any occurrence of (let [a1 b1 a2 b2...] ...) by ((fn [a1 a2 ...] ...) b1 b2 ...) so yes, we could. I am using let a lot though, and I'd rather not do without it.
Let offers a few benefits. First, it allows value binding in a functional context. Second, it confers readability benefits. So while technically, one could do away with it (in the sense that you could still program without it), the language would be impoverished without a valuable tool.
One of the nice things about let is that it helps formalize a common (mathematical) way of specifying a computation, in which you introduce convenient bindings and then a simplified formula as a result. It's clear the bindings only apply to that "scope" and it's tie in with a more mathematical formulation is useful, especially for more functional programmers.
It's not a coincidence that let blocks occur in other languages like Haskell.
Let is indispensable to me in preventing multiple execution in macros:
(defmacro print-and-run [s-exp]
`(do (println "running " (quote ~s-exp) "produced " ~s-exp)
s-exp))
would run s-exp twice, which is not what we want:
(defmacro print-and-run [s-exp]
`(let [result# s-exp]
(do (println "running " (quote ~s-exp) "produced " result#)
result#))
fixes this by binding the result of the expression to a name and referring to that result twice.
because the macro is returning an expression that will become part of another expression (macros are function that produce s-expressions) they need to produce local bindings to prevent multiple execution and avoid symbol capture.
I think I understand your question. Correct me if it's wrong. Some times "let" is used for imperative programming style. For example,
... (let [x (...)
y (...x...)
z (...x...y...)
....x...y...z...] ...
This pattern comes from imperative languages:
... { x = ...;
y = ...x...;
...x...y...;} ...
You avoid this style and that's why you also avoid "let", don't you?
In some problems imperative style reduces amount of code. Furthermore, some times It's more efficient to write in java or c.
Also in some cases "let" just holds values of subexpressions regardless of evaluation order. For example,
(... (let [a (...)
b (...)...]
(...a...b...a...b...) ;; still fp style
There are at least two important use cases for let-bindings:
First, using let properly can make your code clearer and shorter. If you have an expression that you use more than once, binding it in a let is very nice. Here's a portion of the standard function map that uses let:
...
(let [s1 (seq c1) s2 (seq c2)]
(when (and s1 s2)
(cons (f (first s1) (first s2))
(map f (rest s1) (rest s2)))))))
...
Even if you use an expression only once, it can still be helpful (to future readers of the code) to give it a semantically meaningful name.
Second, as Arthur mentioned, if you want to use the value of an expression more than once, but only want it evaluated once, you can't simply type out the entire expression twice: you need some kind of binding. This would be merely wasteful if you have a pure expression:
user=> (* (+ 3 2) (+ 3 2))
25
but actually changes the meaning of the program if the expression has side-effects:
user=> (* (+ 3 (do (println "hi") 2))
(+ 3 (do (println "hi") 2)))
hi
hi
25
user=> (let [x (+ 3 (do (println "hi") 2))]
(* x x))
hi
25
Stumbled upon this recently so ran some timings:
(testing "Repeat vs Let vs Fn"
(let [start (System/currentTimeMillis)]
(dotimes [x 1000000]
(* (+ 3 2) (+ 3 2)))
(prn (- (System/currentTimeMillis) start)))
(let [start (System/currentTimeMillis)
n (+ 3 2)]
(dotimes [x 1000000]
(* n n))
(prn (- (System/currentTimeMillis) start)))
(let [start (System/currentTimeMillis)]
(dotimes [x 1000000]
((fn [x] (* x x)) (+ 3 2)))
(prn (- (System/currentTimeMillis) start)))))
Output
Testing Repeat vs Let vs Fn
116
18
60
'let' wins over 'pure' functional.
All examples are taken from the SICP Book: http://sicpinclojure.com/?q=sicp/1-3-3-procedures-general-methods
This was motivated from the MIT video series on LISP - http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-001-structure-and-interpretation-of-computer-programs-spring-2005/video-lectures/2a-higher-order-procedures/
In scheme, you can put 'define' inside another 'define':
(define (close-enough? v1 v2)
(define tolerance 0.00001)
(< (abs (- v1 v2)) tolerance ) )
In clojure, there is the 'let' statement with the only difference that it is nested:
(defn close-enough? [v1 v2]
(let [tolerance 0.00001]
(< (Math/abs (- v1 v2) )
tolerance) ) )
But what about rewriting in clojure something bigger like this?:
(define (sqrt x)
(define (fixed-point f first-guess)
(define (close-enough? v1 v2)
(define tolerance 0.00001)
(< (abs (- v1 v2)) tolerance))
(define (try guess)
(let ((next (f guess)))
(if (close-enough? guess next)
next
(try next))))
(try first-guess))
(fixed-point (lambda (y) (average y (/ x y)))
1.0))
This does in fact work but looks very unconventional...
(defn sqrt [n]
(let [precision 10e-6
abs #(if (< % 0) (- %) %)
close-enough? #(-> (- %1 %2) abs (< precision))
averaged-func #(/ (+ (/ n %) %) 2)
fixed-point (fn [f start]
(loop [old start
new (f start)]
(if (close-enough? old new)
new
(recur new (f new) ) ) ) )]
(fixed-point averaged-func 1) ) )
(sqrt 10)
UPDATED Mar/8/2012
Thanks for the answer!
Essentially 'letfn' is not too different from 'let' - the functions being called have to be nested in the 'letfn' definition (as opposed to Scheme where the functions are used in the next sexp after its definitions and only existing within the scope of the top-level function in which it is defined).
So another question... Why doesn't clojure give the capability of doing what scheme does? Is it some sort of language design decision? What I like about the scheme organization is:
1) The encapsulation of ideas so that I as the programmer have an idea as to what little blocks are being utilized bigger block - especially if I am only using the little blocks once within the big block (for whatever reason, even if the little blocks are useful in their own right).
2) This also stops polluting the namespace with little procedures that are not useful to the end user (I've written clojure programs, came back to them a week later and had to re-learn my code because it was in a flat structure and I felt that I was looking at the code inside out as opposed to in a top down manner).
3) A common method definition interface so I can pull out a particular sub-method, de-indent it test it, and paste the changed version back without too much fiddling around.
Why isn't this implemented in clojure?
the standard way to write nested, named, procedures in clojure is to use letfn.
as an aside, your example use of nested functions is pretty suspicious. all of the functions in the example could be top-level non-local functions since they're more or less useful on their own and don't close over anything but each other.
The people critizing his placement of this in all one function, don't understand the context of why its been done in such a manner. SICP, which is where the example is from, was trying to illustrate the concept of a module, but without adding any other constructs to the base language. So "sqrt" is a module, with one function in its interface, and the rest are local or private functions within that module. This was based on R5RS scheme I believe, and later schemes have since added a standard module construct I think(?). But regardless, its more demonstrating the principle of hiding implementation.
The seasoned schemer also goes through similar examples of nested local functions, but usually to both hide implementation and to close over values as well.
But even if this wasn't a pedagogical example, you can see how this is a very lightweight module and I probably would write it this way within a larger "real" module. Reuse is ok, if its planned for. Otherwise, you are just exposing functions that probably won't be a perfect fit for what you need later on and, at the same time, burdening those functions with unexpected use cases that could break them later on.
letfn is the standard way.
But since Clojure is a Lisp, you can create (almost) any semantics you want. Here's a proof of concept that defines define in terms of letfn.
(defmacro define [& form]
(letfn [(define? [exp]
(and (list? exp) (= (first exp) 'define)))
(transform-define [[_ name args & exps]]
`(~name ~args
(letfn [~#(map transform-define (filter define? exps))]
~#(filter #(not (define? %)) exps))))]
`(defn ~#(transform-define `(define ~#form)))))
(define sqrt [x]
(define average [a b] (/ (+ a b) 2))
(define fixed-point [f first-guess]
(define close-enough? [v1 v2]
(let [tolerance 0.00001]
(< (Math/abs (- v1 v2)) tolerance)))
(define tryy [guess]
(let [next (f guess)]
(if (close-enough? guess next)
next
(tryy next))))
(tryy first-guess))
(fixed-point (fn [y] (average y (/ x y)))
1.0))
(sqrt 10) ; => 3.162277660168379
For real code, you'd want to change define to behave more like R5RS: allow non-fn values, be available in defn, defmacro, let, letfn, and fn, and verify that the inner definitions are at the beginning of the enclosing body.
Note: I had to rename try to tryy. Apparently try is a special non-function, non-macro construct for which redefinition silently fails.
everyone, I've started working yesterday on the Euler Project in Clojure and I have a problem with one of my solutions I cannot figure out.
I have this function:
(defn find-max-palindrom-in-range [beg end]
(reduce max
(loop [n beg result []]
(if (>= n end)
result
(recur (inc n)
(concat result
(filter #(is-palindrom? %)
(map #(* n %) (range beg end)))))))))
I try to run it like this:
(find-max-palindrom-in-range 100 1000)
and I get this exception:
java.lang.Integer cannot be cast to clojure.lang.IFn
[Thrown class java.lang.ClassCastException]
which I presume means that at some place I'm trying to evaluate an Integer as a function. I however cannot find this place and what puzzles me more is that everything works if I simply evaluate it like this:
(reduce max
(loop [n 100 result []]
(if (>= n 1000)
result
(recur (inc n)
(concat result
(filter #(is-palindrom? %)
(map #(* n %) (range 100 1000))))))))
(I've just stripped down the function definition and replaced the parameters with constants)
Thanks in advance for your help and sorry that I probably bother you with idiotic mistake on my part. Btw I'm using Clojure 1.1 and the newest SLIME from ELPA.
Edit: Here is the code to is-palindrom?. I've implemented it as a text property of the number, not a numeric one.
(defn is-palindrom? [n]
(loop [num (String/valueOf n)]
(cond (not (= (first num) (last num))) false
(<= (.length num) 1) true
:else (recur (.substring num 1 (dec (.length num)))))))
The code works at my REPL (1.1). I'd suggest that you paste it back at yours and try again -- perhaps you simply mistyped something?
Having said that, you could use this as an opportunity to make the code simpler and more obviously correct. Some low-hanging fruit (don't read if you think it could take away from your Project Euler fun, though with your logic already written down I think it shouldn't):
You don't need to wrap is-palindrome? in an anonymous function to pass it to filter. Just write (filter is-palindrome? ...) instead.
That loop in is-palindrome? is pretty complex. Moreover, it's not particularly efficient (first and last both make a seq out of the string first, then last needs to traverse all of it). It would be simpler and faster to (require '[clojure.contrib.str-utils2 :as str]) and use (= num (str/reverse num)).
Since I mentioned efficiency, using concat in this manner is a tad dangerous -- it creates a lazy seq, which might blow up if you pile up two many levels of laziness (this will not matter in the context of Euler 4, but it's good to keep it in mind). If you really need to extend vectors to the right, prefer into.
To further simplify things, you could consider breaking them apart into a function to filter a given sequence so that only palindromes remain and a separate function to return all products of two three-digit numbers. The latter can be accomplished with e.g.
(for [f (range 100 1000)
s (range 100 1000)
:when (<= f s)] ; avoid duplication of effort
(* f s))