Pattern matching functions in Clojure? - clojure

I have used erlang in the past and it has some really useful things like pattern matching functions or "function guards". Example from erlang docs is:
fact(N) when N>0 ->
N * fact(N-1);
fact(0) ->
1.
But this could be expanded to a much more complex example where the form of parameter and values inside it are matched.
Is there anything similar in clojure?

There is ongoing work towards doing this with unification in the core.match ( https://github.com/clojure/core.match ) library.
Depending on exactly what you want to do, another common way is to use defmulti/defmethod to dispatch on arbitrary functions. See http://clojuredocs.org/clojure_core/clojure.core/defmulti (at the bottom of that page is the factorial example)

I want to introduce defun, it's a macro to define functions with pattern matching just like erlang,it's based on core.match. The above fact function can be wrote into:
(use 'defun)
(defun fact
([0] 1)
([(n :guard #(> % 0))]
(* n (fact (dec n)))))
Another example, an accumulator from zero to positive number n:
(defun accum
([0 ret] ret)
([n ret] (recur (dec n) (+ n ret)))
([n] (recur n 0)))
More information please see https://github.com/killme2008/defun

core.match is a full-featured and extensible pattern matching library for Clojure. With a little macro magic and you can probably get a pretty close approximation to what you're looking for.

Also, if you want to take apart only simple structures like vectors and maps (any thing that is sequence or map, e.g. record, in fact), you could also use destructuring bind. This is the weaker form of pattern matching, but still is very useful. Despite it is described in let section there, it can be used in many contexts, including function definitions.

Related

Subset / Subsequence Recursive Procedure in Simply Scheme Lisp

I am working my way through Simply Scheme in combination with the Summer 2011 CS3 Course from Berkley. I am struggling with my understanding of the subset / subsequence procedures. I understand the basic mechanics once I'm presented with the solution code, but am struggling to grasp the concepts enough to come up with the solution on my own.
Could anyone point me in the direction of something that might help me understand it a little bit better? Or maybe explain it differently themselves?
This is the basis of what I understand so far:
So, in the following procedure, the subsequences recursive call that is an argument to prepend, is breaking down the word to its basest element, and prepend is adding the first of the word to each of those elements.
; using words and sentences
(define (subsequences wd)
(if (empty? wd)
(se "")
(se (subsequences (bf wd))
(prepend (first wd)
(subsequences (bf wd))))))
(define (prepend l wd)
(every (lambda (w) (word l w))
wd))
; using lists
(define (subsequences ls)
(if (null? ls)
(list '())
(let ((next (subsequences (cdr ls))))
(append (map (lambda (x) (cons (car ls) x))
next)
next))))
So the first one, when (subsequences 'word) is entered, would return:
("" d r rd o od or ord w wd wr wrd wo wod wor word)
The second one, when (subsequences '(1 2 3)) is entered, would return:
((1 2 3) (1 2) (1 3) (1) (2 3) (2) (3) ())
So, as I said, this code works. I understand each of the parts of the code individually and, for the most part, how they work with each other. The nested recursive call is what is giving me the trouble. I just don't completely understand it well enough to write such code on my own. Anything that might be able to help me understand it would be greatly appreciated. I think I just need a new perspective to wrap my head around it.
Thanks in advance for anyone willing to point me in the right direction.
EDIT:
So the first comment asked me to try and explain a little more about what I understand so far. Here it goes:
For the words / sentence procedure, I think that it's breaking the variable down to it's "basest" case (so to speak) via the recursive call that appears second.
Then it's essentially building on the basest case, by prepending.
I don't really understand why the recursive call that appears first needs to be there then.
In the lists one, when I was writing it on my own I got this:
(define (subseq lst)
(if (null? lst)
'()
(append (subseq (cdr lst))
(prepend (car lst)
(subseq (cdr lst))))))
(define (prepend i lst)
(map (lambda (itm) (cons i itm))
lst))
With the correct solution it looks to me like the car of the list would just drop off and not be accounted for, but obviously that's not the case. I'm not grasping how the two recursive calls are working together.
Your alternate solution is mostly good, but you've made the same mistake many people make when implementing this (power-set of a list) function for the first time: your base case is wrong.
How many ways are there to choose a subset of 0 or more items from a 0-element list? "0" may feel obvious, but in fact there is one way: choose none of the items. So instead of returning the empty list (meaning "there are no ways it can be done"), you should return (list '()) (meaning, "a list of one way to do it, which is to choose no elements"). Equivalently you could return '(()), which is the same as (list '()) - I don't know good Scheme style, so I'll leave that to you.
Once you've made that change, your solution works, demonstrating that you do in fact understand the recursion after all!
As to explaining the solution that was provided to you, I don't quite see what you think would happen to the car of the list. It's actually very nearly the exact same algorithm as the one you wrote yourself: to see how close it is, inline your definition of prepend (that is, substitute its body into your subsequences function). Then expand the let binding from the provided solution, substituting its body in the two places it appears. Finally, if you want, you can swap the order of the arguments to append - or not; it doesn't matter much. At this point, it's the same function you wrote.
Recursion is a tool which is there to help us, to make programming easier.
A recursive approach doesn't try to solve the whole problem at once. It says, what if we already had the solution code? Then we could apply it to any similar smaller part of the original problem, and get the solution for it back. Then all we'd have to do is re-combine the leftover "shell" which contained that smaller self-similar part, with the result for that smaller part; and that way we'd get our full solution for the full problem!
So if we can identify that recursive structure in our data; if we can take it apart like a Russian "matryoshka" doll which contains its own smaller copy inside its shell (which too contains the smaller copies of self inside itself, all the way down) and can put it back; then all we need to do to transform the whole "doll" is to transform the nested "matryoshka" doll contained in it (with all the nested dolls inside -- we don't care how many levels deep!) by applying to it that recursive procedure which we are seeking to create, and simply put back the result:
solution( shell <+> core ) == shell {+} solution( core )
;; -------------- ----
The two +s on the two sides of the equation are different, because the transformed doll might not be a doll at all! (also, the <+> on the left is deconstructing a given datum, while {+} on the right constructs the overall result.)
This is the recursion scheme used in your functions.
Some problems are better fit for other recursion schemes, e.g. various sorts, Voronoi diagrams, etc. are better done with divide-and-conquer:
solution( part1 <+> part2 ) == solution( part1 ) {+} solution( part2 )
;; --------------- ----- -----
As for the two -- or one -- recursive calls, since this is mathematically a function, the result of calling it with the same argument is always the same. There's no semantic difference, only an operational one.
Sometimes we prefer to calculate a result and keep it in memory for further reuse; sometimes we prefer to recalculate it every time it's needed. It does not matter as far as the end result is concerned -- the same result will be calculated, the only difference being the consumed memory and / or the time it will take to produce that result.

How to make clojure program constructs easier to identify?

Clojure, being a Lisp dialect, inherited Lisp's homoiconicity. Homoiconicity makes metaprogramming easier, since code can be treated as data: reflection in the language (examining the program's entities at runtime) depends on a single, homogeneous structure, and it does not have to handle several different structures that would appear in a complex syntax [1].
The downside of a more homogeneous language structure is that language constructs, such as loops, nested ifs, function calls or switches, etc, are more similar to each other.
In clojure:
;; if:
(if (chunked-seq? s)
(chunk-cons (chunk-first s) (concat (chunk-rest s) y))
(cons (first s) (concat (rest s) y)))
;; function call:
(repaint (chunked-seq? s)
(chunk-cons (chunk-first s) (concat (chunk-rest s) y))
(cons (first s) (concat (rest s) y)))
The difference between the two constructs is just a word. In a non homoiconic language:
// if:
if (chunked-seq?(s))
chunk-cons(chunk-first(s), concat(chunk-rest(s), y));
else
cons(first(s), concat(rest(s), y));
// function call:
repaint(chunked-seq?(s),
chunk-cons(chunk-first(s), concat(chunk-rest(s), y)),
cons(first(s), concat(rest(s), y));
Is there a way to make these program constructs easier to identify (more conspicuous) in Clojure? Maybe some recommended code format or best practice?
Besides using an IDE that supports syntax highlighting for the different cases, no, there isn't really a way to differentiate between them in the code itself.
You could try and use formatting to differentiate between function calls and macros:
(for [a b]
[a a])
(some-func [a b] [a a])
But then that prevents you from using a one-line list comprehension using for; sometimes they can neatly fit on one line. This also prevents you from breaking up large function calls into several lines. Unless the reducing function is pre-defined, most of my calls to reduce take the form:
(reduce (fn [a b] ...)
starting-acc
coll)
There are just too many scenarios to try to limit how calls are formatted. What about more complicated macros like cond?
I think a key thing to understand is that the operation of a form depends entirely on the first symbol in the form. Instead of relying on special syntaxes to differentiate between them, train your eyes to snap to the first symbol in the form, and do a quick "lookup" in your head.
And really, there are only a few cases that need to be considered:
Special forms like if and let (actually let*). These are fundamental constructs of the language, so you will be exposed to them constantly.
I don't think these should pose a problem. Your brain should immediately know what's going on when you see if. There are so few special forms that plain memorization is the best route.
Macros with "unusual" behavior like threading macros and cond. There are still some instances where I'll be looking over someone's code, and because they're using a macro that I don't have a lot of familiarity with, it'll take me a second to figure out the flow of the code.
This is remedied though by just getting some practice with the macro. Learning a new macro extends your capabilities when writing Clojure anyway, so this should always be considered. As with special forms, there really aren't that many mind-bending macros, so memorizing the main ones (the basic threading macros, and conditional macros) is simple.
Functions. If it's not either of the above, it must be a function and follow typical function calling syntax.

Combing multiple functions into a single function

I have written several functions that input strings and use varying regular expressions to search for patterns within the strings. All of the functions work on the same input [string]. What is the optimal way to combine all such functions into a single function?
I had tried combining the all of the regular expressions into a single regex, but ran into issues of degeneracy. Whereby the pattern fit multiple regular expressions and was outputting incorrect results. Next, I tried using the threading arrows -> and ->> but was unable to get those to work. I believe this might be the right option to use, but could not get the functions to run properly. So I am unable to test my hypothesis.
As an example of two functions to combine consider the following:
(defn fooip [string]
(re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" string))
and
(defn foophone [string]
(re-seq #"[0-9]{3}-?[0-9]{3}-?[0-9]{4}" s))
If you have multiple functions that you want to combine into a function that will return the result of applying each function to the same input, that is exactly the purpose of juxt.
(def foo (juxt foophone fooip))
(foo "555-555-5555 222.222.222.222 888-888-8888")
;=> [("555-555-5555" "888-888-8888") ("222.222.222.222")]
Your question is a little vague, but the threading arrows' purpose is to apply multiple functions sequentially to the output of each other: (-> 1 inc inc inc), for example, is equivalent to (inc (inc (inc 1))).
From your code samples, it looks like you have multiple regexes you want to match against a single input string. The simple way to do that is to use for:
(for [r [#"foo" #"bar" #"baz"]] (re-seq r s))
To check for both patterns you can use or:
(defn phone-or-ip [s]
(or (matchphone s) (matchip s)))
There isn't one proper way to combine functions. It depends what you want to do.
P.S. There are ways to combine the regexps themselves. The naïve way is to just use | and parentheses to combine the two. I think there are optimizers, which can improve such patterns.

Using lazy-seq without blowing the stack: is it possible to combine laziness with tail recursion?

To learn Clojure, I'm solving the problems at 4clojure. I'm currently cutting my teeth on question 164, where you are to enumerate (part of) the language a DFA accepts. An interesting condition is that the language may be infinite, so the solution has to be lazy (in that case, the test cases for the solution (take 2000 ....
I have a solution that works on my machine, but when I submit it on the website, it blows the stack (if I increase the amount of acceptable strings to be determined from 2000 to 20000, I also blow the stack locally, so it's a deficiency of my solution).
My solution[1] is:
(fn [dfa]
(let [start-state (dfa :start)
accept-states (dfa :accepts)
transitions (dfa :transitions)]
(letfn [
(accept-state? [state] (contains? accept-states state))
(follow-transitions-from [state prefix]
(lazy-seq (mapcat
(fn [pair] (enumerate-language (val pair) (str prefix (key pair))))
(transitions state))))
(enumerate-language [state prefix]
(if (accept-state? state)
(cons prefix (follow-transitions-from state prefix))
(follow-transitions-from state prefix)))
]
(enumerate-language start-state ""))
)
)
it accepts the DFA
'{:states #{q0 q1 q2 q3}
:alphabet #{a b c}
:start q0
:accepts #{q1 q2 q3}
:transitions {q0 {a q1}
q1 {b q2}
q2 {c q3}}}
and returns the language that DFA accepts (#{a ab abc}). However, when determining the first 2000 accepted strings of DFA
(take 2000 (f '{:states #{q0 q1}
:alphabet #{0 1}
:start q0
:accepts #{q0}
:transitions {q0 {0 q0, 1 q1}
q1 {0 q1, 1 q0}}}))
it blows the stack. Obviously I should restructure the solution to be tail recursive, but I don't see how that is possible. In particular, I don't see how it is even possible to combine laziness with tail-recursiveness (via either recur or trampoline). The lazy-seq function creates a closure, so using recur inside lazy-seq would use the closure as the recursion point. When using lazy-seq inside recur, the lazy-seq is always evaluated, because recur issues a function call that needs to evaluate its arguments.
When using trampoline,I don't see how I can iteratively construct a list whose elements can be lazily evaluated. As I have used it and see it used, trampoline can only return a value when it finally finishes (i.e. one of the trampolining functions does not return a function).
Other solutions are considered out of scope
I consider a different kind of solution to this 4Clojure problem out of scope of this question. I'm currently working on a solution using iterate, where each step only calculates the strings the 'next step' (following transitions from the current statew) accepts, so it doesn't recurse at all. You then only keep track of current states and the strings that got you into that state (which are the prefixes for the next states). What's proving difficult in that case is detecting when a DFA that accepts a finite language will no longer return any results. I haven't yet devised a proper stop-criterion for the take-while surrounding the iterate, but I'm pretty sure I'll manage to get this solution to work. For this question, I'm interested in the fundamental question: can laziness and tail-recursiveness be combined or is that fundamentally impossible?
[1] Note that there are some restrictions on the site, like not being able to use def and defn, which may explain some peculiarities of my code.
When using lazy-seq just make a regular function call instead of using recur. The laziness avoids the recursive stack consumption for which recur is otherwise used.
For example, a simplified version of repeat:
(defn repeat [x]
(lazy-seq (cons x (repeat x))))
The problem is that you are building something that looks like:
(mapcat f (mapcat f (mapcat f ...)))
Which is fine in principle, but the elements on the far right of this list don't get realized for a long time, and by the time you do realize them, they have a huge stack of lazy sequences that need to be forced in order to get a single element.
If you don't mind a spoiler, you can see my solution at https://gist.github.com/3124087. I'm doing two things differently than you are, and both are important:
Traversing the tree breadth-first. You don't want to get "stuck" in a loop from q0 to q0 if that's a non-accepting state. It looks like that's not a problem for the particular test case you're failing because of the order the transitions are passed to you, but the next test case after this does have that characteristic.
Using doall to force a sequence that I'm building lazily. Because I know many concats will build a very large stack, and I also know that the sequence will never be infinite, I force the whole thing as I build it, to prevent the layering of lazy sequences that causes the stack overflow.
Edit: In general you cannot combine lazy sequences with tail recursion. You can have one function that uses both of them, perhaps recurring when there's more work to be done before adding a single element, and lazy-recurring when there is a new element, but most of the time they have opposite goals and attempting to combine them incautiously will lead only to pain, and no particular improvements.

Managing number of brackets in clojure

I am new to clojure and the main thing I am struggling with is writing readable code. I often end up with functions like the one below.
(fn rep
([lst n]
(rep (rest lst)
n
(take n
(repeat (first lst)))))
([lst n out]
(if
(empty? lst)
out
(rep
(rest lst) n
(concat out (take n
(repeat
(first lst))))))))
with lots of build ups of end brackets. What are the best ways of reducing this or formatting it in a way that makes it easier to spot missing brackets?
Using Emacs's paredit mode (emulated in a few other editors too) means you're generally - unless you're copy/pasting with mouse/forced-unstructured selections - dealing with matched brackets/braces/parentheses and related indenting with no counting needed.
Emacs with https://github.com/technomancy/emacs-starter-kit (highly recommended!) has paredit enabled for clojure by default. Otherwise, see http://emacswiki.org/emacs/ParEdit
In addition to having an editor that supports brace matching, you can also try to make your code less nested. I believe that your function could be rewritten as:
(defn rep [coll n] (mapcat (partial repeat n) coll))
Of course this is more of an art (craft) than science, but some pointers (in random order):
Problems on 4clojure and their solutions by top users (visible after solving particular problems) - I believe that Chris Houser is there under the handle chouser
Speaking of CH - "The Joy of Clojure" is a very useful read
Browsing docs on clojure.core - there are a lot of useful functions there
-> and ->> threading macros are very useful for flattening nested code
stackoverflow - some of the brightest and most helpful people in the world answer questions there ;-)
An editor that colors the parenthesis is extremely helpful in this case. For example, here's what your code looks in my vim editor (using vimclojure):
Since you didn't say which editor you use, you'll have to find the rainbow-coloring feature for your editor appropriately.
I cannot echo strongly enough how valuable it is to use paredit, or some similar feature in another editor. It frees you from caring at all about parens - they always match themselves up perfectly, and tedious, error-prone editing tasks like "change (foo (bar x) y) into (foo (bar x y))" become a single keystroke. For a week or so paredit will frustrate you beyond belief as it prevents you from doing things manually, but once you learn the automatic ways of handling parens, you will never be able to go back.
I recently heard someone say, and I think it's roughly accurate, that writing lisp without paredit is like writing java without auto-complete (possible, but not very pleasant).
(fn rep
([lst n]
(rep lst n nil))
([lst n acc]
(if-let [s (seq lst)]
(recur (rest s) n (concat acc (repeat n (first s))))
acc)))
that's more readable, i think. note that:
you should use recur when tail recursing
you should test with seq - see http://clojure.org/lazy
repeat can take a count
concat will drop nil, which saves repeating yourself
you don't need to start a new line for every open paren
as for the parens - your editor/ide should take care of that. i am typing blind here, so forgive me if it's wrong...
[Rafał Dowgird's code is shorter; i am learning too...]
[updated:] after re-reading the "lazy" link, i think i have been handling lazy sequences incorrectly,
I'm not sure you can avoid all the brackets. However, what I've seen Lispers do is use an editor with paren matching/highlight and maybe even rainbow brackets: http://emacs-fu.blogspot.com/2011/05/toward-balanced-and-colorful-delimiters.html
Frankly, these are the kind of features that would be useful for non-Lisp editors too :)
Always use 100% recycled closing parentheses made from at least 75% post-consumer materials; then you don't have to feel so bad about using so many.
Format it however you like. It is the editor's job to display code in whatever style the reader prefers. I like the C-style hierarchical tree-shaped format with single brackets on their own lines (all the LISPers boil with rage at that :-)))))))))))))
But, I sometimes use this style:
(fn rep
([lst n]
(rep (rest lst)
n
(take n
(repeat (first lst)) ) ) ) )
which is an update on the traditional style in which brackets are spaced (log2 branch-level)
The reason I like space is that my eyesight is poor and I simply cannot read dense text. So to the angry LISPers who are about to tell me to do things the traditional way I say, well, everyone has their own way, relax, it's ok.
Can't wait for someone to write a decent editor in Clojure though, which is not a text editor but an expression editor**, then the issue of formatting goes away. I'm writing one myself but it takes time. The idea is to edit expressions by applying functions to them, and I navigate the code with a zipper, expression-by-expression, not by words or characters or lines. The code is represented by whatever display function you want.
** yes, I know there's emacs/paredit, but I tried emacs and didn't like it sorry.