Racket pattern matching of lists - list

I am trying to do pattern matching with lists, but for some reason I get an unexpected match when I do the following:
> (define code '(h1 ((id an-id-here)) Some text here))
> (define code-match-expr '(pre ([class brush: python]) ...))
> (match code
[code-match-expr #t]
[_ #f])
#t
Question: Why does code match code-match-expr?
Practical use-case
I tried this in the Racket REPL, because I actually want to solve another practical problem: using Pollen's pygments wrapping functions to highlight code, which will be output as HTML later on. For this purpose I wrote the following code, where the problem occurs:
(define (read-post-from-file path)
(Post-from-content (replace-code-xexprs (parse-markdown path))))
(define (replace-code-xexprs list-of-xexprs)
;; define known languages
(define KNOWN-LANGUAGE-SYMBOLS
(list 'python
'racket
'html
'css
'javascript
'erlang
'rust))
;; check if it matches for a single language's match expression
;; if it mathces any language, return that language's name as a symbol
(define (get-matching-language an-xexpr)
(define (matches-lang-match-expr? an-xexpr lang-symbol)
(display "XEXPR:") (displayln an-xexpr)
(match an-xexpr
[`(pre ([class brush: ,lang-symbol]) (code () ,more ...)) lang-symbol]
[`(pre ([class brush: ,lang-symbol]) ,more ...) lang-symbol]
[_ #f]))
(ormap (lambda (lang-symbol)
;; (display "trying to match ")
;; (display an-xexpr)
;; (display " against ")
;; (displayln lang-symbol)
(matches-lang-match-expr? an-xexpr lang-symbol))
KNOWN-LANGUAGE-SYMBOLS))
;; replace code in an xexpr with highlightable code
;; TODO: What happens if the code is in a lower level of the xexpr?
(define (replace-code-in-single-xexpr an-xexpr)
(let ([matching-language (get-matching-language an-xexpr)])
(cond [matching-language (code-highlight an-xexpr matching-language)]
[else an-xexpr])))
;; apply the check to all xexpr
(map replace-code-in-single-xexpr list-of-xexprs))
(define (code-highlight language code)
(highlight language code))
In this example I am parsing a markdown file which has the following content:
# Code Demo
```python
def hello():
print("Hello World!")
```
And I get the following xexprs:
1.
(h1 ((id code-demo)) Code Demo)
2.
(pre ((class brush: python)) (code () def hello():
print("Hello World!")))
However, none of those match for some reason.

Make sure you're clear what it is you are matching. In Racket x-expressions, attribute names are symbols but the values are strings. So the expression you're matching would be something like (pre ([class "brush: js"])) ___) -- not (pre ([class brush: js]) ___).
To match that string and extract the part after "brush: ", you could use a pregexp match pattern. Here is a snippet that Frog uses to extract the language to give to Pygments:
(for/list ([x xs])
(match x
[(or `(pre ([class ,brush]) (code () ,(? string? texts) ...))
`(pre ([class ,brush]) ,(? string? texts) ...))
(match brush
[(pregexp "\\s*brush:\\s*(.+?)\\s*$" (list _ lang))
`(div ([class ,(str "brush: " lang)])
,#(pygmentize (apply string-append texts) lang
#:python-executable python-executable
#:line-numbers? line-numbers?
#:css-class css-class))]
[_ `(pre ,#texts)])]
[x x])))
(Here pygmentize is a function defined in other Frog source code; it's a wrapper around running Pygments as a separate process and piping text between it. But you could substitute another way of using Pygments or any other syntax highlighter. That's N/A for your question about match. I mention it just so that doesn't become a distraction and another embedded question. :))

match is syntax and does not evaluate the pattern. Since code-match-expr is a symbol it will bind the whole expression (result of evaluating code) to the variable code-match-expr and evaluate the rest of the expressions as the pattern matches. The result will always be #t.
Notice that the second pattern, the symbol _, is the same pattern. It also matches the whole expression, but _ is special in the way that it does not get bound like code-match-expr does.
It's important that your defined variable code-match-expr is never used, but since the match binds a variable with the same name your original binding will be shadowed in the consequent of the match.
Code that works as you intended might look like:
(define (test code)
(match code
[`(pre ([class brush: python]) ,more ...) #t]
[_ #f]))
(test '(h1 ((id an-id-here)) Some text here))
; ==> #f
(test '(pre ((class brush: python))))
; ==> #t
(test '(pre ((class brush: python)) a b c))
; ==> #t
As you see the pattern ,more ... means zero or more and what kind of brackets is ignored since in Racket [] is the same as () and {}.
EDIT
You still got it a little backwards. In this code:
(define (matches-lang-match-expr? an-xexpr lang-symbol)
(display "XEXPR:") (displayln an-xexpr)
(match an-xexpr
[`(pre ([class brush: ,lang-symbol]) (code () ,more ...)) lang-symbol]
[`(pre ([class brush: ,lang-symbol]) ,more ...) lang-symbol]
[_ #f]))
When a pattern is macthed, since lang-symbol is unquoted it will match anything atomic and be bound to that as a variable in that clause. It will have nothing to do with the bound variable by the same name as a match does not use variables, it creates them. You return the variable. Thus:
(matches-lang-match-expr? '(pre ([class brush: jiffy]) bla bla bla) 'ignored-argument)
; ==> jiffy
Here is something that does what you want:
(define (get-matching-language an-xexpr)
(define (get-language an-xexpr)
(match an-xexpr
[`(pre ([class brush: ,lang-symbol]) (code () ,more ...)) lang-symbol]
[`(pre ([class brush: ,lang-symbol]) ,more ...) lang-symbol]
[_ #f]))
(let* ((matched-lang-symbol (get-language an-xexpr))
(in-known-languages (memq matched-lang-symbol KNOWN-LANGUAGE-SYMBOLS)))
(and in-known-languages (car in-known-languages))))
Again.. match abuses quasiquote to something completely different than creating list structure. It uses them to match literals and capture the unqoted symbols as variables.

Related

Clojure pattern matching macro with variable arity that goes beyond explicit match cases

I'm in the process of translating some code from Scheme to Clojure.
The Scheme code uses a macro called pmatch (https://github.com/webyrd/quines/blob/master/pmatch.scm) to pattern match arguments to output expressions. Specifically, it allows for variable capture as follows:
(define eval-expr
(lambda (expr)
(pmatch expr
[(zero? ,e)
(zero? (eval-expr e)))
...
In this use example, some input expression to eval-expr, '(zero? 0), should match the the first case. The car of the list matches to zero? and the arity of the input matches. As a consequence, 0 is bound to ,e and passed to (zero? (eval-expr e)), and this expr is evaluated recursively.
In Haskell, which supports pattern matching natively, the code might translate to something like the following:
Prelude> let evalexpr "zero?" e = (e == 0) -- ignoring recursive application
Prelude> evalexpr "zero?" 0
True
In Clojure, I first tried to substitute pmatch with core.match (https://github.com/clojure/core.match), which was written by David Nolen and others, but, to my knowledge, this macro seems to
only support a single arity of arguments per use
only support explicit matching, rather than property based matching (available as guards)
Another option I'm trying is a lesser known macro called defun (https://github.com/killme2008/defun), which defines pattern matching functions. Here's an example:
(defun count-down
([0] (println "Reach zero!"))
([n] (println n)
(recur (dec n))))
I'm still exploring defun to see if it gives me the flexibility I need. Meanwhile, does anyone have suggestions of how to pattern match in Clojure with 1. flexible arity 2. variable capture?
Ignoring recursive application:
(ns test.test
(:require [clojure.core.match :refer [match]]))
(def v [:x 0])
(def w [:x :y 0])
(defn try-match [x]
(match x
[:x e] e
[:x expr e] [expr e]
))
(try-match v)
;; => 0
(try-match w)
;; => [:y 0]
;; Matching on lists (actually, any sequences)
(defn try-match-2 [exp]
(match exp
([op x] :seq) [op x]
([op x y] :seq) [op x y]))
(try-match-2 '(+ 3))
;; => [+ 3]
(try-match-2 '(+ 1 2))
;; => [+ 1 2]
See https://github.com/clojure/core.match/wiki/Overview for more details.
Additionally, I suggest you have a close look at Clojure destructuring. Lots of things can be done with it without resorting to core.match, actually your use case is covered.

How can I get the positions of regex matches in ClojureScript?

In Clojure I could use something like this solution: Compact Clojure code for regular expression matches and their position in string, i.e., creating a re-matcher and extracted the information from that, but re-matcher doesn't appear to be implemented in ClojureScript. What would be a good way to accomplish the same thing in ClojureScript?
Edit:
I ended up writing a supplementary function in order to preserve the modifiers of the regex as it is absorbed into re-pos:
(defn regex-modifiers
"Returns the modifiers of a regex, concatenated as a string."
[re]
(str (if (.-multiline re) "m")
(if (.-ignoreCase re) "i")))
(defn re-pos
"Returns a vector of vectors, each subvector containing in order:
the position of the match, the matched string, and any groups
extracted from the match."
[re s]
(let [re (js/RegExp. (.-source re) (str "g" (regex-modifiers re)))]
(loop [res []]
(if-let [m (.exec re s)]
(recur (conj res (vec (cons (.-index m) m))))
res))))
You can use the .exec method of JS RegExp object. The returned match object contains an index property that corresponds to the index of the match in the string.
Currently clojurescript doesn't support constructing regex literals with the g mode flag (see CLJS-150), so you need to use the RegExp constructor. Here is a clojurescript implementation of the re-pos function from the linked page:
(defn re-pos [re s]
(let [re (js/RegExp. (.-source re) "g")]
(loop [res {}]
(if-let [m (.exec re s)]
(recur (assoc res (.-index m) (first m)))
res))))
cljs.user> (re-pos "\\w+" "The quick brown fox")
{0 "The", 4 "quick", 10 "brown", 16 "fox"}
cljs.user> (re-pos "[0-9]+" "3a1b2c1d")
{0 "3", 2 "1", 4 "2", 6 "1"}

Convert hyphenated string to CamelCase

I'm trying to convert a hyphenated string to CamelCase string. I followed this post: Convert hyphens to camel case (camelCase)
(defn hyphenated-name-to-camel-case-name [^String method-name]
(clojure.string/replace method-name #"-(\w)"
#(clojure.string/upper-case (first %1))))
(hyphenated-name-to-camel-case-name "do-get-or-post")
==> do-Get-Or-Post
Why I'm still getting the dash the output string?
You should replace first with second:
(defn hyphenated-name-to-camel-case-name [^String method-name]
(clojure.string/replace method-name #"-(\w)"
#(clojure.string/upper-case (second %1))))
You can check what argument clojure.string/upper-case gets by inserting println to the code:
(defn hyphenated-name-to-camel-case-name [^String method-name]
(clojure.string/replace method-name #"-(\w)"
#(clojure.string/upper-case
(do
(println %1)
(first %1)))))
When you run the above code, the result is:
[-g g]
[-o o]
[-p p]
The first element of the vector is the matched string, and the second is the captured string,
which means you should use second, not first.
In case your goal is just to to convert between cases, I really like the camel-snake-kebab library. ->CamelCase is the function-name in question.
inspired by this thread, you could also do
(use 'clojure.string)
(defn camelize [input-string]
(let [words (split input-string #"[\s_-]+")]
(join "" (cons (lower-case (first words)) (map capitalize (rest words))))))

Create a list from a string in Clojure

I'm looking to create a list of characters using a string as my source. I did a bit of googling and came up with nothing so then I wrote a function that did what I wanted:
(defn list-from-string [char-string]
(loop [source char-string result ()]
(def result-char (string/take 1 source))
(cond
(empty? source) result
:else (recur (string/drop 1 source) (conj result result-char)))))
But looking at this makes me feel like I must be missing a trick.
Is there a core or contrib function that does this for me? Surely I'm just being dumb right?
If not is there a way to improve this code?
Would the same thing work for numbers too?
You can just use seq function to do this:
user=> (seq "aaa")
(\a \a \a)
for numbers you can use "dumb" solution, something like:
user=> (map (fn [^Character c] (Character/digit c 10)) (str 12345))
(1 2 3 4 5)
P.S. strings in clojure are 'seq'able, so you can use them as source for any sequence processing functions - map, for, ...
if you know the input will be letters, just use
user=> (seq "abc")
(\a \b \c)
for numbers, try this
user=> (map #(Character/getNumericValue %) "123")
(1 2 3)
Edit: Oops, thought you wanted a list of different characters. For that, use the core function "frequencies".
clojure.core/frequencies
([coll])
Returns a map from distinct items in coll to the number of times they appear.
Example:
user=> (frequencies "lazybrownfox")
{\a 1, \b 1, \f 1, \l 1, \n 1, \o 2, \r 1, \w 1, \x 1, \y 1, \z 1}
Then all you have to do is get the keys and turn them into a string (or not).
user=> (apply str (keys (frequencies "lazybrownfox")))
"abflnorwxyz"
(apply str (set "lazybrownfox")) => "abflnorwxyz"

Defining Clojure macro syntax

I defined an unless macro as follows:
user=> (defmacro unless [expr body] (list 'if expr nil body))
#'user/unless
user=> (unless (= 1 2) (println "Yo"))
Yo
As you can see it works fine.
Now, in Clojure a list can be defined in two ways:
; create a list
(list 1 2 3)
; shorter notation
'(1 2 3)
This means that the unless macro can be written without the list keyword. However, this results in a Java exception being thrown:
user=> (unless (= 1 2) (println "Yo"))
java.lang.Exception: Unable to resolve symbol: expr in this context
Can someone explain why this fails?
'(foo bar baz) is not a shortcut for (list foo bar baz), it's a shortcut for (quote (foo bar baz)). While the list version will return a list containing the values of the variables foo, bar and baz, the version with ' will return a list containing the symbols foo, bar and baz. (In other words '(if expr nil body) is the same as (list 'if 'expr 'nil 'body).
This leads to an error because with the quoted version the macro expands to (if expr nil body) instead of (if (= 1 2) nil (println "Yo")) (because instead of substituting the macro's arguments for expr and body, it just returns the name expr and body (which are then seen as non-existent variables in the expanded code).
A shortcut that's useful in macro definitions is using `. ` works like ' (i.e. it quotes the expression following it), but it allows you to evaluate some subexpressions unquoted by using ~. For example your macro could be rewritten as (defmacro unless [expr body] `(if ~expr nil ~body)). The important thing here is that expr and body are unquoted with ~. This way the expansion will contain their values instead of literally containing the names expr and body.