Create a list from a string in Clojure - clojure

I'm looking to create a list of characters using a string as my source. I did a bit of googling and came up with nothing so then I wrote a function that did what I wanted:
(defn list-from-string [char-string]
(loop [source char-string result ()]
(def result-char (string/take 1 source))
(cond
(empty? source) result
:else (recur (string/drop 1 source) (conj result result-char)))))
But looking at this makes me feel like I must be missing a trick.
Is there a core or contrib function that does this for me? Surely I'm just being dumb right?
If not is there a way to improve this code?
Would the same thing work for numbers too?

You can just use the seq function to do this:
user=> (seq "aaa")
(\a \a \a)
For numbers you can use a "dumb" solution, something like:
user=> (map (fn [^Character c] (Character/digit c 10)) (str 12345))
(1 2 3 4 5)
P.S. Strings in Clojure are seqable, so you can use them as the source for any sequence-processing function: map, for, and so on.
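Because strings are seqable, the usual sequence functions accept them directly. The following is a minimal sketch rather than part of the original answer; the helper name digits is hypothetical, and it assumes a non-negative integer argument.

```clojure
;; seq returns a sequence of characters; apply list builds an actual list.
(def char-seq  (seq "abc"))          ;; (\a \b \c)
(def char-list (apply list "abc"))   ;; (\a \b \c), and (list? char-list) is true

;; Sequence functions take the string itself, no explicit seq call needed.
(def vowels (filter #{\a \e \i \o \u} "clojure"))   ;; (\o \u \e)

;; Hypothetical helper: decimal digits of a non-negative integer.
(defn digits [n]
  (map (fn [^Character c] (Character/digit c 10)) (str n)))

(digits 12345) ;; => (1 2 3 4 5)
```

Note that (seq "abc") is a sequence but not a list in the strict sense; use (apply list ...) only if list? must return true.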

If you know the input will be letters, just use
user=> (seq "abc")
(\a \b \c)
For numbers, try this
user=> (map #(Character/getNumericValue %) "123")
(1 2 3)

Edit: Oops, thought you wanted a list of different characters. For that, use the core function "frequencies".
clojure.core/frequencies
([coll])
Returns a map from distinct items in coll to the number of times they appear.
Example:
user=> (frequencies "lazybrownfox")
{\a 1, \b 1, \f 1, \l 1, \n 1, \o 2, \r 1, \w 1, \x 1, \y 1, \z 1}
Then all you have to do is get the keys and turn them into a string (or not).
user=> (apply str (keys (frequencies "lazybrownfox")))
"abflnorwxyz"

(apply str (set "lazybrownfox")) => "abflnorwxyz"
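Both approaches go through a hash-based collection, so the order of the resulting characters is an implementation detail rather than a guarantee. A small sketch, assuming you want a predictable order: distinct keeps first-occurrence order, and sort over the set gives alphabetical order.

```clojure
;; Keep each character the first time it appears.
(apply str (distinct "lazybrownfox"))     ;; => "lazybrownfx"

;; Remove duplicates, then order the characters alphabetically.
(apply str (sort (set "lazybrownfox")))   ;; => "abflnorwxyz"
```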

Related

How to build strings in Clojure

Say I have a list of elements like [1 2 3] and I wanted to transform it into |1|2|3|.
Or if I wanted to repeat the sequence "---" 3 times into "---------".
How should I approach it so that I can build it up into a string like that? Is there a method similar to Java's StringBuilder? I'm not looking for a concrete answer to this question but just general guidance on how to build strings in Clojure, as I'm very new to the language.
Start with the Clojure CheatSheet. Look at the section "Strings".
Some examples (with clojure.string required under the alias str):
(str/join \| [1 2 3]) => "1|2|3"
(apply str (repeat 3 "---")) => "---------"
(str
"|"
(str/join \| [1 2 3])
"|")
=> "|1|2|3|"
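The pieces above can be wrapped into a small function. This is a minimal sketch; the names delimited and delimited-sb are hypothetical, and the second version is included only to show what the StringBuilder route looks like through Java interop.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: join the items with sep and also put sep at both ends.
(defn delimited [sep coll]
  (str sep (str/join sep coll) sep))

(delimited "|" [1 2 3]) ;; => "|1|2|3|"

;; StringBuilder via interop; gives the same output for the example above.
(defn delimited-sb [sep coll]
  (let [sb (StringBuilder. (str sep))]
    (doseq [x coll]
      (.append sb (str x))
      (.append sb (str sep)))
    (str sb)))

(delimited-sb "|" [1 2 3]) ;; => "|1|2|3|"
```

For most string building the first form is the more idiomatic one; explicit StringBuilder use is usually reserved for performance-sensitive loops.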
There are other libraries that contain many useful string functions in addition to clojure.string:
the Tupelo Clojure library. See both tupelo.string and tupelo.chars sections
the cuerdas library
Looks like there are some more listed at clojure-toolbox.com under "String Manipulation"
Also, there is the cl-format function in clojure.pprint (part of the standard Clojure distribution), which is a port of Common Lisp's amazing format facility.
(require '[clojure.pprint :refer [cl-format]])
user> (cl-format nil "~v@{~a~:*~}" 5 "10")
;;=> "1010101010"
user> (cl-format nil "|~{~a|~}" [1 2 3])
;;=> "|1|2|3|"
This one is really powerful, yet the format string can get quite hard to read for really complex string-processing templates. Still, for the cases you ask about (join, repeat, iterate, or conditional output), it stays within the bounds of understandable.
There are some examples here, easily translatable from CL to Clojure.
PS:
user> (cl-format nil "~r" 234598284579147)
;;=> "two hundred thirty-four trillion, five hundred ninety-eight billion, two hundred eighty-four million, five hundred seventy-nine thousand, one hundred forty-seven"
user> (cl-format nil "~@r" 1232)
;;=> "MCCXXXII"
The answer to use (apply str ...) is usually the best one. But here is an additional technique, and a "pro tip" about the three dots in (apply str ...).
If the string's content would most naturally be generated by the print functions (which is not the case with your specific examples!), then you can capture it with with-out-str:
(with-out-str
(doseq [i (range 1 4)]
(print "|")
(print i))
(println "|")) ;; => "|1|2|3|\n"
Usually, (apply str ...) is more idiomatic. You can use the whole rich tapestry of sequence functions (interleave, interpose, repeat, cycle, ...) and extract the result as a string with (apply str ...). But you face a challenge if the sequence contains nested sequences. We mention this challenge here because there are two solutions that are specific to building up strings.
To be clear, nested sequences "work fine" in every respect except that what str does to a sequence might not be what you want. For example, to build "1------2------3":
;; not quite right:
(apply str
(interpose
(repeat 2 "---")
(range 1 4))) ;; => "1(\"---\" \"---\")2(\"---\" \"---\")3"
The issue is that repeat produced a sequence, which interpose dutifully stuck between the numbers in a bigger sequence, and str, when processing the bigger sequence, dutifully wrote the nested sequences in Clojure syntax. To better control how nested sequences get stringified, you could replace (repeat 2 "---") with (apply str (repeat 2 "---")). But if the pattern of apply str within apply str occurs over and over, it hurts the program's signal-to-noise ratio. An alternative that may be cleaner is the flatten function (maybe this is its only idiomatic use):
(apply str
(flatten
(interpose
(repeat 2 "---")
(range 1 4)))) ;; => "1------2------3"
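When the separator itself is assembled from pieces, another option is to build it once as a string and hand it to clojure.string/join. A minimal sketch, assuming clojure.string is available under the alias str; the name separator is just for illustration.

```clojure
(require '[clojure.string :as str])

;; Build the separator once, as a plain string.
(def separator (apply str (repeat 2 "---")))   ;; "------"

;; join stringifies each item and places the separator between them.
(str/join separator (range 1 4)) ;; => "1------2------3"
```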

Clojure pattern matching macro with variable arity that goes beyond explicit match cases

I'm in the process of translating some code from Scheme to Clojure.
The Scheme code uses a macro called pmatch (https://github.com/webyrd/quines/blob/master/pmatch.scm) to pattern match arguments to output expressions. Specifically, it allows for variable capture as follows:
(define eval-expr
(lambda (expr)
(pmatch expr
[(zero? ,e)
(zero? (eval-expr e))]
...
In this usage example, some input expression to eval-expr, '(zero? 0), should match the first case. The car of the list matches zero? and the arity of the input matches. As a consequence, 0 is bound to ,e and passed to (zero? (eval-expr e)), and this expression is evaluated recursively.
In Haskell, which supports pattern matching natively, the code might translate to something like the following:
Prelude> let evalexpr "zero?" e = (e == 0) -- ignoring recursive application
Prelude> evalexpr "zero?" 0
True
In Clojure, I first tried to substitute pmatch with core.match (https://github.com/clojure/core.match), which was written by David Nolen and others, but, to my knowledge, this macro seems to:
only support a single arity of arguments per use
only support explicit matching, rather than property based matching (available as guards)
Another option I'm trying is a lesser known macro called defun (https://github.com/killme2008/defun), which defines pattern matching functions. Here's an example:
(defun count-down
([0] (println "Reach zero!"))
([n] (println n)
(recur (dec n))))
I'm still exploring defun to see if it gives me the flexibility I need. Meanwhile, does anyone have suggestions of how to pattern match in Clojure with 1. flexible arity 2. variable capture?
Ignoring recursive application:
(ns test.test
(:require [clojure.core.match :refer [match]]))
(def v [:x 0])
(def w [:x :y 0])
(defn try-match [x]
(match x
[:x e] e
[:x expr e] [expr e]))
(try-match v)
;; => 0
(try-match w)
;; => [:y 0]
;; Matching on lists (actually, any sequences)
(defn try-match-2 [exp]
(match exp
([op x] :seq) [op x]
([op x y] :seq) [op x y]))
(try-match-2 '(+ 3))
;; => [+ 3]
(try-match-2 '(+ 1 2))
;; => [+ 1 2]
See https://github.com/clojure/core.match/wiki/Overview for more details.
Additionally, I suggest you have a close look at Clojure destructuring. Lots of things can be done with it without resorting to core.match; in fact, your use case is covered.
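To illustrate the destructuring suggestion, here is a minimal sketch of an evaluator that captures the operator and a variable number of arguments without core.match. The function name eval-expr mirrors the question; the set of supported forms (zero? and +) and the error handling are assumptions made for the example.

```clojure
(defn eval-expr
  "Evaluates a tiny expression language; anything that is not a list is
   treated as a literal and returned unchanged."
  [expr]
  (if (seq? expr)
    ;; Destructuring binds the head to op and the remaining items to args.
    (let [[op & args] expr
          arity       (count args)]
      (cond
        (and (= op 'zero?) (= arity 1))
        (zero? (eval-expr (first args)))

        (and (= op '+) (= arity 2))
        (+ (eval-expr (first args)) (eval-expr (second args)))

        :else
        (throw (ex-info "No matching clause" {:expr expr}))))
    expr))

(eval-expr '(zero? 0))          ;; => true
(eval-expr '(zero? (+ 1 -1)))   ;; => true
(eval-expr '(+ 1 2))            ;; => 3
```

Each cond branch plays the role of one pmatch clause: the operator and arity check is the pattern, and the names bound by destructuring are the captured variables.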

Convert set to regex pattern in clojure

If I have this set
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
How can I turn it to this pattern string:
"foo\.clj|bar\.clj|baz\.clj"
My attempt:
(defn set->pattern-str [coll]
(-> (clojure.string/join "|" coll)
(clojure.string/replace #"\." "\\\\.")))
(set->pattern-str my-set)
=> "foo\\.clj|baz\\.clj|bar\\.clj" ;I get the double backslash
Better ideas?
In case your set of strings might have other metacharacters than just . in them, a more general approach is to ask the underlying java.util.regex.Pattern implementation to escape everything for us:
(import 'java.util.regex.Pattern)
(defn set->pattern-str [coll]
(->> coll
(map #(Pattern/quote %))
(clojure.string/join \|)
re-pattern))
IDEone link here. Remember, IDEone is not a REPL, and you have to tell it to put values on stdout with e.g. println before you can see them.
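As a quick check of the quoting approach, the sketch below builds a pattern from a set that contains another metacharacter (+) and tries a few inputs. The function name set->pattern and the sample file names are assumptions for the example.

```clojure
(import 'java.util.regex.Pattern)
(require '[clojure.string :as str])

;; Quote every member so that regex metacharacters are matched literally,
;; then join the alternatives with | and compile the result.
(defn set->pattern [coll]
  (->> coll
       (map #(Pattern/quote %))
       (str/join "|")
       re-pattern))

(def file-pattern (set->pattern #{"foo.clj" "a+b.clj"}))

(re-matches file-pattern "foo.clj")  ;; => "foo.clj"
(re-matches file-pattern "fooXclj")  ;; => nil, the dot is literal
(re-matches file-pattern "a+b.clj")  ;; => "a+b.clj"
(re-matches file-pattern "aab.clj")  ;; => nil, the plus is literal
```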
You were close to the final solution. The double backslash appears only because the REPL prints the string in its escaped, readable form; the string itself contains a single backslash. When you turn it into a seq you will see the individual characters:
(seq "foo\\.clj")
;;=> (\f \o \o \\ \. \c \l \j)
And working solution:
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
(def my-set-pattern
(-> (clojure.string/join "|" my-set)
(clojure.string/replace "." "\\.")
(re-pattern)))
(re-matches my-set-pattern "foo.clj")
;;=> "foo.clj"
(re-matches my-set-pattern "bar.clj")
;;=> "bar.clj"
(re-matches my-set-pattern "baz.clj")
;;=> "baz.clj"
(re-matches my-set-pattern "foo-clj")
;;=> nil
Edit: OK, this one does in fact work. Probably want to break it apart a bit more if it's meant to be long lived code, but this is the simplest way I could find to do it with minimal string munging.
(defn is-matching-file-name [target-string]
(re-matches
(re-pattern (clojure.string/escape (String/join "|" my-set) {\. "\\."}))
target-string))
The clojure.string/escape here takes two arguments: the string to escape, and a mapping of the characters to escape to the replacement strings. The key in this map is the literal \. and the value needs two backslashes since we want to include one backslash preceding any . in the final string to be used as the argument for the re-pattern function.

What is idiomatic clojure to validate that a string has only alphanumerics and hyphen?

I need to ensure that a certain input only contains lowercase alphas and hyphens. What's the best idiomatic clojure to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in clojure, or if this is it, what's the syntax for regex matching?
user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
:found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")
Using a regular expression is fine here. To match a string against a regular expression in Clojure you may use the built-in re-find function.
So, your example in clojure will look like:
(if (re-find #"^[a-z\-]+$" s)
:true
:false)
Note that your regular expression will match only lowercase Latin letters a-z and the hyphen -.
While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
(do-something-with s)
(do-something-else-with s))
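The check can be packaged as a predicate. This is a minimal sketch; the name lower-hyphen? is hypothetical. It also shows one practical difference between the two forms on the JVM, where $ matches just before a final line terminator as well as at the very end of the input.

```clojure
;; Hypothetical predicate: true only for non-empty strings of a-z and hyphen.
(defn lower-hyphen? [s]
  (boolean (and (string? s) (re-matches #"[-a-z]+" s))))

(lower-hyphen? "abc-def")   ;; => true
(lower-hyphen? "abc-def!")  ;; => false
(lower-hyphen? "")          ;; => false

;; The anchored re-find form accepts a trailing newline; re-matches does not.
(re-find    #"^[a-z\-]+$" "abc\n") ;; => "abc"
(re-matches #"[-a-z]+"    "abc\n") ;; => nil
```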

How can I get the positions of regex matches in ClojureScript?

In Clojure I could use something like this solution: Compact Clojure code for regular expression matches and their position in string, i.e., creating a re-matcher and extracting the information from that, but re-matcher doesn't appear to be implemented in ClojureScript. What would be a good way to accomplish the same thing in ClojureScript?
Edit:
I ended up writing a supplementary function in order to preserve the modifiers of the regex as it is absorbed into re-pos:
(defn regex-modifiers
"Returns the modifiers of a regex, concatenated as a string."
[re]
(str (if (.-multiline re) "m")
(if (.-ignoreCase re) "i")))
(defn re-pos
"Returns a vector of vectors, each subvector containing in order:
the position of the match, the matched string, and any groups
extracted from the match."
[re s]
(let [re (js/RegExp. (.-source re) (str "g" (regex-modifiers re)))]
(loop [res []]
(if-let [m (.exec re s)]
(recur (conj res (vec (cons (.-index m) m))))
res))))
You can use the .exec method of JS RegExp object. The returned match object contains an index property that corresponds to the index of the match in the string.
Currently ClojureScript doesn't support constructing regex literals with the g mode flag (see CLJS-150), so you need to use the RegExp constructor. Here is a ClojureScript implementation of the re-pos function from the linked page:
(defn re-pos [re s]
(let [re (js/RegExp. (.-source re) "g")]
(loop [res {}]
(if-let [m (.exec re s)]
(recur (assoc res (.-index m) (first m)))
res))))
cljs.user> (re-pos #"\w+" "The quick brown fox")
{0 "The", 4 "quick", 10 "brown", 16 "fox"}
cljs.user> (re-pos #"[0-9]+" "3a1b2c1d")
{0 "3", 2 "1", 4 "2", 6 "1"}
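For comparison, the JVM Clojure approach mentioned in the question (a re-matcher driven in a loop) can be sketched as follows. This version runs on the JVM only, since it relies on java.util.regex.Matcher; it reuses the name re-pos and returns the same map shape as the ClojureScript function above.

```clojure
;; JVM Clojure only: re-matcher returns a java.util.regex.Matcher.
(defn re-pos [re s]
  (let [^java.util.regex.Matcher m (re-matcher re s)]
    (loop [res {}]
      (if (.find m)
        ;; .start is the index of the current match, .group its text.
        (recur (assoc res (.start m) (.group m)))
        res))))

(re-pos #"\w+" "The quick brown fox")
;; => {0 "The", 4 "quick", 10 "brown", 16 "fox"}
```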