Convert set to regex pattern in clojure


If I have this set
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
How can I turn it into this pattern string:
"foo\.clj|bar\.clj|baz\.clj"
My attempt:
(defn set->pattern-str [coll]
  (-> (clojure.string/join "|" coll)
      (clojure.string/replace #"\." "\\\\.")))
(set->pattern-str my-set)
=> "foo\\.clj|baz\\.clj|bar\\.clj" ;I get the double backslash
Better ideas?

In case your set of strings might have other metacharacters than just . in them, a more general approach is to ask the underlying java.util.regex.Pattern implementation to escape everything for us:
(import 'java.util.regex.Pattern)
(defn set->pattern-str [coll]
  (->> coll
       (map #(Pattern/quote %))
       (clojure.string/join \|)
       re-pattern))
The original answer linked to a runnable IDEone demo. Remember that IDEone is not a REPL: you have to print values to stdout, e.g. with println, before you can see them.
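As a rough sketch of what Pattern/quote does to each element (the helper name quoted-alternation and the var file-pattern are made up for this illustration, they are not part of the answer above):

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

;; Hypothetical helper: quote each element, then join the pieces with |.
(defn quoted-alternation [coll]
  (re-pattern (str/join "|" (map #(Pattern/quote %) coll))))

(def file-pattern (quoted-alternation #{"foo.clj" "bar.clj" "baz.clj"}))

;; Pattern/quote wraps its input in \Q...\E, so every character is literal.
(Pattern/quote "foo.clj")            ;; the 11 characters \Qfoo.clj\E
(re-matches file-pattern "foo.clj")  ;; => "foo.clj"
(re-matches file-pattern "fooxclj")  ;; => nil, the dot is no longer a wildcard
```

The resulting pattern looks noisier than the hand-escaped one, but it stays correct if the set later contains characters such as +, ( or [.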

You were close to the final solution. The double backslash appears only because the REPL prints the string in its escaped (reader) form; the string itself contains a single backslash. When you turn it into a seq you will see the individual characters:
(seq "foo\\.clj")
;;=> (\f \o \o \\ \. \c \l \j)
And a working solution:
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
(def my-set-pattern
  (-> (clojure.string/join "|" my-set)
      (clojure.string/replace "." "\\.")
      (re-pattern)))
(re-matches my-set-pattern "foo.clj")
;;=> "foo.clj"
(re-matches my-set-pattern "bar.clj")
;;=> "bar.clj"
(re-matches my-set-pattern "baz.clj")
;;=> "baz.clj"
(re-matches my-set-pattern "foo-clj")
;;=> nil

Edit: OK, this one does in fact work. Probably want to break it apart a bit more if it's meant to be long lived code, but this is the simplest way I could find to do it with minimal string munging.
(defn is-matching-file-name [target-string]
  (re-matches
    (re-pattern (clojure.string/escape (String/join "|" my-set) {\. "\\."}))
    target-string))
The clojure.string/escape here takes two arguments: the string to escape, and a map from the characters to escape to their replacement strings. The key in this map is the character literal \. and the value is written with two backslashes in source, because we want exactly one backslash preceding every . in the final string that is passed to re-pattern.
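A small sketch of that escaping step in isolation may help (the var name escaped and the sample input are chosen for this example only):

```clojure
(require '[clojure.string :as str])

;; str/escape walks the string one character at a time and looks each
;; character up in the map; unmapped characters pass through unchanged.
(def escaped (str/escape "foo.clj|bar.clj" {\. "\\."}))

(println escaped)   ;; prints foo\.clj|bar\.clj (single backslashes)
(count "\\.")       ;; => 2, the replacement is one backslash plus one dot

;; The escaped string is now safe to hand to re-pattern.
(re-matches (re-pattern escaped) "bar.clj")  ;; => "bar.clj"
(re-matches (re-pattern escaped) "barxclj")  ;; => nil
```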

Related

replace multiple bad characters in clojure

I am trying to remove bad characters from an input string.
Characters should be valid UTF-8 characters (tabs, line breaks etc. are OK).
However, I was unable to figure out how to replace all of the bad characters that are found.
My solution works only for the first bad character.
Usually there are no bad characters; in about 1 in 50 cases there is one. I just want to make my solution foolproof.
(defn filter-to-utf-8-string
  "Return only good utf-8 characters from the input."
  [input]
  (let [bad-characters (set (re-seq #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" input))
        filtered-string (clojure.string/replace input (apply str (first bad-characters)) "")]
    filtered-string))
How can I make replace work for all values in the sequence, not just for the first one?
A friend of mine helped me find a workaround for this problem:
I created a filter for replace using re-pattern.
Within the let, the code is currently:
filter (if (not (empty? bad-characters))
         (re-pattern (str "[" (clojure.string/join bad-characters) "]"))
         #"")
filtered-string (clojure.string/replace input filter "")
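A possibly simpler route, sketched here under the assumption that the regex from the question already describes exactly the characters to drop: clojure.string/replace with a pattern replaces every match, so the pattern can be passed directly and the intermediate set is not needed. This also sidesteps building a character class out of raw characters, where something like ^ would take on a special meaning inside [...].

```clojure
(require '[clojure.string :as str])

(defn filter-to-utf-8-string
  "Remove every character outside the allowed classes from input."
  [input]
  ;; Same negated character class as in the question; replace handles
  ;; all matches, not just the first one.
  (str/replace input #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" ""))

(filter-to-utf-8-string "abc\u0000 de^f\u0001 1+2")  ;; => "abc def 1+2"
(filter-to-utf-8-string "nothing bad here")          ;; => "nothing bad here"
```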

Here is a simple version:
(ns xxxxx
  (:require
    [clojure.string :as str]))
(def all-chars (str/join (map char (range 32 80))))
(println all-chars)
(def char-L (str/join (re-seq #"[\p{L}]" all-chars)))
(println char-L)
(def char-N (str/join (re-seq #"[\p{N}]" all-chars)))
(println char-N)
(def char-LN (str/join (re-seq #"[\p{L}\p{N}]" all-chars)))
(println char-LN)
all-chars => " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO"
char-L => "ABCDEFGHIJKLMNO"
char-N => "0123456789"
char-LN => "0123456789ABCDEFGHIJKLMNO"
So we start off with all ASCII chars with codes 32 through 79 (range excludes its upper bound of 80). We first print only the letters, then only the numbers, then either letters or numbers. It seems this should work for your problem, although instead of rejecting non-members of the desired set, we keep the members of the desired set.

What is idiomatic clojure to validate that a string has only alphanumerics and hyphen?

I need to ensure that a certain input only contains lowercase alphas and hyphens. What's the best idiomatic clojure to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in clojure, or if this is it, what's the syntax for regex matching?

user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
:found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")

Using a RegExp is fine here. To match a string against a RegExp in Clojure you can use the built-in re-find function.
So your example in Clojure will look like:
(if (re-find #"^[a-z\-]+$" s)
  :true
  :false)
Note that your RegExp will match only lowercase Latin letters a-z and the hyphen -.

While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
  (do-something-with s)
  (do-something-else-with s))
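For reuse, the check can be wrapped in a predicate. The sketch below uses a made-up name, lowercase-hyphen-only?, and also shows one practical difference between the two approaches: in Java regexes $ also matches just before a final line terminator, so the anchored re-find version accepts a trailing newline while re-matches does not.

```clojure
;; Hypothetical predicate name; the regex is the one from the answer above.
(defn lowercase-hyphen-only? [s]
  (boolean (re-matches #"[-a-z]+" s)))

(lowercase-hyphen-only? "hello-there")  ;; => true
(lowercase-hyphen-only? "hello there")  ;; => false
(lowercase-hyphen-only? "")             ;; => false, + needs at least one character

;; Trailing newline: accepted by the anchored re-find, rejected by re-matches.
(re-find #"^[a-z\-]+$" "abc\n")   ;; => "abc"
(re-matches #"[-a-z]+" "abc\n")   ;; => nil
```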

Convert hyphenated string to CamelCase

I'm trying to convert a hyphenated string to CamelCase string. I followed this post: Convert hyphens to camel case (camelCase)
(defn hyphenated-name-to-camel-case-name [^String method-name]
  (clojure.string/replace method-name #"-(\w)"
                          #(clojure.string/upper-case (first %1))))
(hyphenated-name-to-camel-case-name "do-get-or-post")
==> do-Get-Or-Post
Why am I still getting the dash in the output string?

You should replace first with second:
(defn hyphenated-name-to-camel-case-name [^String method-name]
  (clojure.string/replace method-name #"-(\w)"
                          #(clojure.string/upper-case (second %1))))
You can check what argument clojure.string/upper-case gets by inserting a println into the code:
(defn hyphenated-name-to-camel-case-name [^String method-name]
  (clojure.string/replace method-name #"-(\w)"
                          #(clojure.string/upper-case
                             (do
                               (println %1)
                               (first %1)))))
When you run the above code, the result is:
[-g g]
[-o o]
[-p p]
The first element of the vector is the matched string, and the second is the captured string,
which means you should use second, not first.
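The same fix can also be written with destructuring, which makes the shape of the argument explicit. This is only a sketch of an alternative spelling, not a different technique:

```clojure
(require '[clojure.string :as str])

(defn hyphenated-name-to-camel-case-name [method-name]
  ;; The replacement fn receives [whole-match group-1]; destructure it and
  ;; ignore the whole match so only the captured letter is upper-cased.
  (str/replace method-name #"-(\w)"
               (fn [[_ letter]] (str/upper-case letter))))

(hyphenated-name-to-camel-case-name "do-get-or-post")  ;; => "doGetOrPost"
(hyphenated-name-to-camel-case-name "plain")           ;; => "plain"
```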

In case your goal is just to convert between cases, I really like the camel-snake-kebab library. ->CamelCase is the function name in question.

Inspired by this thread, you could also do:
(require '[clojure.string :as str])
(defn camelize [input-string]
  (let [words (str/split input-string #"[\s_-]+")]
    (str/join "" (cons (str/lower-case (first words))
                       (map str/capitalize (rest words))))))

Iterating through a map with doseq

I'm new to Clojure and I'm doing some basic stuff from labrepl, now I want to write a function that will replace certain letters with other letters, for example: elosska → elößkä.
I wrote this:
(ns student.dialect (:require [clojure.string :as str]))
(defn germanize
  [sentence]
  (def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
  (doseq [[original-letter new-letter] german-letters]
    (str/replace sentence original-letter new-letter)))
but it doesn't work as I expect. Could you help me, please?

Here is my take:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
(defn germanize [s]
  (reduce (fn [sentence [match replacement]]
            (str/replace sentence match replacement))
          s german-letters))
(germanize "elosska")

There are 2 problems here:
doseq is meant for side effects: it discards the values produced by its body and always returns nil, so you won't get any results.
Each str/replace call works on the original, unchanged sentence and returns a new string, so you get 4 separate results instead of one. You can check this by replacing doseq with for, which gives you a list with 4 entries.
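Both points can be seen at a REPL. In the sketch below the var name separate-results is made up, and the for result is put into a set only so that the comparison does not depend on the map's iteration order:

```clojure
(require '[clojure.string :as str])

(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})

;; doseq evaluates its body for side effects only and returns nil.
(doseq [[original replacement] german-letters]
  (str/replace "elosska" original replacement))
;; => nil

;; for collects each result: four independent strings, each with a single
;; replacement applied to the original input ("u" does not occur, so one
;; of them is unchanged).
(def separate-results
  (set (for [[original replacement] german-letters]
         (str/replace "elosska" original replacement))))
;; => #{"elosskä" "elosska" "elösska" "eloßka"}
```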
Your code could be rewritten in the following way:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
(defn germanize [sentence]
  (loop [text sentence
         letters german-letters]
    (if (empty? letters)
      text
      (let [[original-letter new-letter] (first letters)]
        (recur (str/replace text original-letter new-letter)
               (rest letters))))))
In this case the intermediate result is passed along to the next step, so all replacements are applied to the same string, producing the correct result:
user> (germanize "elosska")
"elößkä"
P.S. Using def inside a function is also not recommended: def always creates a namespace-level var, so keep it for top-level forms and use let for locals.

Alex has of course already correctly answered the question with respect to the original issue using doseq... but I found the question interesting and wanted to see what a more "functional" solution would look like. And by that I mean without using a loop.
I came up with this:
(ns student.dialect (:require [clojure.string :as str]))
(defn germanize [sentence]
  (let [letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"}
        regex (re-pattern (apply str (interpose \| (keys letters))))]
    (str/replace sentence regex letters)))
Which yields the same result:
student.dialect=> (germanize "elosska")
"elößkä"
The regex (re-pattern ...) line simply evaluates to a pattern such as #"ss|a|o|u" (the order of the alternatives follows the map's key order). Writing that pattern as an explicit regex literal would have been cleaner and simpler to read, but I thought it best to have only one definition of the German letters.
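If the keys could overlap (say "s" and "ss") or contain regex metacharacters, the generated pattern needs a little more care. The following is a sketch of one way to handle that, assuming a hypothetical helper named replace-all-keys: keys are tried longest first, and each key is quoted with Pattern/quote.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

;; Hypothetical generalisation; replace-all-keys is not from the answer above.
(defn replace-all-keys [s replacements]
  (let [ks    (sort-by count > (keys replacements))   ; longest key first
        regex (re-pattern (str/join "|" (map #(Pattern/quote %) ks)))]
    ;; A map is a function of its keys, so it can serve as the replacement fn.
    (str/replace s regex replacements)))

(replace-all-keys "elosska" {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})  ;; => "elößkä"
(replace-all-keys "1.5 ss" {"." "," "s" "z" "ss" "ß"})           ;; => "1,5 ß"
```

Sorting by length matters because regex alternation takes the first alternative that matches at a position, so a shorter key listed first would win over a longer one that starts the same way.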

Create a list from a string in Clojure

I'm looking to create a list of characters using a string as my source. I did a bit of googling and came up with nothing so then I wrote a function that did what I wanted:
(defn list-from-string [char-string]
  (loop [source char-string result ()]
    (def result-char (string/take 1 source))
    (cond
      (empty? source) result
      :else (recur (string/drop 1 source) (conj result result-char)))))
But looking at this makes me feel like I must be missing a trick.
Is there a core or contrib function that does this for me? Surely I'm just being dumb right?
If not is there a way to improve this code?
Would the same thing work for numbers too?

You can just use the seq function to do this:
user=> (seq "aaa")
(\a \a \a)
For numbers you can use a "dumb" solution, something like:
user=> (map (fn [^Character c] (Character/digit c 10)) (str 12345))
(1 2 3 4 5)
P.S. Strings in Clojure are seqable, so you can use them as the source for any sequence-processing function: map, for, ...

If you know the input will be letters, just use
user=> (seq "abc")
(\a \b \c)
For numbers, try this:
user=> (map #(Character/getNumericValue %) "123")
(1 2 3)

Edit: Oops, thought you wanted a list of different characters. For that, use the core function "frequencies".
clojure.core/frequencies
([coll])
Returns a map from distinct items in coll to the number of times they appear.
Example:
user=> (frequencies "lazybrownfox")
{\a 1, \b 1, \f 1, \l 1, \n 1, \o 2, \r 1, \w 1, \x 1, \y 1, \z 1}
Then all you have to do is get the keys and turn them into a string (or not).
user=> (apply str (keys (frequencies "lazybrownfox")))
"abflnorwxyz"

(apply str (set "lazybrownfox")) => "abflnorwxyz"
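One caveat, offered as a sketch: maps and sets in Clojure are unordered, so the order of characters coming out of frequencies or set is not something to rely on. If order matters, distinct keeps first occurrences in input order, and sort gives alphabetical order explicitly.

```clojure
;; distinct keeps the first occurrence of each character, in input order.
(apply str (distinct "lazybrownfox"))     ;; => "lazybrownfx"

;; Sets are unordered, so sort explicitly when alphabetical order matters.
(apply str (sort (set "lazybrownfox")))   ;; => "abflnorwxyz"
```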
