Escaping brackets in Clojure - regex

If I try this
(import java.util.regex.Pattern)
(Pattern/compile ")!##$%^&*()")
or this
(def p #")!##$%^&*()")
I have Clojure complaining that there is an unmatched / unclosed ). Why are brackets evaluated within this simple string? How to escape them? Thanks
EDIT: While escaping works in the Clojure-specific syntax (#""), it doesn't work with the Pattern/compile syntax, which I need because I have to compile the regex pattern dynamically from a string.
I've tried with re-pattern, but I can't escape properly for some reason:
(re-pattern "\)!##$%^&*\(\)")
java.lang.Exception: Unsupported escape character: \)
java.lang.Exception: Unable to resolve symbol: ! in this context (NO_SOURCE_FILE:0)
java.lang.Exception: No dispatch macro for: $
java.lang.Exception: Unable to resolve symbol: % in this context (NO_SOURCE_FILE:0)
java.lang.IllegalArgumentException: Metadata can only be applied to IMetas
EDIT 2 This little function may help:
(defn escape-all [x]
  (str "\\" (reduce #(str %1 "\\" %2) x)))

I got it working by double escaping everything. Oh the joys of double escaping.
=> (re-pattern "\\)\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)")
=> #"\)\!\#\#\$\%\^\&\*\(\)"
=> (re-find (re-pattern "\\)\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)")
")!##$%^&*()")
=> ")!##$%^&*()"
I would recommend writing a helper function str-to-pattern (or whatever you want to call it), that takes a string, double escapes everything it needs to, and then calls re-pattern on it.
Edit: making a string to pattern function
There are plenty of ways to do this, below is just one example. I start by making an smap of regex escape chars to their string replacement. An "smap" isn't an actual type, but functionally it's a map we will use to swap "old values" with "new values", where "old values" are members of the keys of the smap, and "new values" are corresponding members of the vals of smap. In our case, this smap looks like {\( "\\(", \) "\\)" ...}.
(def regex-char-esc-smap
  (let [esc-chars "()*&^%$#!"]
    (zipmap esc-chars
            (map #(str "\\" %) esc-chars))))
Next is the actual function. I use the above smap to replace items in the string passed to it, then convert that back into a string and make a regex pattern out of it. I think the ->> macro makes the code more readable, but that's just a personal preference.
(defn str-to-pattern
  [string]
  (->> string
       (replace regex-char-esc-smap)
       (reduce str)
       re-pattern))
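A quick check of the helper above against the string from the question, with the definitions repeated so the snippet runs on its own. One caveat, offered as an observation rather than a fix: only the characters listed in esc-chars are escaped, so other metacharacters such as . still behave as regex operators. The names literal-hit and dot-hit are made up for this illustration.

```clojure
;; Definitions repeated from the answer above so this snippet is self-contained.
(def regex-char-esc-smap
  (let [esc-chars "()*&^%$#!"]
    (zipmap esc-chars
            (map #(str "\\" %) esc-chars))))

(defn str-to-pattern
  [string]
  (->> string
       (replace regex-char-esc-smap)
       (reduce str)
       re-pattern))

;; The question's string now compiles and is found literally inside a longer string.
(def literal-hit
  (re-find (str-to-pattern ")!##$%^&*()") "x)!##$%^&*()y"))

;; "." is not in esc-chars, so it still means "any character" in the compiled pattern.
(def dot-hit
  (re-find (str-to-pattern "a.b") "axb"))
```

If the input can contain arbitrary metacharacters, the escape list would need to grow accordingly.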

Are you sure the error is from the reader (i.e. from Clojure itself)?
Regexps use parentheses, and they have to match there too. I would guess the error is coming from the code trying to compile the regexp.
If you want to escape a paren in a regexp, use a backslash: (def p #"\)!##$%^&*\(\)")
[update] Ah, sorry, you probably need double escapes as Omri says.
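One way to check where the complaint comes from is to catch the exception type. The sketch below assumes the failure is a java.util.regex.PatternSyntaxException raised by Pattern/compile, which would mean the reader accepted the string and the regex compiler rejected it. The names raw and compile-result are illustrative only.

```clojure
(import 'java.util.regex.Pattern
        'java.util.regex.PatternSyntaxException)

;; The reader accepts this as an ordinary string literal.
(def raw ")!##$%^&*()")

;; Compiling it as a regex is what fails: the leading ")" has no matching "(".
(def compile-result
  (try
    (Pattern/compile raw)
    :compiled
    (catch PatternSyntaxException e
      :pattern-syntax-error)))
```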

All of the versions of Java that Clojure supports recognize \Q to start a quoted region and \E to end the quoted region. This allows you to do something like this:
(re-find #"\Q)!##$%^&*()\E" ")!##$%^&*()")
If you're using (re-pattern) then this will work:
(re-find (re-pattern "\\Q)!##$%^&*()\\E") ")!##$%^&*()")
If you're assembling a regular expression from a string whose content you don't know then you can use the quote method in java.util.regex.Pattern:
(re-find (re-pattern (java.util.regex.Pattern/quote some-str)) some-other-str)
Here's an example of this from my REPL:
user> (def the-string ")!##$%^&*()")
#'user/the-string
user> (re-find (re-pattern (java.util.regex.Pattern/quote the-string)) the-string)
")!##$%^&*()"

Related

Convert set to regex pattern in clojure

If I have this set
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
How can I turn it to this pattern string:
"foo\.clj|bar\.clj|baz\.clj"
My attempt : 
(defn set->pattern-str [coll]
  (-> (clojure.string/join "|" coll)
      (clojure.string/replace #"\." "\\\\.")))
(set->pattern-str my-set)
=> "foo\\.clj|baz\\.clj|bar\\.clj" ;I get the double backslash
Better ideas?
In case your set of strings might have other metacharacters than just . in them, a more general approach is to ask the underlying java.util.regex.Pattern implementation to escape everything for us:
(import 'java.util.regex.Pattern)
(defn set->pattern-str [coll]
  (->> coll
       (map #(Pattern/quote %))
       (clojure.string/join \|)
       re-pattern))
IDEone link here. Remember, IDEone is not a REPL, and you have to tell it to put values on stdout with e.g. println before you can see them.
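A small usage sketch of the Pattern/quote approach, assuming the goal is to match whole file names exactly. The function is renamed set->pattern here (a hypothetical name) because it returns a compiled pattern rather than a string.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

(defn set->pattern
  "Builds a regex that matches any one of the strings in coll literally."
  [coll]
  (->> coll
       (map #(Pattern/quote %))   ; wrap each string in \Q...\E
       (str/join "|")             ; alternate between the quoted strings
       re-pattern))

(def file-pattern (set->pattern #{"foo.clj" "bar.clj" "baz.clj"}))

;; An exact member of the set matches.
(def exact-hit (re-matches file-pattern "foo.clj"))

;; The quoted "." no longer matches an arbitrary character.
(def dot-miss (re-matches file-pattern "fooxclj"))
```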
You were close to the final solution. Double backslash is displayed because it is shown escaped. When you turn it into a seq you will see individual characters:
(seq "foo\\.clj")
;;=> (\f \o \o \\ \. \c \l \j)
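The difference between what the string contains and how the REPL displays it can also be checked by counting characters and comparing print with pr-str. This is only an illustration; the names below are not from the original answer.

```clojure
;; The literal "foo\\.clj" holds a single backslash character.
(def pattern-str "foo\\.clj")

;; f o o \ . c l j
(def char-count (count pattern-str))

;; print writes the raw characters, with one backslash.
(def printed (with-out-str (print pattern-str)))

;; pr-str writes the readable form the REPL shows: quoted, backslash doubled.
(def readable (pr-str pattern-str))
```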
And working solution:
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
(def my-set-pattern
  (-> (clojure.string/join "|" my-set)
      (clojure.string/replace "." "\\.")
      (re-pattern)))
(re-matches my-set-pattern "foo.clj")
;;=> "foo.clj"
(re-matches my-set-pattern "bar.clj")
;;=> "bar.clj"
(re-matches my-set-pattern "baz.clj")
;;=> "baz.clj"
(re-matches my-set-pattern "foo-clj")
;;=> nil
Edit: OK, this one does in fact work. Probably want to break it apart a bit more if it's meant to be long lived code, but this is the simplest way I could find to do it with minimal string munging.
(defn is-matching-file-name [target-string]
  (re-matches
    (re-pattern (clojure.string/escape (String/join "|" my-set) {\. "\\."}))
    target-string))
The clojure.string/escape here takes two arguments: the string to escape, and a mapping of the characters to escape to the replacement strings. The key in this map is the literal \. and the value needs two backslashes since we want to include one backslash preceding any . in the final string to be used as the argument for the re-pattern function.

What is idiomatic clojure to validate that a string has only alphanumerics and hyphen?

I need to ensure that a certain input only contains lowercase alphas and hyphens. What's the best idiomatic clojure to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in clojure, or if this is it, what's the syntax for regex matching?
user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
:found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")
Using a regex is fine here. To match a string against a regex in Clojure you may use the built-in re-find function.
So, your example in Clojure will look like:
(if (re-find #"^[a-z\-]+$" s)
  :true
  :false)
Note that your regex will match only lowercase Latin letters a-z and the hyphen -.
While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
  (do-something-with s)
  (do-something-else-with s))
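One practical difference between the two approaches shows up with a trailing newline: in Java regexes, $ without the MULTILINE flag also matches just before a final line terminator, so the re-find version accepts such input while re-matches rejects it. The predicate name valid-slug? is hypothetical.

```clojure
(def with-newline "abc-def\n")

;; re-find with ^...$ still succeeds: $ matches before the final "\n".
(def find-result (re-find #"^[a-z\-]+$" with-newline))

;; re-matches requires the whole string to match, newline included.
(def matches-result (re-matches #"[-a-z]+" with-newline))

(defn valid-slug?
  "True when s consists only of lowercase ASCII letters and hyphens."
  [s]
  (boolean (re-matches #"[-a-z]+" s)))
```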

dynamic regex argument in re-find function in clojure

I'm using the re-find function in Clojure, and have something like this:
(defn some-function []
  (re-find #"(?i)blah" "some sentence"))
What I would like is to make the "blah" dynamic, so I substituted a var for blah like this, but it doesn't work:
(defn some-function2 [some-string]
  (re-find #(str "(?i)" some-string) "some sentence"))
I'm surprised this doesn't work since LISP is supposed to "treat code like data".
Use the function re-pattern. #"" is just a reader macro (syntactic sugar for creating a regex), whereas #(str "(?i)" some-string) is the reader macro for creating an anonymous function.
To create a pattern from a string value in Clojure you can use re-pattern:
(re-pattern (str "(?i)" some-string))
One thing you don't mention is whether some-string is expected to contain a valid regex or whether it's an arbitrary string value. If some-string is an arbitrary string value (that you want to match exactly) you should quote it before using it to build your regex:
(re-pattern (str "(?i)" (java.util.regex.Pattern/quote some-string)))
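Putting the two pieces together, a minimal sketch of a case-insensitive literal search might look like this. The function name find-ignoring-case and its argument names are made up for the example, and it assumes the needle should be matched exactly rather than interpreted as a regex.

```clojure
(import 'java.util.regex.Pattern)

(defn find-ignoring-case
  "Finds needle in haystack, ignoring case and treating needle as literal text."
  [needle haystack]
  (re-find (re-pattern (str "(?i)" (Pattern/quote needle)))
           haystack))

;; Case differences are ignored; the matched text keeps the haystack's casing.
(def hit (find-ignoring-case "SENT" "some sentence"))

;; The "." in the needle is literal, so "abc" does not match "a.c".
(def miss (find-ignoring-case "a.c" "abc"))
```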

Stripping Vowels in clojure

I'm trying to write a function to strip all ASCII vowels in Clojure. I am new to Clojure, and I'm having a little trouble with strings. For example the string "hello world" would return "hll wrld". I appreciate the help!
You can take advantage of the underlying functions on the string class for that.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
If that feels like cheating, you could turn the string into a seq, and then filter it with the complement of a set, and then turn that back into a string.
user=> (apply str (filter (complement #{\a \e \i \o \u}) (seq "hello world")))
"hll wrld"
Sets in Clojure are also functions. complement takes a function and returns a function that returns the logical not of the original function. apply takes a function and a bunch of arguments and calls that function with those arguments (roughly speaking). The filter call above is equivalent to this:
user=> (apply str (filter #(not (#{\a \e \i \o \u} %)) (seq "hello world")))
"hll wrld"
edit
One more...
user=> (apply str (re-seq #"[^aeiou]" "hello world"))
"hll wrld"
#"[^aeiou]" is a regex, and re-seq turns the matches into a seq. It's clojure-like and seems to perform well. I might try this one before dropping down to Java. The ones that seq strings are quite a bit slower.
Important Edit
There's one more way, and that is to use clojure.string/replace. This may be the best way given that it should work in either Clojure or ClojureScript.
e.g.
dev:cljs.user=> (require '[clojure.string :as str])
nil
dev:cljs.user=> (str/replace "hello world" #"[aeiou]" "")
"hll wrld"
Bill is mostly right, but wrong enough to warrant this answer I think.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
This solution is perfectly acceptable. In fact, it's the best solution proposed. There is nothing wrong with dropping down to Java if the solution is the cleanest and fastest.
Another solution is, like he said, using sequence functions. However, his code is a little strange. Never use filter with (not ..) or complement. There is a function specifically for that, remove:
user> (apply str (remove #{\a \e \i \o \u} "hello world"))
"hll wrld"
You also don't have to call seq on the string. All of Clojure's seq functions will handle that for you.
His last solution is interesting, but I'd prefer the first one simply because it doesn't involve (apply str ..).
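For completeness, here is a sketch that wraps the clojure.string/replace approach in a function. It assumes that "all ASCII vowels" is meant to include uppercase ones, which the examples above do not cover; the (?i) flag handles that. The name strip-vowels is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn strip-vowels
  "Removes the ASCII vowels a, e, i, o, u from s, in either case."
  [s]
  (str/replace s #"(?i)[aeiou]" ""))

(def lower-result (strip-vowels "hello world"))
(def mixed-result (strip-vowels "HELLO World"))
```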

Getting all matches for a regexp on clojure

I'm trying to parse an HTML file and get all href's inside it.
So far, the code I'm using is:
(map
  #(println (str "Match: " %))
  (re-find #"(?sm)href=\"([a-zA-Z.:/]+)\"" str_response))
str_response being the string with the HTML code inside it. According to my basic understanding of Clojure, that code should print a list of matches, but so far, no luck.
It doesn't crash, but it doesn't match anything either.
I've tried using re-seq instead of re-find, but with no luck. Any help?
Thanks!
It is generally thought that you cannot parse HTML with a regex (entertaining answer), though just finding all occurrences of one tag should be doable.
Once you figure out the proper regex, re-seq is the function you want to use:
user> (re-find #"aa" "aalkjkljaa")
"aa"
user> (re-seq #"aa" "aalkjkljaa")
("aa" "aa")
This is not crashing for you because re-find is returning nil, which map interprets as an empty list, so it does nothing.
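Two further details may be worth checking, sketched here against a made-up HTML string (sample-html is not from the question). When the regex has a capturing group, re-seq returns one vector per match, with the whole match first and the group second. Separately, map is lazy, so outside a REPL the println calls may never run; doseq is the usual choice for side effects.

```clojure
;; Hypothetical input standing in for str_response.
(def sample-html
  "<a href=\"http://example.com/a\">A</a> <a href=\"http://example.org/b\">B</a>")

;; Each re-seq result is [whole-match group-1]; second picks out the URL.
(def hrefs
  (map second (re-seq #"href=\"([a-zA-Z.:/]+)\"" sample-html)))

;; doseq runs eagerly, so every match is printed.
(doseq [href hrefs]
  (println (str "Match: " href)))
```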
This really looks like an HTML scraping problem in which case, I would advise using enlive.
Something like this should work
(ns test.foo
  (:require [net.cgrand.enlive-html :as html]))
(let [url (html/html-resource
            (java.net.URL. "http://www.nytimes.com"))]
  (map #(-> % :attrs :href) (html/select url [:a])))
I don't think there is anything wrong with your code. Perhaps str_response is the suspect. The following works with http://google.com with your regex:
(let [str_response (slurp "http://google.com")]
  (map #(println (str "Match: " %))
       (re-seq #"(?sm)href=\"([a-zA-Z.:/]+)\"" str_response)))
Note that re-find also works, though it only returns one match.