dynamic regex argument in re-find function in clojure

I'm using the re-find function in Clojure, and have something like this:
(defn some-function []
  (re-find #"(?i)blah" "some sentence"))
What I would like is to make the "blah" dynamic, so I substituted a var for blah like this, but it doesn't work:
(defn some-function2 [some-string]
  (re-find #(str "(?i)" some-string) "some sentence"))
I'm surprised this doesn't work since Lisp is supposed to "treat code like data".

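Use the function re-pattern. #"" is just a reader macro (i.e. syntactic sugar) for creating a regex literal, which is compiled when the code is read, so it cannot contain a value that only exists at run time.
#(str "(?i)" some-string) is the reader macro for creating an anonymous function, so re-find receives a function instead of a pattern.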
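The difference is easy to check at a REPL. In this sketch some-function3 is a hypothetical name; the try/catch only relies on re-find rejecting a function argument and does not assume which exception class is thrown.

```clojure
;; A #"..." literal is read as an already compiled java.util.regex.Pattern.
(def literal-pattern #"(?i)blah")

;; #(...) is read as an anonymous function, not as a pattern.
(def not-a-pattern #(str "(?i)" "blah"))

;; re-pattern compiles a string into a Pattern at run time,
;; so the regex text can come from a function argument.
(defn some-function3 [some-string]
  (re-find (re-pattern (str "(?i)" some-string)) "some sentence"))

;; Handing the anonymous function to re-find fails at run time.
(def fn-arg-fails?
  (try
    (re-find not-a-pattern "some sentence")
    false
    (catch Exception _ true)))
```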

To create a pattern from a string value in Clojure you can use re-pattern:
(re-pattern (str "(?i)" some-string))
One thing you don't mention is whether some-string is expected to contain a valid regex or whether it's an arbitrary string value. If some-string is an arbitrary string value (that you want to match exactly) you should quote it before using it to build your regex:
(re-pattern (str "(?i)" (java.util.regex.Pattern/quote some-string)))
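Both variants can be wrapped in one small helper. This is a sketch only; ci-pattern and its literal? flag are hypothetical names. The case-insensitive flag still applies to the quoted text, so a quoted "a.c" matches "A.C" but no longer matches "ABC".

```clojure
(defn ci-pattern
  "Hypothetical helper: builds a case-insensitive Pattern from s.
  When literal? is true, s is quoted with Pattern/quote so any regex
  metacharacters in it match themselves."
  [s literal?]
  (re-pattern (str "(?i)"
                   (if literal?
                     (java.util.regex.Pattern/quote s)
                     s))))

;; (re-find (ci-pattern "a.c" false) "ABC")  ; "." is a wildcard here
;; (re-find (ci-pattern "a.c" true) "xA.Cx") ; "." must be a literal dot
```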

Related

What is idiomatic clojure to validate that a string has only alphanumerics and hyphen?

I need to ensure that a certain input only contains lowercase alphas and hyphens. What's the best idiomatic clojure to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in clojure, or if this is it, what's the syntax for regex matching?

user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
        :found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")

Using a regex is fine here. To match a string against a regex in Clojure you can use the built-in re-find function.
So your example in Clojure would look like:
(if (re-find #"^[a-z\-]+$" s)
  :true
  :false)
Note that your regex will match only the lowercase Latin letters a-z and the hyphen -.

While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
  (do-something-with s)
  (do-something-else-with s))
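If a real true/false is wanted (for example in a validation predicate), re-matches can be wrapped in boolean. A minimal sketch, assuming s is a non-nil string; lowercase-and-hyphens? is a hypothetical name.

```clojure
(defn lowercase-and-hyphens?
  "Hypothetical predicate: true when s is non-empty and contains only
  lowercase ASCII letters a-z and hyphens."
  [s]
  ;; re-matches must match the whole string, so no ^...$ anchors are needed.
  ;; It returns the match or nil; boolean turns that into true/false.
  (boolean (re-matches #"[-a-z]+" s)))
```

Because the pattern uses +, the empty string is rejected as well.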

Stripping Vowels in clojure

I'm trying to write a function to strip all ASCII vowels in Clojure. I am new to Clojure, and I'm having a little trouble with strings. For example the string "hello world" would return "hll wrld". I appreciate the help!

You can take advantage of the underlying methods of Java's String class for that.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
If that feels like cheating, you could turn the string into a seq, and then filter it with the complement of a set, and then turn that back into a string.
user=> (apply str (filter (complement #{\a \e \i \o \u}) (seq "hello world")))
"hll wrld"
Sets in Clojure are also functions. complement takes a function and returns a function that returns the logical negation of the original function's result. It's equivalent to this. apply takes a function and a bunch of arguments and calls that function with those arguments (roughly speaking).
user=> (apply str (filter #(not (#{\a \e \i \o \u} %)) (seq "hello world")))
"hll wrld"
Edit: one more...
user=> (apply str (re-seq #"[^aeiou]" "hello world"))
"hll wrld"
#"[^aeiou]" is a regex, and re-seq turns the matches into a seq. It's Clojure-like and seems to perform well; I might try this one before dropping down to Java. The versions that turn the string into a seq are quite a bit slower.
Important Edit
There's one more way, and that is to use clojure.string/replace. This may be the best way, given that it should work in either Clojure or ClojureScript.
For example:
dev:cljs.user=> (require '[clojure.string :as str])
nil
dev:cljs.user=> (str/replace "hello world" #"[aeiou]" "")
"hll wrld"

Bill is mostly right, but wrong enough to warrant this answer, I think.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
This solution is perfectly acceptable. In fact, it's the best solution proposed. There is nothing wrong with dropping down to Java if the solution is the cleanest and fastest.
Another solution is, like he said, using sequence functions. However, his code is a little strange. Never use filter with (not ..) or complement. There is a function specifically for that, remove:
user> (apply str (remove #{\a \e \i \o \u} "hello world"))
"hll wrld"
You also don't have to call seq on the string. All of Clojure's seq functions will handle that for you.
His last solution is interesting, but I'd prefer the first one simply because it doesn't involve (apply str ..).
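All the versions above drop only lowercase vowels, while the question asks for all ASCII vowels. A minimal sketch, assuming uppercase vowels should be removed too (strip-vowels is a hypothetical name), built on clojure.string/replace so that, as the edit above suggests, it should carry over to ClojureScript:

```clojure
(require '[clojure.string :as string])

(defn strip-vowels
  "Removes the ASCII vowels a, e, i, o and u, in either case, from s."
  [s]
  ;; The character class matches a single vowel of either case;
  ;; clojure.string/replace replaces every match with "".
  (string/replace s #"[aeiouAEIOU]" ""))
```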

Escaping brackets in Clojure

If I try this
(import java.util.regex.Pattern)
(Pattern/compile ")!##$%^&*()")
or this
(def p #")!##$%^&*()")
I have Clojure complaining that there is an unmatched / unclosed ). Why are brackets evaluated within this simple string? How to escape them? Thanks
EDIT: While escaping works in the Clojure-specific syntax (#""), it doesn't work with the Pattern/compile syntax that I do need, because I have to compile the regex pattern dynamically from a string.
I've tried with re-pattern, but I can't escape properly for some reason:
(re-pattern "\)!##$%^&*\(\)")
java.lang.Exception: Unsupported escape character: \)
java.lang.Exception: Unable to resolve symbol: ! in this context (NO_SOURCE_FILE:0)
java.lang.Exception: No dispatch macro for: $
java.lang.Exception: Unable to resolve symbol: % in this context (NO_SOURCE_FILE:0)
java.lang.IllegalArgumentException: Metadata can only be applied to IMetas
EDIT 2: This little function may help:
(defn escape-all [x]
  (str "\\" (reduce #(str %1 "\\" %2) x)))

I got it working by double escaping everything. Oh the joys of double escaping.
=> (re-pattern "\\)\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)")
=> #"\)\!\#\#\$\%\^\&\*\(\)"
=> (re-find (re-pattern "\\)\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)")
")!##$%^&*()")
=> ")!##$%^&*()"
I would recommend writing a helper function str-to-pattern (or whatever you want to call it), that takes a string, double escapes everything it needs to, and then calls re-pattern on it.
Edit: making a string to pattern function
There are plenty of ways to do this, below is just one example. I start by making an smap of regex escape chars to their string replacement. An "smap" isn't an actual type, but functionally it's a map we will use to swap "old values" with "new values", where "old values" are members of the keys of the smap, and "new values" are corresponding members of the vals of smap. In our case, this smap looks like {\( "\\(", \) "\\)" ...}.
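(def regex-char-esc-smap
  ;; All Java regex metacharacters plus the extra characters from the question.
  ;; A backslash before any non-alphabetic character is always legal.
  (let [esc-chars "\\()[]{}.*+?^$|&%#!"]
    (zipmap esc-chars
            (map #(str "\\" %) esc-chars))))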
Next is the actual function. I use the above smap to replace items in the string passed to it, then convert that back into a string and make a regex pattern out of it. I think the ->> macro makes the code more readable, but that's just a personal preference.
(defn str-to-pattern
  [string]
  (->> string
       (replace regex-char-esc-smap)
       (reduce str)
       re-pattern))
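As an alternative to the smap-plus-replace approach, clojure.string/escape performs the per-character substitution and returns a string directly. This sketch assumes the characters to escape are the java.util.regex metacharacters; escape-regex and str-to-pattern* are hypothetical names. For arbitrary input, java.util.regex.Pattern/quote is still the simpler option.

```clojure
(require '[clojure.string :as string])

;; Characters with a special meaning in java.util.regex syntax.
;; A backslash before any non-alphabetic character is always legal,
;; so escaping all of these is safe.
(def regex-metachars (set "\\.[]{}()*+?^$|"))

(defn escape-regex
  "Hypothetical helper: returns s with each regex metacharacter backslash-escaped."
  [s]
  ;; string/escape calls the function on every char; nil means "keep as is".
  (string/escape s (fn [c] (when (regex-metachars c) (str "\\" c)))))

(defn str-to-pattern* [s]
  (re-pattern (escape-regex s)))
```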

Are you sure the error is from the reader (i.e. from Clojure itself)?
Regexes use parentheses, and they have to match there too. I would guess the error is coming from the code trying to compile the regex.
If you want to escape a paren in a regex, use a backslash (and, to match the string literally, escape $, ^ and * as well): (def p #"\)!##\$%\^&\*\(\)")
[update] Ah, sorry, you probably need double escapes, as Omri says.

All of the versions of Java that Clojure supports recognize \Q to start a quoted region and \E to end the quoted region. This allows you to do something like this:
(re-find #"\Q)!##$%^&*()\E" ")!##$%^&*()")
If you're using (re-pattern) then this will work:
(re-find (re-pattern "\\Q)!##$%^&*()\\E") ")!##$%^&*()")
If you're assembling a regular expression from a string whose content you don't know then you can use the quote method in java.util.regex.Pattern:
(re-find (re-pattern (java.util.regex.Pattern/quote some-str)) some-other-str)
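Pattern/quote is also safer than wrapping text in \Q...\E by hand: an input that itself contains \E would end a hand-written quoted region early, whereas Pattern/quote handles that case. A quick check, where the sample string is only an illustrative value:

```clojure
;; A string containing the two characters \E (backslash, capital E).
(def tricky "a\\Eb")

;; Pattern/quote copes with the embedded \E, so the compiled pattern
;; still matches the original text literally.
(def quoted (re-pattern (java.util.regex.Pattern/quote tricky)))
```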
Here's an example of this from my REPL:
user> (def the-string ")!##$%^&*()")
#'user/the-string
user> (re-find (re-pattern (java.util.regex.Pattern/quote the-string)) the-string)
")!##$%^&*()"

Getting all matches for a regexp on clojure

I'm trying to parse an HTML file and get all href's inside it.
So far, the code I'm using is:
(map
  #(println (str "Match: " %))
  (re-find #"(?sm)href=\"([a-zA-Z.:/]+)\"" str_response))
str_response being the string with the HTML code inside it. According to my basic understanding of Clojure, that code should print a list of matches, but so far, no luck.
It doesn't crash, but it doesn't match anything either.
I've tried using re-seq instead of re-find, but with no luck. Any help?
Thanks!

It is generally thought that you cannot parse HTML with a regex (entertaining answer), though just finding all occurrences of one tag should be doable.
Once you figure out the proper regex, re-seq is the function you want to use:
user> (re-find #"aa" "aalkjkljaa")
"aa"
user> (re-seq #"aa" "aalkjkljaa")
("aa" "aa")
This is not crashing for you because re-find is returning nil, which map treats as an empty sequence, so it does nothing.

This really looks like an HTML scraping problem, in which case I would advise using Enlive.
Something like this should work
(ns test.foo
  (:require [net.cgrand.enlive-html :as html]))
(let [url (html/html-resource
            (java.net.URL. "http://www.nytimes.com"))]
  (map #(-> % :attrs :href) (html/select url [:a])))

I don't think there is anything wrong with your code. Perhaps str_response is the suspect. The following works with http://google.com using your regex:
(let [str_response (slurp "http://google.com")]
  (map #(println (str "Match: " %))
       (re-seq #"(?sm)href=\"([a-zA-Z.:/]+)\"" str_response)))
Note that re-find also works, though it only returns the first match (as a vector of the whole match and its capture group).
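Another possible culprit, assuming the code runs from a file rather than being typed at the REPL: map is lazy, so a (map #(println ...) ...) whose result is never consumed never prints anything, while the REPL hides this by forcing the sequence when it prints the result. The sketch below (hrefs and print-hrefs are hypothetical names) separates extraction from printing and uses the eager doseq for the side effect. Because the pattern has one capture group, re-seq yields [whole-match url] vectors. The (?sm) flags are dropped because the pattern uses neither . outside a character class nor ^ or $.

```clojure
(defn hrefs
  "Returns the captured URL of every href=\"...\" match in html.
  A rough regex-based sketch, not a real HTML parser."
  [html]
  ;; With one capture group, re-seq yields [whole-match url] vectors.
  (map second (re-seq #"href=\"([a-zA-Z.:/]+)\"" html)))

(defn print-hrefs [html]
  ;; doseq is eager, so the printing happens even outside the REPL.
  (doseq [url (hrefs html)]
    (println (str "Match: " url))))
```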

Reverse a string (simple question)

Is there a better way to do this in Clojure?
daniel=> (reverse "Hello")
(\o \l \l \e \H)
daniel=> (apply str (vec (reverse "Hello")))
"olleH"
Do you have to do the apply $ str $ vec bit every time you want to reverse a string back to its original form?

You'd better use clojure.string/reverse:
user=> (require '[clojure.string :as s])
nil
user=> (s/reverse "Hello")
"olleH"
UPDATE: for the curious, here are the source code snippets for clojure.string/reverse in both Clojure (v1.4) and ClojureScript:
; clojure:
(defn ^String reverse
  "Returns s with its characters reversed."
  {:added "1.2"}
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
; clojurescript
(defn reverse
  "Returns s with its characters reversed."
  [s]
  (.. s (split "") (reverse) (join "")))

OK, so it would be easy to roll your own function with apply inside, or to use a dedicated version of reverse that works better on (but only on) strings. The main things to think about here, though, are the arity (number of parameters) of the str function, and the fact that reverse works on a collection.
(doc reverse)
clojure.core/reverse
([coll])
Returns a seq of the items in coll in reverse order. Not lazy.
This means that reverse works not only on strings but on all other collections as well. However, because reverse expects a collection as its parameter, it treats a string as a collection of characters
(reverse "Hello")
and returns one as well
(\o \l \l \e \H)
Now if we just substitute the functions for the collection, you can spot the difference:
(str '(\o \l \l \e \H) )
"(\\o \\l \\l \\e \\H)"
while
(str \o \l \l \e \H )
"olleH"
The big difference between the two is the number of arguments. In the first example, str takes one argument, a collection of 5 characters. In the second, str takes 5 arguments: 5 characters.
What does the str function expect?
(doc str)
-------------------------
clojure.core/str
([] [x] [x & ys])
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So when you pass in one argument (a collection), all str returns is the collection's toString.
But to get the result you want, you need to feed the 5 characters to str as separate arguments, instead of the collection itself. apply is the function that is used to 'get inside' the collection and make that happen.
(apply str '(\o \l \l \e \H) )
"olleH"
Functions that take multiple separate arguments are common in Clojure, so it's good to realise when and why you need to use apply. The other thing to consider is why the writer of the str function made it accept multiple arguments instead of a collection. Usually, there's a pretty good reason. What's the prevalent use case for the str function? Surely not concatenating a collection of separate characters, but concatenating values, strings and function results.
(let [a 1 b 2]
  (str a "+" b "=" (+ a b)))
"1+2=3"
What if we had a str that accepted a single collection as parameter ?
(defn str2
  [seq]
  (apply str seq))
(str2 (reverse "Hello"))
"olleH"
Cool, it works! But now:
(let [a 1 b 2]
  (str2 '(a "+" b "=" (+ a b))))
"a+b=(+ a b)"
Hmmm, now how do we solve that? :)
In this case, making str accept multiple parameters that are evaluated before the str function is executed gives str the easiest syntax. Whenever you need to use str on a collection, apply is a simple way to convert a collection to separate parameters.
Making a str that accepts a collection and having it evaluate each part inside would take more effort, help out only in less common use cases, result in more complicated code or syntax, or limit its applicability. So there might be a better way to reverse strings, but reverse, apply and str are each best at what they do.

Apply, like reverse, works on any seqable type, not just vectors, so
(apply str (reverse "Hello"))
is a little shorter. clojure.string/reverse should be more efficient, though.
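Beyond speed, there is a correctness difference in the JVM implementation: clojure.string/reverse delegates to StringBuilder.reverse, which keeps UTF-16 surrogate pairs together, while reverse on the string flips individual chars and splits such a pair. A small check, where the sample string is only an illustrative value:

```clojure
(require '[clojure.string :as string])

;; "a" followed by U+1F600, which Java stores as the surrogate pair
;; \uD83D \uDE00 (two chars).
(def sample "a\uD83D\uDE00")

;; StringBuilder.reverse treats the surrogate pair as one character.
(def via-string-reverse (string/reverse sample))

;; reverse works char by char, so the high and low surrogates swap places.
(def via-seq-reverse (apply str (reverse sample)))
```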
