I’m unit-testing a function which builds a regexp, but using = doesn’t work. How can I test that it returns the correct regexp?
Here is what I tried for an empty regexp:
(= #"" #"") ; false
(== #"" #"") ; ClassCastException java.util.regex.Pattern cannot be cast to java.lang.Number
(identical? #"" #"") ; false
(.equals #"" #"") ; false
Is there a Clojure-ish way to do that, or do I have to convert both regexps to strings then compare them?
Unfortunately, there is no better way; you just have to compare the string forms.
user> (= (str #"foo") (str #"foo"))
true
user> (= (str #"foo") (str #"fooo"))
false
Even this is not perfect, because it treats regular expressions that look different but match the same strings as unequal.
user> (re-seq #"[a]" "aaaa")
("a" "a" "a" "a")
user> (re-seq #"a" "aaaa")
("a" "a" "a" "a")
user> (= (str #"a") (str #"[a]"))
false
This is the same reason that you can't compare functions for equality either. I suspect that Clojure does not implement = for regexes because it would be impractical to determine whether two regexes match the same strings (or satisfy some other idea of equality).
This is tied to the fact that a regex in Clojure is a java.util.regex.Pattern under the hood.
If you write a Java program that compares two Pattern objects like this, it will also return false, because Pattern does not override equals.
The only way to do it is to call equals on the regex strings.
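For use in unit tests, the string comparison described above could be wrapped in a small helper. This is only a sketch: the name pattern= is hypothetical (it is not part of clojure.core or clojure.test), and comparing flags as well as the source string is an extra assumption, since two patterns with the same source can be compiled with different flags.

```clojure
;; Hypothetical helper, not a library function.
(defn pattern=
  "True when both patterns have the same source string and the same flags."
  [^java.util.regex.Pattern a ^java.util.regex.Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(pattern= #"foo" #"foo")
;; => true
(pattern= #"foo" #"fooo")
;; => false

;; Same source, different flags: reported as different.
(pattern= #"foo"
          (java.util.regex.Pattern/compile
            "foo" java.util.regex.Pattern/CASE_INSENSITIVE))
;; => false
```

Like the plain string comparison, this still reports #"a" and #"[a]" as different.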
I am looking for better ways to check if two strings are equal in Clojure!
Given a map 'report' like
{:Result Pass}
, when I evaluate
(type (:Result report))
I get: java.lang.String
To write a check for the value of :Result, I first tried
(if (= (:Result report) "Pass") (println "Pass"))
But the check fails.
So I used the compare method, which worked:
(if (= 0 (compare (:Result report) "Pass")) (println "Pass"))
However, I was wondering if there is anything equivalent to Java's .equals() method in Clojure. Or a better way to do the same.
= is the correct way to do an equality check for Strings. If it's giving you unexpected results, you likely have whitespace in the String like a trailing newline.
You can easily check for whitespace by using vec:
user=> (vec " Pass\n")
[\space \P \a \s \s \newline]
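If stray whitespace does turn out to be the cause, one possible follow-up is to trim the value before comparing. The report map below is a made-up stand-in for the asker's data, assuming the value carries a leading space and a trailing newline.

```clojure
(require '[clojure.string :as string])

;; Hypothetical report whose value has stray whitespace.
(def report {:Result " Pass\n"})

;; pr-str shows the readable form, so the newline becomes visible as \n.
(pr-str (:Result report))
;; => "\" Pass\\n\""

(= (:Result report) "Pass")
;; => false

;; Trimming removes leading and trailing whitespace before the comparison.
(= (string/trim (:Result report)) "Pass")
;; => true
```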
As @Carcigenicate wrote, use = to compare strings.
(= "hello" "hello")
;; => true
If you want to be less strict, consider normalizing your string before you compare. If we have a leading space, the strings aren't equal.
(= " hello" "hello")
;; => false
We can then define a normalize function that works for us.
In this case, ignore leading and trailing whitespace and
capitalization.
(require '[clojure.string :as string])
(defn normalize [s]
  (string/trim
    (string/lower-case s)))
(= (normalize " hellO")
   (normalize "Hello\t"))
;; => true
Hope that helps!
Is it fair to say that clojure.string/join is a specialized interpose function for strings? Are there any differences between these functions?
Conceptually, they're basically the same. They each take a collection* and a separator, and return a collection where the separator is between each element of the original collection. The major differences between them are:
clojure.string/join calls toString on the separator, and each element of the collection, and uses a StringBuilder to construct the String.
interpose doesn't affect the separator or collection elements, and returns a lazy list* instead of a fully realized String. It's defined in terms of interleave:
(drop 1 (interleave (repeat sep) coll))
Similar concept, but very different implementations.
*I'm ignoring the no-coll version of interpose that returns a transducer.
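The relationship can be checked at the REPL with a small sketch. The parts vector is just illustrative data; the last form uses the transducer arity of interpose mentioned in the footnote.

```clojure
(require '[clojure.string :as string])

(def parts ["a" "b" "c"])

;; join builds the String directly.
(string/join "-" parts)
;; => "a-b-c"

;; interpose leaves the elements alone; turning the result into a String
;; takes an extra step.
(apply str (interpose "-" parts))
;; => "a-b-c"

;; join stringifies non-string elements; interpose does not touch them.
(string/join "-" [1 :k])
;; => "1-:k"
(interpose "-" [1 :k])
;; => (1 "-" :k)

;; The no-coll arity returns a transducer.
(into [] (interpose "-") parts)
;; => ["a" "-" "b" "-" "c"]
```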
Interpose
Interpose takes a separator and a sequence, and returns a lazy sequence with the separator inserted between the sequence's elements.
user=> (interpose "-" (list "a" "b" "c"))
("a" "-" "b" "-" "c")
user=> (type (interpose "-" (list "a" "b" "c")))
clojure.lang.LazySeq
Join
Join takes a separator and a sequence, and returns a String built by joining the elements with the separator between them.
user=> (clojure.string/join "-" (list "a" "b" "c"))
"a-b-c"
user=> (type (clojure.string/join "-" (list "a" "b" "c")))
java.lang.String
Your intuition is right. Note that interpose can also return a transducer and is lazy.
Just look at the docstrings or source code:
(doc interpose)
(source interpose)
etc.
I need to ensure that a certain input only contains lowercase alphas and hyphens. What's the best idiomatic clojure to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in clojure, or if this is it, what's the syntax for regex matching?
user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
:found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")
Using a RegExp is fine here. To match a string against a RegExp in Clojure you can use the built-in re-find function.
So, your example in Clojure will look like:
(if (re-find #"^[a-z\-]+$" s)
  :true
  :false)
Note that your RegExp will match only lowercase Latin letters a-z and the hyphen -.
While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
  (do-something-with s)
  (do-something-else-with s))
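If the check is needed in several places, it could be wrapped in a predicate. The name lower-hyphen? is made up for this sketch; the last form shows why an unanchored re-find is not a substitute for re-matches.

```clojure
;; Hypothetical predicate: true only when the whole string consists of
;; lowercase a-z and hyphens.
(defn lower-hyphen?
  [s]
  (boolean (re-matches #"[-a-z]+" s)))

(lower-hyphen? "abc-def")
;; => true
(lower-hyphen? "abc-def!!!")
;; => false
(lower-hyphen? "")
;; => false

;; Without anchors, re-find is satisfied by a matching substring.
(re-find #"[-a-z]+" "abc-def!!!")
;; => "abc-def"
```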
Is there a better way to do this in Clojure?
daniel=> (reverse "Hello")
(\o \l \l \e \H)
daniel=> (apply str (vec (reverse "Hello")))
"olleH"
Do you have to do the apply $ str $ vec bit every time you want to reverse a string back to its original form?
You'd better use clojure.string/reverse:
user=> (require '[clojure.string :as s])
nil
user=> (s/reverse "Hello")
"olleH"
UPDATE: for the curious, here follow the source code snippets for clojure.string/reverse in both Clojure (v1.4) and ClojureScript
; clojure:
(defn ^String reverse
  "Returns s with its characters reversed."
  {:added "1.2"}
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
; clojurescript
(defn reverse
  "Returns s with its characters reversed."
  [s]
  (.. s (split "") (reverse) (join "")))
OK, so it would be easy to roll your own function with apply inside, or to use a dedicated version of reverse that works better on strings (but only on strings). The main things to think about here, though, are the arity (the number of parameters) of the str function, and the fact that reverse works on a collection.
(doc reverse)
clojure.core/reverse
([coll])
Returns a seq of the items in coll in reverse order. Not lazy.
This means that reverse not only works on strings, but also on all other collections. However, because reverse expects a collection as parameter, it treats a string as a collection of characters
(reverse "Hello")
and returns one as well
(\o \l \l \e \H)
Now if we just substitute the functions for the collection, you can spot the difference:
(str '(\o \l \l \e \H) )
"(\\o \\l \\l \\e \\H)"
while
(str \o \l \l \e \H )
"olleH"
The big difference between the two is the number of parameters. In the first example, str takes one parameter, a collection of 5 characters. In the second, str takes 5 parameters: 5 characters.
What does the str function expect ?
(doc str)
-------------------------
clojure.core/str
([] [x] [x & ys])
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So when you pass in one parameter (a collection), all str returns is the toString of the collection.
But to get the result you want, you need to feed the 5 characters as separate parameters to str, instead of the collection itself. Apply is the function that is used to 'get inside' the collection and make that happen.
(apply str '(\o \l \l \e \H) )
"olleH"
Functions that handle multiple separate parameters are common in Clojure, so it's good to realise when and why you need to use apply. The other thing to consider is why the writer of the str function made it accept multiple parameters instead of a collection. Usually, there's a pretty good reason. What's the prevalent use case for the str function? Surely not concatenating a collection of separate characters, but concatenating values, strings and function results.
(let [a 1 b 2]
(str a "+" b "=" (+ a b)))
"1+2=3"
What if we had a str that accepted a single collection as parameter ?
(defn str2
  [seq]
  (apply str seq))
(str2 (reverse "Hello"))
"olleH"
Cool, it works ! But now:
(let [a 1 b 2]
  (str2 '(a "+" b "=" (+ a b))))
"a+b=(+ a b)"
Hmmm, now how to solve that ? :)
In this case, making str accept multiple parameters that are evaluated before the str function is executed gives str the easiest syntax. Whenever you need to use str on a collection, apply is a simple way to convert a collection to separate parameters.
Making a str that accepts a collection and evaluates each part inside would take more effort, help only in less common use cases, result in more complicated code or syntax, or limit its applicability. So there might be a better way to reverse strings, but reverse, apply and str are best at what they do.
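For what it's worth, one possible answer to the puzzle above is to build the collection without quoting it. This sketch reuses the str2 function from the example; a vector literal or a call to list evaluates its elements, whereas a quoted list leaves the symbols and the nested form unevaluated.

```clojure
;; str2 as defined in the example above.
(defn str2 [coll]
  (apply str coll))

(let [a 1 b 2]
  ;; The vector literal evaluates a, b and (+ a b) before str2 sees them.
  (str2 [a "+" b "=" (+ a b)]))
;; => "1+2=3"

(let [a 1 b 2]
  ;; list is an ordinary function, so its arguments are evaluated too.
  (str2 (list a "+" b "=" (+ a b))))
;; => "1+2=3"
```

This works, but the call site is noisier than the variadic str, which is the trade-off the answer describes.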
Apply, like reverse, works on any seqable type, not just vectors, so
(apply str (reverse "Hello"))
is a little shorter. clojure.string/reverse should be more efficient, though.
The following two commands print out the same thing in the REPL:
user=> (println "(foo bar)")
(foo bar)
nil
user=> (println (quote (foo bar)))
(foo bar)
nil
So in this case, what's the difference between a quote and a string?
Edit:
(+ 3 2) and (+ (quote 3) 2) are the same. The docs say quote yields the unevaluated form (so maybe I'm answering my own question here but please verify) that a quote is an optimization with lazy evaluation?
They're indeed different things:
user=> (class '(foo bar))
clojure.lang.PersistentList
user=> (class "foo bar")
java.lang.String
Even if they might have an identical println result, they're not the same.
For the rest, @bmillare is right: you don't quote for laziness, you quote to express literals.
They look the same because println prints the contents of strings and the printed form of quoted forms, including symbol names, to stdout. If you want to print values the way they would look when typed into the reader, use prn (or pr if you don't want the newline).
user=> (prn "(foo bar)")
"(foo bar)"
nil
user=> (prn (quote (foo bar)))
(foo bar)
nil
For the other question,
Quote is not an optimization with lazy evaluation. The reason (+ 3 2) and (+ (quote 3) 2) give the same result is that you are quoting a literal, e.g. a number, a keyword, or a string (http://clojure.org/reader). Literals evaluate to themselves, so quoting one changes nothing before the value is passed to the enclosing form. Another way to think of it is that quoting a literal is simply defined to be the identity.
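A few REPL checks illustrate the point. This is only a sketch; the forms are chosen for illustration and are not from the original question.

```clojure
;; Quoting a literal gives back the same value.
(= 3 (quote 3))
;; => true
(= "foo" (quote "foo"))
;; => true
(= :k (quote :k))
;; => true
(+ (quote 3) 2)
;; => 5

;; Quoting a list or a symbol is different: it yields the form itself,
;; as data, without evaluating it.
(class (quote (+ 3 2)))
;; => clojure.lang.PersistentList
(class (quote foo))
;; => clojure.lang.Symbol

;; The quoted list can still be evaluated later.
(eval (quote (+ 3 2)))
;; => 5
```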