Clojure string searching and counting - regex

I was wondering what the most Clojure-esque and standardised way of searching a string and returning a boolean (or something falsy/truthy) is. (E.g. in Java I would use .indexOf() and cast it to a boolean.)
What I want to do is search all the strings in a map and return 1 or 0, depending on whether the word "clouds" is in the string, and then find out the cumulative value at the end - I understand I can do this with regex, however I was wondering if there was an alternative?

Actually in Java the most natural solution is to use the contains method. You can do the same in Clojure:
(.contains "foobar" "bar")
;= true
Mapping over a seqable:
(mapv #(.contains "foobar" ^String %) ["foo" "bar"])
;= [true true]
With a map as input, you'd have to decide whether you want the keys, the values or both; depending on the answer, you'd want to use keys, vals or just map over the entries (in this case reduce-kv would yield a more performant solution than map).
This is assuming that you're searching for a literal substring (as with indexOf). With a regex, I'd use re-find and cast to boolean (it returns nil in absence of a match).
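To tie this back to the counting part of the question, here is a minimal sketch. It assumes the strings to search are the map's values and that Clojure 1.8 or later is available (for clojure.string/includes?); the names forecasts and count-containing are made up for illustration.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sample data: the strings to search are assumed to be the values.
(def forecasts
  {:mon "heavy clouds in the morning"
   :tue "clear skies"
   :wed "scattered clouds, light rain"})

(defn count-containing
  "Counts the values of map m that contain the literal substring needle."
  [m needle]
  (reduce-kv (fn [n _k v]
               (if (string/includes? v needle) (inc n) n))
             0
             m))

(count-containing forecasts "clouds")
;= 2

;; Same count via a regex: re-find returns nil when there is no match,
;; so it can be used directly as a predicate.
(count (filter #(re-find #"clouds" %) (vals forecasts)))
;= 2
```

Counting the matches directly avoids the intermediate step of mapping each string to 1 or 0 and summing.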

Related

How to correctly check if a string is equal to another string in Clojure?

I am looking for better ways to check if two strings are equal in Clojure!
Given a map 'report' like
{:Result Pass}
, when I evaluate
(type (:Result report))
I get: java.lang.String
To write a check for the value of :Result, I first tried
(if (= (:Result report) "Pass") (println "Pass"))
But the check fails.
So I used the compare method, which worked:
(if (= 0 (compare (:Result report) "Pass")) (println "Pass"))
However, I was wondering if there is anything equivalent to Java's .equals() method in Clojure. Or a better way to do the same.
= is the correct way to do an equality check for Strings. If it's giving you unexpected results, you likely have whitespace in the String like a trailing newline.
You can easily check for whitespace by using vec:
user=> (vec " Pass\n")
[\space \P \a \s \s \newline]
As @Carcigenicate wrote, use = to compare strings.
(= "hello" "hello")
;; => true
If you want to be less strict, consider normalizing your string before you compare. If we have a leading space, the strings aren't equal.
(= " hello" "hello")
;; => false
We can then define a normalize function that works for us. In this case, ignore leading and trailing whitespace and capitalization.
(require '[clojure.string :as string])
(defn normalize [s]
  (string/trim
    (string/lower-case s)))
(= (normalize " hellO")
   (normalize "Hello\t"))
;; => true
Hope that helps!

Replace Empty Strings In A List With A Value

If I have a list
("foo" "bar" "" "baz")
and I need to change any "" to "biz", what is a good way to go about that?
Just for completeness, here's an alternate method that uses a more specialized built-in:
(replace ; Replace all instances of...
{"" "biz"} ; "" with "biz"...
'("foo" "bar" "" "baz")) ; in the list
Which also returns a lazy sequence.
Note that the map given as the first argument can contain multiple entries. If you have multiple replacements, I'd definitely go with replace over an explicit map.
replace actually just uses map behind the scenes, but uses a map lookup instead of an equality check to do the replacements. I would expect them to perform similarly for one replacement, but for replace to be faster for more than one, since it doesn't need to do a linear search over all the replacements like you would when doing manual = checks.
(map #(if (empty? %) "biz" %)
'("foo" "bar" "" "baz"))
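To illustrate the point about multiple replacements, here is a small sketch. The second entry ("foo" to "qux") is an invented example, not part of the question.

```clojure
;; Each key of the replacement map is swapped for its value;
;; elements without an entry pass through unchanged.
(replace {"" "biz", "foo" "qux"}
         '("foo" "bar" "" "baz"))
;= ("qux" "bar" "biz" "baz")

;; With a vector as input, replace returns a vector rather than a lazy seq.
(replace {"" "biz"} ["foo" "" "baz"])
;= ["foo" "biz" "baz"]
```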

How to compare two regexps in Clojure?

I’m unit-testing a function which builds a regexp, but using = doesn’t work. How can I test that it returns the correct regexp?
Here is what I tried for an empty regexp:
(= #"" #"") ; false
(== #"" #"") ; ClassCastException java.util.regex.Pattern cannot be cast to java.lang.Number
(identical? #"" #"") ; false
(.equals #"" #"") ; false
Is there a Clojure-ish way to do that, or do I have to convert both regexps to strings then compare them?
Unfortunately there is no better way; you just have to use strings:
user> (= (str #"foo") (str #"foo"))
true
user> (= (str #"foo") (str #"fooo"))
false
Even this is not perfect, because it doesn't recognise regular expressions that match the same strings but are written differently.
user> (re-seq #"[a]" "aaaa")
("a" "a" "a" "a")
user> (re-seq #"a" "aaaa")
("a" "a" "a" "a")
user> (= (str #"a") (str #"[a]"))
false
This is the same reason that you can't compare functions for equality either. I suspect that Clojure does not implement = for regexes because it would be impractical to determine whether two regexes match the same strings (or to settle on some other idea of equality).
This is tied to the fact that a regex in Clojure is a java.util.regex.Pattern, which does not override equals.
If you write a Java program that compares two Pattern objects this way, it will also return false.
The only way to do it is to compare the regexes' strings.
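For unit tests, the string comparison can be wrapped in a small helper. The sketch below is one possible shape, not a standard function: pattern= is a made-up name, and comparing the compile flags as well as the source string is an extra assumption about what "the same regexp" should mean.

```clojure
(import 'java.util.regex.Pattern)

(defn pattern=
  "Hypothetical test helper: true when two Patterns have the same source
  string and the same compile flags. It does not detect regexes that are
  written differently but match the same strings."
  [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(pattern= #"foo" #"foo")
;= true
(pattern= #"a" #"[a]")
;= false
;; Same source string, different flags:
(pattern= #"foo" (Pattern/compile "foo" Pattern/CASE_INSENSITIVE))
;= false
```

In a test this reads as (is (pattern= expected (build-regex ...))), where build-regex stands for whatever function is under test.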

Extracting string from clojure collections using regex

Can you suggest the shortest and easiest way to extract a substring from each string in a sequence? I'm getting this collection from the Enlive framework, which pulls content from a web page, and here is what I get as a result:
("background-image:url('http://s3.mangareader.net/cover/gantz/gantz-r0.jpg')"
"background-image:url('http://s3.mangareader.net/cover/deadman-wonderland/deadman-wonderland-r0.jpg')"
"background-image:url('http://s3.mangareader.net/cover/12-prince/12-prince-r1.jpg')" )
What I would like is some help extracting the URL from each string in the sequence. I tried something with the partition function, but with no success. Can anyone propose a regex, or any other approach to this problem?
Thanks
re-seq to the resque!
(map #(re-seq #"http.*jpg" %) d)
(("http://s3.mangareader.net/cover/gantz/gantz-r0.jpg")
("http://s3.mangareader.net/cover/deadman-wonderland/deadman-wonderland-r0.jpg")
("http://s3.mangareader.net/cover/12-prince/12-prince-r1.jpg"))
re-find is even better:
user> (map #(re-find #"http.*jpg" %) d)
("http://s3.mangareader.net/cover/gantz/gantz-r0.jpg"
"http://s3.mangareader.net/cover/deadman-wonderland/deadman-wonderland-r0.jpg"
"http://s3.mangareader.net/cover/12-prince/12-prince-r1.jpg")
because it doesn't add an extra layer of seq.
Would something simple like this work for you?
(defn extract-url [s]
  (subs s (inc (.indexOf s "'")) (.lastIndexOf s "'")))
This function will return a string containing all the characters between the first and last single quotes.
Assuming your sequence of strings is named ss, then:
(map extract-url ss)
;=> ("http://s3.mangareader.net/cover/gantz/gantz-r0.jpg"
; "http://s3.mangareader.net/cover/deadman-wonderland/deadman-wonderland-r0.jpg"
; "http://s3.mangareader.net/cover/12-prince/12-prince-r1.jpg")
This is definitely not a generic solution, but it fits the input you have provided.
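A middle ground between the two answers is a regex with a capture group, which anchors on the url('...') wrapper instead of on http and jpg. This is only a sketch: styles and url-in-style are made-up names, and the pattern assumes single-quoted URLs as in the question's data.

```clojure
;; Hypothetical sample input in the same shape as the question's data.
(def styles
  ["background-image:url('http://s3.mangareader.net/cover/gantz/gantz-r0.jpg')"
   "background-image:url('http://s3.mangareader.net/cover/12-prince/12-prince-r1.jpg')"])

(defn url-in-style
  "Returns the text between url(' and '), or nil when the pattern is absent."
  [s]
  ;; With a capture group, re-find returns [whole-match group-1];
  ;; (second nil) is nil, so non-matching input yields nil.
  (second (re-find #"url\('([^']+)'\)" s)))

(map url-in-style styles)
;= ("http://s3.mangareader.net/cover/gantz/gantz-r0.jpg"
;   "http://s3.mangareader.net/cover/12-prince/12-prince-r1.jpg")

(url-in-style "color: red")
;= nil
```

Unlike the subs version, this returns nil rather than throwing when a string contains no quotes.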

Reverse a string (simple question)

Is there a better way to do this in Clojure?
daniel=> (reverse "Hello")
(\o \l \l \e \H)
daniel=> (apply str (vec (reverse "Hello")))
"olleH"
Do you have to do the apply $ str $ vec bit every time you want to reverse a string back to its original form?
You'd better use clojure.string/reverse:
user=> (require '[clojure.string :as s])
nil
user=> (s/reverse "Hello")
"olleH"
UPDATE: for the curious, here follow the source code snippets for clojure.string/reverse in both Clojure (v1.4) and ClojureScript
; clojure:
(defn ^String reverse
  "Returns s with its characters reversed."
  {:added "1.2"}
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
; clojurescript
(defn reverse
  "Returns s with its characters reversed."
  [s]
  (.. s (split "") (reverse) (join "")))
OK, so it would be easy to roll your own function with apply inside, or use a dedicated version of reverse that works better on (but only on) strings. The main things to think about here, though, are the arity (the number of parameters) of the str function, and the fact that reverse works on a collection.
(doc reverse)
clojure.core/reverse
([coll])
Returns a seq of the items in coll in reverse order. Not lazy.
This means that reverse not only works on strings, but also on all other collections. However, because reverse expects a collection as parameter, it treats a string as a collection of characters
(reverse "Hello")
and returns one as well
(\o \l \l \e \H)
Now if we just substitute the functions for the collection, you can spot the difference:
(str '(\o \l \l \e \H) )
"(\\o \\l \\l \\e \\H)"
while
(str \o \l \l \e \H )
"olleH"
The big difference between the two is the number of parameters. In the first example, str takes one parameter, a collection of 5 characters. In the second, str takes 5 parameters: 5 characters.
What does the str function expect ?
(doc str)
-------------------------
clojure.core/str
([] [x] [x & ys])
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So when you pass in one parameter (a collection), all str returns is the toString of the collection.
But to get the result you want, you need to feed the 5 characters as separate parameters to str, instead of the collection itself. Apply is the function that is used to 'get inside' the collection and make that happen.
(apply str '(\o \l \l \e \H) )
"olleH"
Functions that handle multiple separate parameters are often seen in Clojure, so it's good to realise when and why you need to use apply. The other thing to consider is why the writer of the str function made it accept multiple parameters instead of a collection. Usually, there's a pretty good reason. What's the prevalent use case for the str function? Surely not concatenating a collection of separate characters, but concatenating values, strings and function results.
(let [a 1 b 2]
  (str a "+" b "=" (+ a b)))
"1+2=3"
What if we had a str that accepted a single collection as parameter ?
(defn str2
  [seq]
  (apply str seq))
(str2 (reverse "Hello"))
"olleH"
Cool, it works ! But now:
(let [a 1 b 2]
  (str2 '(a "+" b "=" (+ a b))))
"a+b=(+ a b)"
Hmmm, now how to solve that ? :)
In this case, making str accept multiple parameters that are evaluated before the str function is executed gives str the easiest syntax. Whenever you need to use str on a collection, apply is a simple way to convert a collection to separate parameters.
Making a str that accepts a collection and evaluates each part inside would take more effort, help only in less common use cases, result in more complicated code or syntax, or limit its applicability. So there might be a better way to reverse strings, but reverse, apply and str are best at what they do.
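As one possible answer to the question left open above about str2: the quoted list is what prevents evaluation, so passing a vector literal (or calling list) gives the expected result. This is a sketch that reuses the str2 name from the example, with coll as the parameter name.

```clojure
(defn str2
  "Variant of str that takes a single collection, as in the example above."
  [coll]
  (apply str coll))

(let [a 1 b 2]
  ;; A vector literal evaluates its elements, unlike the quoted list '(a "+" b ...).
  (str2 [a "+" b "=" (+ a b)]))
;= "1+2=3"

(let [a 1 b 2]
  ;; list is an ordinary function, so its arguments are evaluated too.
  (str2 (list a "+" b "=" (+ a b))))
;= "1+2=3"
```

The cost is an extra pair of brackets at every call site, which is the syntax overhead the variadic str avoids.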
Apply, like reverse, works on any seqable type, not just vectors, so
(apply str (reverse "Hello"))
is a little shorter. clojure.string/reverse should be more efficient, though.
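Efficiency is not the only difference. On the JVM, clojure.string/reverse goes through StringBuilder's reverse, which keeps surrogate pairs together, whereas reversing the seq of chars swaps the two halves of a pair. The sketch below shows this; the sample string is an arbitrary choice and the behaviour shown is for JVM Clojure only.

```clojure
(require '[clojure.string :as string])

;; For plain ASCII text the approaches agree.
(apply str (reverse "Hello"))   ;= "olleH"
(string/join (reverse "Hello")) ;= "olleH"
(string/reverse "Hello")        ;= "olleH"

;; U+1F600 lies outside the Basic Multilingual Plane,
;; so it is stored as two chars (a surrogate pair).
(def smiley (apply str (Character/toChars 0x1F600)))
(def s (str "a" smiley))

;; StringBuilder.reverse treats the pair as a single character.
(= (string/reverse s) (str smiley "a"))
;= true

;; Reversing char by char swaps the surrogates and yields a different string.
(= (apply str (reverse s)) (string/reverse s))
;= false
```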