What is the difference between interpose and clojure.string/join? - clojure

Is it fair to say that clojure.string/join is a specialized interpose function for strings? Are there any differences between these functions?

Conceptually, they're basically the same. They each take a collection* and a separator, and return a collection where the separator is between each element of the original collection. The major differences between them are:
clojure.string/join calls str on the separator and on each element of the collection (effectively toString, except that nil becomes the empty string), and uses a StringBuilder to construct the String.
interpose doesn't affect the separator or the collection elements, and returns a lazy sequence* instead of a fully realized String. Its two-argument arity is defined in terms of interleave:
(drop 1 (interleave (repeat sep) coll))
Similar concept, but very different implementations.
*I'm ignoring the no-coll version of interpose that returns a transducer.
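A few REPL-style expressions may make these differences concrete. This is only a sketch; the sample collections are made up for illustration, and the transducer arity assumes Clojure 1.7 or later.

```clojure
(require '[clojure.string :as str])

;; join stringifies everything: nil becomes "", keywords keep their colon.
(str/join ", " [1 nil :a])                 ;; => "1, , :a"

;; interpose leaves the elements (and the separator) untouched.
(interpose 0 [1 2 3])                      ;; => (1 0 2 0 3)

;; Because interpose is lazy, it can be used on an infinite sequence.
(take 5 (interpose :sep (range)))          ;; => (0 :sep 1 :sep 2)

;; The one-argument arity returns a transducer.
(into [] (interpose :sep) [1 2 3])         ;; => [1 :sep 2 :sep 3]

;; For strings, concatenating the interposed sequence gives the same text as join.
(= (apply str (interpose "-" ["a" "b" "c"]))
   (str/join "-" ["a" "b" "c"]))           ;; => true
```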

Interpose
Interpose takes a separator and a sequence, and returns a lazy sequence with the separator inserted between the elements of the sequence.
user=> (interpose "-" (list "a" "b" "c"))
("a" "-" "b" "-" "c")
user=> (type (interpose "-" (list "a" "b" "c")))
clojure.lang.LazySeq
Join
Join takes a separator and a sequence, and returns a String built by joining the elements with the separator between them.
user=> (clojure.string/join "-" (list "a" "b" "c"))
"a-b-c"
user=> (type (clojure.string/join "-" (list "a" "b" "c")))
java.lang.String

Your intuition is right. Note that interpose can also return a transducer and is lazy.
Just look at the docstrings or source code:
(doc interpose)
(source interpose)
etc.
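Outside the default user namespace of a REPL, doc and source are not referred automatically; they live in clojure.repl. A minimal sketch of looking up the same information programmatically (the names interpose-doc and interpose-arglists are just illustrative):

```clojure
(require '[clojure.repl :refer [doc]])

;; doc prints to *out*; with-out-str captures the printed text as a string.
(def interpose-doc (with-out-str (doc interpose)))

;; The argument lists are also available as metadata on the var.
(def interpose-arglists (:arglists (meta #'interpose)))
```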

Related

How to build strings in Clojure

Say I have a list of elements like [1 2 3] and I want to transform it into |1|2|3|.
Or I want to repeat the sequence "---" 3 times to get "---------".
How should I approach building up a string like that? Is there something similar to Java's StringBuilder? I'm not looking for a concrete answer to this question, just general guidance on how to build strings in Clojure, as I'm very new to the language.
Start with the Clojure CheatSheet. Look at the section "Strings".
Some examples (with clojure.string required under the alias str):
(str/join \| [1 2 3]) => "1|2|3"
(apply str (repeat 3 "---")) => "---------"
(str
  "|"
  (str/join \| [1 2 3])
  "|")
=> "|1|2|3|"
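Since the question asks about Java's StringBuilder: it can also be used directly through interop, although the functions above are usually more idiomatic. A minimal sketch, where bar-wrap is a hypothetical helper name and not part of any library:

```clojure
;; Hypothetical helper: wraps every element in "|" using a mutable StringBuilder.
(defn bar-wrap [xs]
  (let [sb (StringBuilder. "|")]
    (doseq [x xs]
      ;; str converts each element to a String before appending
      (.append sb (str x))
      (.append sb "|"))
    (.toString sb)))

(bar-wrap [1 2 3]) ;; => "|1|2|3|"
```

The mutation stays local to the function, so callers still see an ordinary pure function from a collection to a String.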
There are other libraries that contain many useful string functions in addition to clojure.string:
the Tupelo Clojure library. See both tupelo.string and tupelo.chars sections
the cuerdas library
Looks like there are some more listed at clojure-toolbox.com under "String Manipulation"
Also, there is the cl-format function in clojure.pprint (which ships with Clojure), a port of Common Lisp's amazing format facility.
(require '[clojure.pprint :refer [cl-format]])
user> (cl-format nil "~v@{~a~:*~}" 5 "10")
;;=> "1010101010"
user> (cl-format nil "|~{~a|~}" [1 2 3])
;;=> "|1|2|3|"
This one is really powerful, yet the format string can get quite complicated for the reader in really complex string-processing templates. Still, for the cases you ask about (join, repeat, iterate, or conditional output), it stays within the bounds of understandable.
There are some examples here, easily translatable from CL to Clojure.
PS:
user> (cl-format nil "~r" 234598284579147)
;;=> "two hundred thirty-four trillion, five hundred ninety-eight billion, two hundred eighty-four million, five hundred seventy-nine thousand, one hundred forty-seven"
user> (cl-format nil "~@r" 1232)
;;=> "MCCXXXII"
The answer to use (apply str ...) is usually the best one. But here is an additional technique, and a "pro tip" about the three dots in (apply str ...).
If the string's content would most naturally be generated by the print functions (which is not the case with your specific examples!), then you can capture it with with-out-str:
(with-out-str
  (doseq [i (range 1 4)]
    (print "|")
    (print i))
  (println "|")) ;; => "|1|2|3|\n"
Usually, (apply str ...) is more idiomatic. You can use the whole rich tapestry of sequence functions (interleave, interpose, repeat, cycle, ...) and extract the result as a string with (apply str ...). But you face a challenge if the sequence contains nested sequences. We mention this challenge here because there are two solutions that are specific to building up strings.
To be clear, nested sequences "work fine" in every respect except that what str does to a sequence might not be what you want. For example, to build "1------2------3":
;; not quite right:
(apply str
  (interpose
    (repeat 2 "---")
    (range 1 4))) ;; => "1(\"---\" \"---\")2(\"---\" \"---\")3"
The problem is that repeat produced a sequence, which interpose dutifully stuck between the numbers in a bigger sequence, and str, when processing the bigger sequence, dutifully wrote the nested sequences in Clojure syntax. To better control how nested sequences get stringified, you could replace (repeat 2 "---") with (apply str (repeat 2 "---")). But if the pattern of apply str within apply str occurs over and over, it hurts the program's signal-to-noise ratio. An alternative that may be cleaner is the flatten function (maybe this is its only idiomatic use):
(apply str
  (flatten
    (interpose
      (repeat 2 "---")
      (range 1 4)))) ;; => "1------2------3"
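For comparison, here is a sketch of the other option mentioned above (building the separator string once), together with one caveat of flatten: it also flattens nested collections that are part of the data. The name sep and the nested sample data are illustrative assumptions.

```clojure
(require '[clojure.string :as str])

;; Build the separator once, then let join do the rest.
(def sep (apply str (repeat 2 "---")))   ; "------"
(str/join sep (range 1 4))               ;; => "1------2------3"

;; Caveat: flatten cannot tell a nested separator from nested data.
(apply str (flatten (interpose (repeat 2 "---") [[1 2] 3])))
;; => "12------3"
(str/join sep [[1 2] 3])
;; => "[1 2]------3"
```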

How to correctly check if a string is equal to another string in Clojure?

I am looking for better ways to check if two strings are equal in Clojure!
Given a map 'report' like
{:Result Pass}
, when I evaluate
(type (:Result report))
I get: java.lang.String
To write a check for the value of :Result, I first tried
(if (= (:Result report) "Pass") (println "Pass"))
But the check fails.
So I used the compare method, which worked:
(if (= 0 (compare (:Result report) "Pass")) (println "Pass"))
However, I was wondering if there is anything equivalent to Java's .equals() method in Clojure. Or a better way to do the same.
= is the correct way to do an equality check for Strings. If it's giving you unexpected results, you likely have whitespace in the String like a trailing newline.
You can easily check for whitespace by using vec:
user=> (vec " Pass\n")
[\space \P \a \s \s \newline]
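A short sketch of how such a mismatch might be diagnosed and worked around. The report map below is a made-up value with a trailing newline; the real data may differ.

```clojure
(require '[clojure.string :as str])

;; Hypothetical report whose value carries a trailing newline.
(def report {:Result "Pass\n"})

(= (:Result report) "Pass")             ;; => false
;; pr-str shows escapes, so hidden whitespace becomes visible.
(pr-str (:Result report))               ;; => "\"Pass\\n\""
;; Trimming before comparing makes the check pass.
(= (str/trim (:Result report)) "Pass")  ;; => true
;; Java's equals is available through interop and agrees with = on strings.
(.equals "Pass" "Pass")                 ;; => true
```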
As @Carcigenicate wrote, use = to compare strings.
(= "hello" "hello")
;; => true
If you want to be less strict, consider normalizing your string before you compare. If we have a leading space, the strings aren't equal.
(= " hello" "hello")
;; => false
We can then define a normalize function that works for us.
In this case, ignore leading and trailing whitespace and capitalization.
(require '[clojure.string :as string])
(defn normalize [s]
  (string/trim
    (string/lower-case s)))
(= (normalize " hellO")
   (normalize "Hello\t"))
;; => true
Hope that helps!

Convert set to regex pattern in clojure

If I have this set
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
How can I turn it to this pattern string:
"foo\.clj|bar\.clj|baz\.clj"
My attempt : 
(defn set->pattern-str [coll]
  (-> (clojure.string/join "|" coll)
      (clojure.string/replace #"\." "\\\\.")))
(set->pattern-str my-set)
=> "foo\\.clj|baz\\.clj|bar\\.clj" ;I get the double backslash
Better ideas?
In case your set of strings might have other metacharacters than just . in them, a more general approach is to ask the underlying java.util.regex.Pattern implementation to escape everything for us:
(import 'java.util.regex.Pattern)
(defn set->pattern-str [coll]
  (->> coll
       (map #(Pattern/quote %))
       (clojure.string/join \|)
       re-pattern))
IDEone link here. Remember, IDEone is not a REPL, and you have to tell it to put values on stdout with e.g. println before you can see them.
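A usage sketch for the Pattern/quote approach. The function is renamed set->pattern here because it returns a compiled pattern rather than a string, and the file name "a+b.clj" is an invented example of a name containing another metacharacter.

```clojure
(import 'java.util.regex.Pattern)
(require '[clojure.string :as str])

;; Quote each member literally, join with |, and compile the result.
(defn set->pattern [coll]
  (->> coll
       (map #(Pattern/quote %))
       (str/join "|")
       re-pattern))

(def p (set->pattern #{"foo.clj" "a+b.clj"}))

(re-matches p "foo.clj") ;; => "foo.clj"
(re-matches p "fooXclj") ;; => nil, the dot is matched literally
(re-matches p "a+b.clj") ;; => "a+b.clj"
(re-matches p "aab.clj") ;; => nil, the plus is matched literally
```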
You were close to the final solution. Double backslash is displayed because it is shown escaped. When you turn it into a seq you will see individual characters:
(seq "foo\\.clj")
;;=> (\f \o \o \\ \. \c \l \j)
And working solution:
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
(def my-set-pattern
  (-> (clojure.string/join "|" my-set)
      (clojure.string/replace "." "\\.")
      (re-pattern)))
(re-matches my-set-pattern "foo.clj")
;;=> "foo.clj"
(re-matches my-set-pattern "bar.clj")
;;=> "bar.clj"
(re-matches my-set-pattern "baz.clj")
;;=> "baz.clj"
(re-matches my-set-pattern "foo-clj")
;;=> nil
Edit: OK, this one does in fact work. Probably want to break it apart a bit more if it's meant to be long lived code, but this is the simplest way I could find to do it with minimal string munging.
(defn is-matching-file-name [target-string]
  (re-matches
    (re-pattern (clojure.string/escape (String/join "|" my-set) {\. "\\."}))
    target-string))
The clojure.string/escape here takes two arguments: the string to escape, and a mapping of the characters to escape to the replacement strings. The key in this map is the literal \. and the value needs two backslashes since we want to include one backslash preceding any . in the final string to be used as the argument for the re-pattern function.
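To see what clojure.string/escape produces, it can help to look at lengths and individual characters rather than at the printed (re-escaped) string. A small sketch with an illustrative input:

```clojure
(require '[clojure.string :as str])

;; Each . is replaced by a backslash followed by a dot.
(def escaped (str/escape "foo.clj|bar.clj" {\. "\\."}))

(count "foo.clj|bar.clj")              ;; => 15
(count escaped)                        ;; => 17, one extra character per dot
(seq (str/escape "a.b" {\. "\\."}))    ;; => (\a \\ \. \b)

(re-matches (re-pattern escaped) "bar.clj") ;; => "bar.clj"
(re-matches (re-pattern escaped) "barXclj") ;; => nil
```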

How to compare two regexps in Clojure?

I’m unit-testing a function which builds a regexp, but using = doesn’t work. How can I test that it returns the correct regexp?
Here is what I tried for an empty regexp:
(= #"" #"") ; false
(== #"" #"") ; ClassCastException java.util.regex.Pattern cannot be cast to java.lang.Number
(identical? #"" #"") ; false
(.equals #"" #"") ; false
Is there a Clojure-ish way to do that, or do I have to convert both regexps to strings then compare them?
Unfortunately there is no better way; you just have to use strings:
user> (= (str #"foo") (str #"foo"))
true
user> (= (str #"foo") (str #"fooo"))
false
Even this is not perfect, because it doesn't treat as equal regular expressions that match the same strings but look different.
user> (re-seq #"[a]" "aaaa")
("a" "a" "a" "a")
user> (re-seq #"a" "aaaa")
("a" "a" "a" "a")
user> (= (str #"a") (str #"[a]"))
false
This is the same reason that you can't compare functions for equality either. I suspect that Clojure does not implement = for regexes because it would be impractical to determine whether two regexes match the same strings (or satisfy some other idea of equality).
This is tied to the fact that a regex in Clojure is a java.util.regex.Pattern internally.
If you write a Java program that compares two Pattern objects like this, it will also return false, because Pattern does not override equals.
The only way to do it is to call equals on the regex strings.
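For unit tests, the string comparison can be wrapped in a small helper. The sketch below uses a hypothetical name, pattern=, and additionally compares the compile flags, since two patterns with the same source text can still behave differently when compiled with different flags.

```clojure
(import 'java.util.regex.Pattern)

;; Hypothetical test helper: equal source text and equal compile flags.
(defn pattern= [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(pattern= #"foo" (re-pattern "foo"))  ;; => true
(pattern= #"foo" #"fooo")             ;; => false
;; Same source text, but compiled case-insensitively:
(pattern= #"foo" (Pattern/compile "foo" Pattern/CASE_INSENSITIVE)) ;; => false
```

Like the plain string comparison, this only checks that the patterns were written the same way, not that they match the same set of strings.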

Reverse a string (simple question)

Is there a better way to do this in Clojure?
daniel=> (reverse "Hello")
(\o \l \l \e \H)
daniel=> (apply str (vec (reverse "Hello")))
"olleH"
Do you have to do the apply $ str $ vec bit every time you want to reverse a string back to its original form?
You'd better use clojure.string/reverse:
user=> (require '[clojure.string :as s])
nil
user=> (s/reverse "Hello")
"olleH"
UPDATE: for the curious, here follow the source code snippets for clojure.string/reverse in both Clojure (v1.4) and ClojureScript
; clojure:
(defn ^String reverse
  "Returns s with its characters reversed."
  {:added "1.2"}
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
; clojurescript
(defn reverse
  "Returns s with its characters reversed."
  [s]
  (.. s (split "") (reverse) (join "")))
OK, so it would be easy to roll your own function with apply inside, or to use a dedicated version of reverse that works better on (but only on) strings. The main things to think about here, though, are the arity (the number of parameters) of the str function, and the fact that reverse works on a collection.
(doc reverse)
clojure.core/reverse
([coll])
Returns a seq of the items in coll in reverse order. Not lazy.
This means that reverse not only works on strings, but also on all other collections. However, because reverse expects a collection as parameter, it treats a string as a collection of characters
(reverse "Hello")
and returns one as well
(\o \l \l \e \H)
Now if we just substitute the functions for the collection, you can spot the difference:
(str '(\o \l \l \e \H) )
"(\\o \\l \\l \\e \\H)"
while
(str \o \l \l \e \H )
"olleH"
The big difference between the two is the amount of parameters. In the first example, str takes one parameter, a collection of 5 characters. In the second, str takes 5 parameters: 5 characters.
What does the str function expect ?
(doc str)
-------------------------
clojure.core/str
([] [x] [x & ys])
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So when you give in one parameter (a collection), all str returns is a toString of the collection.
But to get the result you want, you need to feed the 5 characters as separate parameters to str, instead of the collection itself. Apply is the function that is used to 'get inside' the collection and make that happen.
(apply str '(\o \l \l \e \H) )
"olleH"
Functions that take multiple separate parameters are common in Clojure, so it's good to recognize when and why you need apply. The other thing to consider is why the author of str made it accept multiple parameters instead of a collection. Usually, there's a pretty good reason. What's the prevalent use case for str? Surely not concatenating a collection of separate characters, but concatenating values, strings and function results.
(let [a 1 b 2]
(str a "+" b "=" (+ a b)))
"1+2=3"
What if we had a str that accepted a single collection as parameter ?
(defn str2
  [seq]
  (apply str seq))
(str2 (reverse "Hello"))
"olleH"
Cool, it works ! But now:
(let [a 1 b 2]
  (str2 '(a "+" b "=" (+ a b))))
"a+b=(+ a b)"
Hmmm, now how to solve that ? :)
In this case, making str accept multiple parameters that are evaluated before the str function is executed gives str the easiest syntax. Whenever you need to use str on a collection, apply is a simple way to convert a collection to separate parameters.
Making a str that accepts a collection and evaluates each part inside it would take more effort, help only in less common use cases, result in more complicated code or syntax, or limit its applicability. So there might be a better way to reverse strings, but reverse, apply and str are best at what they do.
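One way to answer the "how to solve that" question for the collection-taking str2: the quoted list is never evaluated, so its symbols stay symbols. Building the collection with a vector literal or with list evaluates each element first. A sketch, with the parameter renamed to coll to avoid shadowing clojure.core/seq:

```clojure
;; Same idea as str2 above: a str that takes a single collection.
(defn str2 [coll]
  (apply str coll))

;; A vector literal evaluates its elements before str2 sees them.
(let [a 1 b 2]
  (str2 [a "+" b "=" (+ a b)]))       ;; => "1+2=3"

;; So does list, unlike a quoted list.
(let [a 1 b 2]
  (str2 (list a "+" b "=" (+ a b))))  ;; => "1+2=3"
```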
Apply, like reverse, works on any seqable type, not just vectors, so
(apply str (reverse "Hello"))
is a little shorter. clojure.string/reverse should be more efficient, though.
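Efficiency aside, the two approaches can also differ in their results. On the JVM, clojure.string/reverse delegates to StringBuilder's reverse, which keeps surrogate pairs together, while reversing the seq of chars swaps the two halves of each pair. A sketch, using code point 0x1F600 purely as an example of a character outside the Basic Multilingual Plane:

```clojure
(require '[clojure.string :as str])

;; For plain text both approaches agree.
(= (str/reverse "Hello") (apply str (reverse "Hello")))  ;; => true

;; A string containing one supplementary character (two Java chars).
(def astral (String. (Character/toChars 0x1F600)))
(def s (str "a" astral "b"))

;; StringBuilder-based reverse keeps the surrogate pair intact.
(= (str/reverse s) (str "b" astral "a"))                 ;; => true

;; Reversing the individual chars breaks the pair.
(= (apply str (reverse s)) (str/reverse s))              ;; => false
```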