clojure equivalent for ruby's gsub - regex

How do I do this in Clojure?
"text".gsub(/(\d)([ap]m|oclock)\b/, '\1 \2')

To add to Isaac's answer, this is how you would use clojure.string/replace in this particular occasion:
user> (str/replace "9oclock"
                   #"(\d)([ap]m|oclock)\b"
                   ;; an fn produces the replacement; note the destructuring
                   ;; of the match result as [whole-match group-1 group-2]
                   (fn [[_ a b]] (str a " " b)))
"9 oclock"
To add to sepp2k's answer, this is how you can take advantage of Clojure's regex literals while using the "$1 $2" gimmick (arguably simpler than a separate fn in this case):
user> (.replaceAll (re-matcher #"(\d)([ap]m|oclock)\b" "9oclock") ; note the regex literal
                   "$1 $2")
"9 oclock"

You can use Java's replaceAll method. The call would look like:
(.replaceAll "text" "(\\d)([ap]m|oclock)\\b" "$1 $2")
Note that this returns a new string (as gsub without the bang does in Ruby). There is no equivalent of gsub! in Clojure, because Java/Clojure strings are immutable.
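clojure.string/replace accepts the same "$1 $2" replacement string when the match is a regex pattern, so the Java interop call is optional. A short sketch, assuming clojure.string is aliased as str (the names input and spaced are only for illustration):

```clojure
(require '[clojure.string :as str])

(def input "9oclock")

;; Pattern + replacement string: $1 and $2 refer to the capture groups.
(def spaced (str/replace input #"(\d)([ap]m|oclock)\b" "$1 $2"))

spaced ;; => "9 oclock"
input  ;; => "9oclock" (the original string is unchanged)
```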

That would be replace in the clojure.string namespace.
Use it like so:
(ns rep
  (:refer-clojure :exclude [replace]) ; avoid the clash with clojure.core/replace
  (:use [clojure.string :only (replace)]))
(replace "this is a testing string testing testing one two three" ;; string
         "testing" ;; match
         "Mort")   ;; replacement
replace is flexible: the match and replacement can be string/string or char/char, or the match can be a regex pattern paired with either a replacement string or a function of the match.
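The combinations can be sketched side by side. The inputs below are made up for illustration, and clojure.string is assumed to be aliased as str; the last form applies the function idea to the am/pm pattern from the question, upper-casing the suffix as an extra that the question did not ask for:

```clojure
(require '[clojure.string :as str])

;; string / string: every literal occurrence is replaced (no regex involved)
(str/replace "a.b.c" "." "-")                       ;; => "a-b-c"

;; char / char
(str/replace "a.b.c" \. \-)                         ;; => "a-b-c"

;; pattern / string: $1, $2 refer to the capture groups
(str/replace "2024-01" #"(\d+)-(\d+)" "$2/$1")      ;; => "01/2024"

;; pattern / function: with capture groups the fn receives
;; [whole-match group-1 group-2]
(str/replace "meet at 9am or 10pm"
             #"(\d)([ap]m|oclock)\b"
             (fn [[_ digit suffix]]
               (str digit " " (str/upper-case suffix))))
;; => "meet at 9 AM or 10 PM"
```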

Clojure contrib now has re-gsub as a part of str-utils:
user=> (def examplestr (str "jack and jill" \newline "went up the hill"))
#'user/examplestr
user=> (println examplestr)
jack and jill
went up the hill
nil
user=> (println (re-gsub #"\n" " " examplestr))
jack and jill went up the hill
nil
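The same substitution can also be written with clojure.string/replace, which ships with Clojure itself (1.2 and later) and needs no contrib dependency. A sketch of the equivalent call; note that the argument order differs from re-gsub, with the input string coming first:

```clojure
(require '[clojure.string :as str])

(def examplestr (str "jack and jill" \newline "went up the hill"))

;; string first, then the pattern, then the replacement
(str/replace examplestr #"\n" " ")
;; => "jack and jill went up the hill"
```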

Related

clojure.string/replace does not match pattern matched by re-seq

Why does clojure.string/replace not match the \"[^\"]+\" pattern while re-seq does?
(re-seq #"\"[^\"]+\"" "ab,\"helo,bro\",yo")
=> ("\"helo,bro\"")
(clojure.string/replace "ab,\"helo,bro\",yo" #"\"[^\"]+\”" "")
=> "ab,\"helo,bro\",yo"
I would expect replace to delete the matched pattern. What am I missing here?
Thanks for insight.
Your regexes are (probably unintentionally) different: in the replace call you used \” (a curly closing quote) instead of \".
If you use the same exact regex it will work as expected:
(def r #"\"[^\"]+\"")
(re-seq r "ab,\"helo,bro\",yo")
=> ("\"helo,bro\"")
(clojure.string/replace "ab,\"helo,bro\",yo" r "")
=> "ab,,yo"
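A mismatch like this is hard to see because the two quote characters look alike. One way to spot it, sketched here with a hypothetical helper named non-ascii-chars, is to list any characters in the pattern source that fall outside the ASCII range:

```clojure
;; Hypothetical helper: return the non-ASCII characters found in the
;; source text of a regex pattern.
(defn non-ascii-chars
  [pattern]
  (filter #(> (int %) 127) (str pattern)))

(non-ascii-chars #"\"[^\"]+\"")  ;; => ()   straight quotes only
(non-ascii-chars #"\"[^\"]+\”")  ;; => (\”) the curly quote shows up
```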

Realizing a Clojure lazy sequence (string) in the REPL

I'm trying to realize a lazy sequence (which should generate a single string) in the REPL, with no luck. The original code works fine:
(def word_list ["alpha" "beta" "gamma" "beta" "alpha" "alpha" "beta" "beta" "beta"])
(def word_string (reduce str (interpose " " word_list)));
word_string ; "alpha beta gamma beta alpha alpha beta beta beta"
But not wanting to leave well enough alone, I wondered what else would work, and tried removing the reduce, thinking that str might have the same effect. It did not...
(def word_string (str (interpose " " word_list)))
word_string ; "clojure.lang.LazySeq@304a9790"
I tried the obvious, using reduce again, but that didn't work either. There's another question about realizing lazy sequences that seemed promising, but nothing I tried worked:
(reduce str word_string) ; "clojure.lang.LazySeq@304a9790"
(apply str word_string) ; "clojure.lang.LazySeq@304a9790"
(println word_string) ; "clojure.lang.LazySeq@304a9790"
(apply list word_string) ; [\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...]
(vec word_string) ; [\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...]
(apply list word_string) ; (\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...)
(take 100 word_string) ; (\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...)
The fact that some of the variations gave me the characters in "clojure.lang.LazySeq" also worries me: did I somehow lose the actual string value, and does my reference just have the value "clojure.lang.LazySeq"? If not, how do I actually realize the value?
To clarify: given that word_string is assigned to a lazy sequence, how would I realize it? Something like (realize word_string), say, if that existed.
Update: based on the accepted Answer and how str works, it turns out that I can get the actual sequence value, not just its name:
(reduce str "" word_string) ; "alpha beta gamma beta alpha alpha beta beta beta"
Yes, this is terrible code. :) I was just trying to understand what was going on, why it was breaking, and whether the actual value was still there or not.
What you want is:
(def word_string (apply str (interpose " " word_list)))
Look at the documentation of str:
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So you're calling .toString on the sequence, which generates that representation instead of applying str to the elements of the sequence as arguments.
BTW, the more idiomatic way of doing what you want is:
(clojure.string/join " " word_list)
Also, a string is not a lazy sequence. interpose returns a lazy sequence, and you're calling .toString on that.
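On the (realize word_string) part of the question: the closest built-in is doall, which walks a lazy sequence and forces every element. A small sketch, using a shortened word list and the illustrative name lazy-words; realized? is used only to show the before/after state:

```clojure
(require '[clojure.string :as str])

(def word-list ["alpha" "beta" "gamma"])
(def lazy-words (interpose " " word-list))

(realized? lazy-words)   ;; => false, nothing has consumed it yet
(doall lazy-words)       ;; walks the whole seq, forcing every element
(realized? lazy-words)   ;; => true

;; Realizing does not change what str returns; to see the elements,
;; print the seq or join it.
(pr-str lazy-words)      ;; => "(\"alpha\" \" \" \"beta\" \" \" \"gamma\")"
(str/join lazy-words)    ;; => "alpha beta gamma"
```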
You don't need to do anything special to realize a lazy seq in Clojure; you just use it anywhere a seq is expected.
user=> (def word-list ["alpha" "beta" "gamma" "beta" "alpha" "alpha" "beta" "beta" "beta"])
#'user/word-list
user=> (def should-be-a-seq (interpose " " word-list))
#'user/should-be-a-seq
user=> (class should-be-a-seq)
clojure.lang.LazySeq
So we have a seq. If I use it in any way that walks through all of its values, it ends up fully realized, e.g.:
user=> (seq should-be-a-seq)
("alpha" " " "beta" " " "gamma" " " "beta" " " "alpha" " " "alpha" " " "beta" " " "beta" " " "beta")
It's still a seq though, in fact it's still the same object that it was before.
user=> (str should-be-a-seq)
"clojure.lang.LazySeq@304a9790"
As Diego Basch mentioned, calling str on something is just like calling .toString, which for LazySeq is apparently just the default .toString inherited from Object.
Since it's a seq, you can use it like any seq, whether it's been fully realized previously or not. e.g.
user=> (apply str should-be-a-seq)
"alpha beta gamma beta alpha alpha beta beta beta"
user=> (reduce str should-be-a-seq)
"alpha beta gamma beta alpha alpha beta beta beta"
user=> (str/join should-be-a-seq)
"alpha beta gamma beta alpha alpha beta beta beta"
user=> (str/join (map #(.toUpperCase %) should-be-a-seq))
"ALPHA BETA GAMMA BETA ALPHA ALPHA BETA BETA BETA"

Splitting a string on backtick in clojure

I'm trying to split a string on backtick in the clojure repl, like this:
user> (require '[clojure.string :as str])
user> (str/split "1=/1`2=/2'" #"`'")
Result is:
["1=/1`2=/2'"]
In short, I'm unable to split on the backtick character. And I don't know why. How can I make this work?
P.S.: Notice the apostrophe at the end of the string and in the split argument. These are auto-inserted in the REPL.
You have an extra ' in your regex.
This works fine:
(str/split "1=/1`2=/2'" #"`")
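The original pattern matches only a backtick immediately followed by an apostrophe, which never occurs in the input, so nothing is split. If the intent had been to split on either character (an assumption; the question only asks about the backtick), a character class would do it:

```clojure
(require '[clojure.string :as str])

;; Split on the backtick alone; the trailing apostrophe stays in the last piece.
(str/split "1=/1`2=/2'" #"`")     ;; => ["1=/1" "2=/2'"]

;; A character class splits on either character; the trailing empty
;; string produced by the final apostrophe is dropped by split.
(str/split "1=/1`2=/2'" #"[`']")  ;; => ["1=/1" "2=/2"]
```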

Clojure - how to count specific words in a string

(def string "this is an example string. forever and always and and")
Can somebody help me? I am coding in Clojure, and I have been trying to count how many times the word 'and' appears in the string.
Any help is much appreciated.
One way to do it is to use regular expressions and re-seq function. Here is a "naive" example:
(count (re-seq #"and" string))
And here is the same code, written with the threading macro ->>:
(->> string
     (re-seq #"and")
     count)
It counts every occurrence of the substring "and" in your string, so words like panda are counted too. To count only the word and, restrict the regular expression with the "word boundary" metacharacter \b:
(->> string
     (re-seq #"\band\b")
     count)
This version ensures that each "and" match is not adjacent to another word character (a letter, digit, or underscore).
And if you want a case-insensitive search (to include "And"):
(->> string
     (re-seq #"(?i)\band\b")
     count)
An alternative solution is to use the split function from the clojure.string namespace:
(require '[clojure.string :as s])
(->> (s/split string #"\W+") ; split on runs of non-word characters
     (map s/lower-case) ; for case-insensitive search
     (filter (partial = "and"))
     count)
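If counts for more than one word are needed, the split approach extends naturally with frequencies from clojure.core. A minimal sketch; word-counts is a hypothetical helper name, not something from the answer above:

```clojure
(require '[clojure.string :as str])

(def string "this is an example string. forever and always and and")

;; Hypothetical helper: map of lower-cased word -> number of occurrences.
(defn word-counts
  [s]
  (->> (str/split s #"\W+")
       (remove str/blank?)   ; a leading separator yields an empty first piece
       (map str/lower-case)
       frequencies))

(get (word-counts string) "and" 0)   ;; => 3
(get (word-counts string) "panda" 0) ;; => 0
```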

how to split a string in clojure not in regular expression mode

The split functions in both Clojure and Java take a regular expression as the parameter to split on. But I just want to split on a plain character. The character passed in could be "|", ",", " ", etc. How do I split a line by that character?
I need a function like (split string a-char). It will be called at very high frequency, so it needs good performance. Is there a good solution?
There are a few features in the java.util.regex.Pattern class that support treating strings as literal regular expressions. This is useful in cases such as this one. @cgrand already alluded to (Pattern/quote s) in a comment on another answer. One more such feature is the LITERAL flag. It can be used when compiling literal regular expression patterns. Remember that #"foo" in Clojure is essentially syntax sugar for (Pattern/compile "foo"). Putting it all together we have:
(import 'java.util.regex.Pattern)
(clojure.string/split "foo[]bar" (Pattern/compile "[]" Pattern/LITERAL))
;; ["foo" "bar"]
Just make your character a regex by properly escaping special characters and use the default regex split (which is fastest by far).
This version will make a regexp that automatically escapes every character or string within it
(defn char-to-regex
[c]
(re-pattern (java.util.regex.Pattern/quote (str c))))
This version makes a regexp that escapes a single character only if it is one of the regexp special characters:
(defn char-to-regex
  [c]
  (if ((set "<([{\\^-=$!|]})?*+.>") c)
    (re-pattern (str "\\" c))
    (re-pattern (str c)))) ; re-pattern needs a string, not a char
Make sure to bind the regex once, so you don't call char-to-regex again and again when you need to do multiple splits:
(let [break (char-to-regex \|)]
  (clojure.string/split "This is | the string | to | split" break))
=> ["This is " " the string " " to " " split"]
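If avoiding the regex engine altogether is the goal, a loop over String.indexOf is one option. This is an unmeasured sketch, not a benchmarked replacement for the regex split; split-on-char is a hypothetical name, and unlike clojure.string/split it keeps trailing empty strings:

```clojure
;; Hypothetical helper: split s on a single character without any regex.
(defn split-on-char
  [^String s c]
  (let [ch (int c)]                        ; char code for String.indexOf
    (loop [start 0, acc []]
      (let [idx (.indexOf s ch (int start))]
        (if (neg? idx)
          (conj acc (subs s start))        ; no more separators: take the rest
          (recur (inc idx) (conj acc (subs s start idx))))))))

(split-on-char "This is | the string | to | split" \|)
;; => ["This is " " the string " " to " " split"]

(split-on-char "a,b,," \,)
;; => ["a" "b" "" ""]
```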