Splitting a string on backtick in clojure - regex

I'm trying to split a string on a backtick in the Clojure REPL, like this:
user> (require '[clojure.string :as str])
user> (str/split "1=/1`2=/2'" #"`'")
The result is:
["1=/1`2=/2'"]
In short, I'm unable to split on the backtick character, and I don't know why. How can I make this work?
P.S. Notice the apostrophe at the end of the string and in the split argument. These are auto-inserted in the REPL.

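You have an extra ' in your regex, so the pattern only matches a backtick immediately followed by an apostrophe, which never occurs in your string.
This works fine: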
(str/split "1=/1`2=/2'" #"`")
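To make the difference concrete, here is a small sketch comparing three patterns. The second sample string is made up for illustration, and the character-class variant is only one possible way to split on either character.

```clojure
(require '[clojure.string :as str])

;; A lone backtick splits the original string in two.
(str/split "1=/1`2=/2'" #"`")
;; => ["1=/1" "2=/2'"]

;; #"`'" only matches a backtick immediately followed by an apostrophe.
(str/split "a`'b" #"`'")
;; => ["a" "b"]

;; A character class splits on either character; trailing empty strings are dropped.
(str/split "1=/1`2=/2'" #"[`']")
;; => ["1=/1" "2=/2"]
```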

Related

In ClojureScript, how to split string around a regex and keep the matches in the result, without using lookaround?

I want to split a string on an arbitrary regular expression (similar to clojure.string/split) but keep the matches in the result. One way to do this is with lookaround in the regex but this doesn't work well in ClojureScript because it's not supported by all browsers.
In my case, the regex is #"\{\{\s*[A-Za-z0-9_\.]+?\s*\}\}".
So for example, foo {{bar}} baz should be split into ("foo " "{{bar}}" " baz").
Thanks!
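One possible solution is to choose a special character that never occurs in the input as a delimiter, wrap each match in it with replace, and then split on that. Here I used an exclamation mark ($0 in the replacement refers to the whole match on the JVM; under ClojureScript, where replacement strings follow JavaScript rules, the whole match is written $& instead):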
Require: [clojure.string :as s]
(-> "foo {{bar}} baz"
    (s/replace #"\{\{\s*[A-Za-z0-9_\.]+?\s*\}\}" "!$0!")
    (s/split #"!"))
=> ["foo " "{{bar}}" " baz"]
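If no character can be reserved as a delimiter, another option is to interleave the pieces from split with the matches from re-seq. The sketch below was written with JVM Clojure in mind; the function name split-keeping-matches is hypothetical, it assumes the regex has no capture groups, and its behaviour under ClojureScript has not been verified here.

```clojure
(require '[clojure.string :as s])

(defn split-keeping-matches
  "Splits text on re and returns the pieces and the matches in their original
   order, dropping empty pieces. Assumes re has no capture groups, so that
   re-seq yields plain strings."
  [text re]
  (let [pieces  (s/split text re -1)             ; limit -1 keeps trailing empty pieces
        matches (concat (re-seq re text) [""])]  ; pad so interleave keeps the last piece
    (remove empty? (interleave pieces matches))))

(split-keeping-matches "foo {{bar}} baz" #"\{\{\s*[A-Za-z0-9_\.]+?\s*\}\}")
;; => ("foo " "{{bar}}" " baz")
```

The result is a lazy sequence rather than a vector; wrap the call in vec if a vector is needed.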

Replace end of string with String/replace and re-pattern - Clojure

I want to remove a substring at the end of a string containing some code.
I have a vector a containing the expression "c=c+1"
My goal is to remove the expression "c=c+1;" at the end of my expression.
I have used the $ symbol indicating that the substring to replace must be at the end of my code.
Here is the code and the output:
project.core=> (def a [:LangFOR [:before_for "a=0; b="] [:init_var "c=a+1;"] [:loop_condition_expression "c-10;"] [:loop_var_step "c=c+1"] [:statements_OK "a=2*c;"] [:after_for " b+c;"]])
#'project.core/a
project.core=> (prn (str "REGEX debug : " (clojure.string/replace "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;" (re-pattern (str "# "(get-in a [4 1]) ";$")) "")))
"REGEX debug : b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;"
nil
The expected output is:
"REGEX debug : b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;"
How can I correct my (re-pattern) function?
Thanks.
The string you're using to build the regex pattern has some characters in it that have special meaning in a regular expression. The + in c+1 is interpreted as one or more occurrences of c followed by 1, so the pattern matches text such as c=c1 or c=cc1 but never the literal c=c+1. Java's Pattern class provides a function to escape/quote strings so they can be used literally in regex patterns. You could use it directly, or define a wrapper function:
(defn re-quote [s]
  (java.util.regex.Pattern/quote s))
(re-quote "c=c+1")
=> "\\Qc=c+1\\E"
This function simply wraps the input string in the control sequences \Q and \E, which tell the regex engine where to start and stop treating the contents literally.
Now you can use that literal string to build a regex pattern:
(clojure.string/replace
  "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;"
  (re-pattern (str (re-quote "c=c+1;") "$"))
  "")
=> "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;"
I removed the leading "# " from the pattern in your example to make this work, because that doesn't appear in the input.
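As a rough illustration of the points above, the sketch below first shows the unquoted pattern misbehaving and then wraps the quoting in a small helper fed from the vector in the question. The name strip-suffix and the two short sample strings are assumptions made for this example.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

;; Unquoted, c+ means "one or more c", so the pattern misses the literal text...
(re-find (re-pattern "c=c+1;$") "a=2*c;c=c+1;")
;; => nil
;; ...but matches a string that contains no plus sign at all.
(re-find (re-pattern "c=c+1;$") "a=2*c;c=cc1;")
;; => "c=cc1;"

(defn strip-suffix
  "Removes one literal occurrence of suffix from the end of s, if present."
  [s suffix]
  (str/replace s (re-pattern (str (Pattern/quote suffix) "$")) ""))

(def a [:LangFOR [:before_for "a=0; b="] [:init_var "c=a+1;"]
        [:loop_condition_expression "c-10;"] [:loop_var_step "c=c+1"]
        [:statements_OK "a=2*c;"] [:after_for " b+c;"]])

;; (get-in a [4 1]) is "c=c+1"; the trailing semicolon is appended before quoting.
(strip-suffix "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;"
              (str (get-in a [4 1]) ";"))
;; => "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;"
```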

Clojure - how to count specific words in a string

(def string "this is an example string. forever and always and and")
Can somebody help me? I am coding in Clojure, and I have been trying to count how many times the word 'and' appears in the string.
Any help is much appreciated.
One way to do it is to use regular expressions and re-seq function. Here is a "naive" example:
(count (re-seq #"and" string))
And here is the same code, written with the threading macro ->>:
(->> string
     (re-seq #"and")
     count)
This counts every appearance of the substring "and" in your string, so words like panda are counted too. To count only the word and, restrict the regular expression with the "word boundary" metacharacter \b:
(->> string
     (re-seq #"\band\b")
     count)
This version ensures that "and" is not directly preceded or followed by a word character (a letter, digit, or underscore).
And if you want case-insensitive search (to include "And"):
(->> string
     (re-seq #"(?i)\band\b")
     count)
An alternative solution is to use the split function from the clojure.string namespace:
(require '[clojure.string :as s])
(->> (s/split string #"\W+") ; split the string on runs of non-word characters
     (map s/lower-case)      ; for case-insensitive search
     (filter (partial = "and"))
     count)
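If counts for more than one word are needed, the same ingredients can feed frequencies, which tallies every word in a single pass. This is only a sketch; the name word-counts is made up for the example.

```clojure
(require '[clojure.string :as str])

(def string "this is an example string. forever and always and and")

;; Lower-case every word, then tally how often each one occurs.
(def word-counts
  (->> (re-seq #"\w+" string)
       (map str/lower-case)
       frequencies))

(get word-counts "and" 0)
;; => 3
(get word-counts "panda" 0)
;; => 0
```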

how to split a string in clojure not in regular expression mode

The split functions in both Clojure and Java take a regular expression as the parameter to split on, but I just want to split on a plain character. The character passed in could be "|", ",", " ", etc. How do I split a line on that character?
I need a function like (split string a-char). It will be called at a very high frequency, so it needs good performance. Is there a good solution?
There are a few features in the java.util.regex.Pattern class that support treating strings as literal regular expressions, which is useful for cases such as these. @cgrand already alluded to (Pattern/quote s) in a comment to another answer. One more such feature is the LITERAL flag (documented in the Pattern Javadoc), which can be passed when compiling a pattern so that its text is matched literally. Remember that #"foo" in Clojure is essentially syntax sugar for (Pattern/compile "foo"). Putting it all together we have:
(import 'java.util.regex.Pattern)
(clojure.string/split "foo[]bar" (Pattern/compile "[]" Pattern/LITERAL))
;; ["foo" "bar"]
Just make your character a regex by properly escaping special characters and use the default regex split (which is fastest by far).
This version makes a regex that automatically escapes every character of the character or string it is given:
(defn char-to-regex
  [c]
  (re-pattern (java.util.regex.Pattern/quote (str c))))
This version escapes a single character only if it is one of the regex special characters:
(defn char-to-regex
  [c]
  (if ((set "<([{\\^-=$!|]})?*+.>") c)
    (re-pattern (str "\\" c))
    (re-pattern (str c)))) ; re-pattern needs a string, not a char
Make sure to bind the regex once, so you don't call char-to-regex over and over again if you need to do multiple splits:
(let [break (char-to-regex \|)]
  (clojure.string/split "This is | the string | to | split" break))
=> ["This is " " the string " " to " " split"]
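For completeness, a split that avoids regular expressions entirely can be sketched with String.indexOf. The name split-literal is hypothetical, and no benchmark was run for this example, so whether it beats the regex-based split for a given workload is something to measure rather than assume.

```clojure
(defn split-literal
  "Splits s on every literal occurrence of delim (a char or a non-empty
   string) without using a regex. Unlike clojure.string/split, empty
   pieces are kept, including trailing ones."
  [^String s delim]
  {:pre [(pos? (count (str delim)))]}   ; an empty delimiter would never advance
  (let [^String d (str delim)
        n (count d)]
    (loop [start 0
           acc []]
      (let [idx (.indexOf s d (int start))]
        (if (neg? idx)
          (conj acc (subs s start))                 ; no more delimiters: take the rest
          (recur (long (+ idx n))                   ; continue after the delimiter
                 (conj acc (subs s start idx))))))))

(split-literal "foo[]bar" "[]")
;; => ["foo" "bar"]
(split-literal "a|b||c|" \|)
;; => ["a" "b" "" "c" ""]
```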

clojure equivalent for ruby's gsub

How do I do this in Clojure?
"text".gsub(/(\d)([ap]m|oclock)\b/, '\1 \2')
To add to Isaac's answer, this is how you would use clojure.string/replace in this particular occasion:
user> (str/replace "9oclock"
                   #"(\d)([ap]m|oclock)\b"
                   (fn [[_ a b]] (str a " " b)))
;                      ^- note the destructuring of the match result
;                  ^- using an fn to produce the replacement
"9 oclock"
To add to sepp2k's answer, this is how you can take advantage of Clojure's regex literals while using the "$1 $2" gimmick (arguably simpler than a separate fn in this case):
user> (.replaceAll (re-matcher #"(\d)([ap]m|oclock)\b" "9oclock")
;                              ^- note the regex literal
                   "$1 $2")
"9 oclock"
You can use Java's replaceAll method. The call would look like:
(.replaceAll "text" "(\\d)([ap]m|oclock)\\b" "$1 $2")
Note that this returns a new string (like gsub without the bang in Ruby). There is no equivalent of gsub! in Clojure, because Java/Clojure strings are immutable.
That would be replace in the clojure.string namespace.
Use it like so:
(ns rep
  (:require [clojure.string :as str])) ; an alias avoids shadowing clojure.core/replace
(str/replace "this is a testing string testing testing one two three" ; string
             "testing" ; match
             "Mort")   ; replacement
replace is convenient because the match and replacement can be string/string or char/char, or a regex pattern paired with either a replacement string or a function of the match.
The old monolithic Clojure contrib library has re-gsub as part of str-utils (clojure.string/replace covers the same ground in current Clojure):
user=> (def examplestr (str "jack and jill" \newline "went up the hill"))
#'user/examplestr
user=> (println examplestr)
jack and jill
went up the hill
nil
user=> (println (re-gsub #"\n" " " examplestr))
jack and jill went up the hill
nil
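Pulling the answers together, clojure.string/replace also accepts a regex with a plain replacement string, which may be the closest match to the original gsub call. The sketch below assumes the usual str alias; the second sample sentence is made up for illustration.

```clojure
(require '[clojure.string :as str])

;; Ruby's '\1 \2' becomes "$1 $2" in a Clojure replacement string.
(str/replace "9oclock" #"(\d)([ap]m|oclock)\b" "$1 $2")
;; => "9 oclock"

;; Like gsub, replace rewrites every match, not just the first.
(str/replace "meet at 9am or 10pm" #"(\d)([ap]m|oclock)\b" "$1 $2")
;; => "meet at 9 am or 10 pm"

;; The newline example from above, without re-gsub.
(str/replace "jack and jill\nwent up the hill" #"\n" " ")
;; => "jack and jill went up the hill"
```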