Clojure - how to count specific words in a string


(def string "this is an example string. forever and always and and")
Can somebody help me? I am coding in Clojure, and I have been trying to count how many times the word 'and' appears in the string.
Any help is much appreciated.

One way to do it is to use a regular expression with the re-seq function. Here is a "naive" example:
(count (re-seq #"and" string))
And here is the same code, written with the threading macro ->>:
(->> string
(re-seq #"and")
count)
This counts every occurrence of the substring "and" in your string, so words like panda are counted too. To count only the word and, restrict the regular expression with the "word boundary" metacharacter \b:
(->> string
(re-seq #"\band\b")
count)
This version matches "and" only where it is not adjacent to another word character (a letter, digit, or underscore).
If you want a case-insensitive search (to include "And"), add the (?i) flag:
(->> string
(re-seq #"(?i)\band\b")
count)
An alternative solution is to use the split function from the clojure.string namespace:
(require '[clojure.string :as s])
(->> (s/split string #"\W+") ; split the string on runs of non-word characters
(map s/lower-case) ; for case-insensitive search
(filter (partial = "and"))
count)
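As a sketch of the split-based approach wrapped into a reusable function, the version below uses frequencies to tally every word and then looks up the one of interest. The name count-word is hypothetical and not part of the original answer.

```clojure
(require '[clojure.string :as s])

(def string "this is an example string. forever and always and and")

;; count-word is a hypothetical helper name chosen for this sketch.
(defn count-word
  "Counts case-insensitive whole-word occurrences of word in text."
  [text word]
  (let [words (s/split (s/lower-case text) #"\W+")]
    ;; frequencies returns a map of word -> count; default to 0 if absent
    (get (frequencies words) (s/lower-case word) 0)))

(count-word string "and")   ; => 3
(count-word string "panda") ; => 0
```

If several words need counting, the frequencies map could be computed once and reused instead of splitting the text for each lookup.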

Related

Replace end of string with String/replace and re-pattern - Clojure

I want to remove a substring at the end of a string containing some code.
I have a vector a containing the expression "c=c+1".
My goal is to remove the expression "c=c+1;" at the end of my string.
I have used the $ symbol to indicate that the substring to replace must be at the end of my code.
Here is the code and the output:
project.core=> (def a [:LangFOR [:before_for "a=0; b="] [:init_var "c=a+1;"] [:loop_condition_expression "c-10;"] [:loop_var_step "c=c+1"] [:statements_OK "a=2*c;"] [:after_for " b+c;"]])
#'project.core/a
project.core=> (prn (str "REGEX debug : " (clojure.string/replace "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;" (re-pattern (str "# "(get-in a [4 1]) ";$")) "")))
"REGEX debug : b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;"
nil
The expected output is:
"REGEX debug : b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;"
How can I correct my (re-pattern) function?
Thanks.

The string you're using to build the regex pattern contains characters that have special meaning in a regular expression. The + in c+1 is interpreted as "one or more occurrences of c", followed by 1. Java's Pattern class provides a method to quote strings so they can be used literally in regex patterns. You can call it directly or define a wrapper function:
(defn re-quote [s]
(java.util.regex.Pattern/quote s))
(re-quote "c=c+1")
=> "\\Qc=c+1\\E"
This function wraps the input string in the control sequences \Q and \E, which tell the regex engine where to start and stop treating the contents literally.
Now you can use that literal string to build a regex pattern:
(clojure.string/replace
"b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;"
(re-pattern (str (re-quote "c=c+1;") "$"))
"")
=> "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;"
I removed the leading "# " from the pattern in your example to make this work, because that doesn't appear in the input.
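If the goal is only to strip a known literal suffix, a regex is not strictly needed. The sketch below assumes Clojure 1.8 or later (for clojure.string/ends-with?); remove-suffix is a hypothetical helper name, not something from the question or answer.

```clojure
(require '[clojure.string :as str])

;; remove-suffix is a hypothetical helper name used only for this sketch.
(defn remove-suffix
  "Returns s without suffix if s ends with it, otherwise s unchanged."
  [s suffix]
  (if (str/ends-with? s suffix)
    (subs s 0 (- (count s) (count suffix)))
    s))

(remove-suffix "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;c=c+1;" "c=c+1;")
;; => "b=0;c=a+1;a=2*c;c=c+1;c=c+1;a=2*c;"
```

Because no pattern is compiled, characters such as + and * in the suffix need no quoting.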

Splitting a string on backtick in clojure

I'm trying to split a string on backtick in the clojure repl, like this:
user> (require '[clojure.string :as str])
user> (str/split "1=/1`2=/2'" #"`'")
Result is:
["1=/1`2=/2'"]
In short, I'm unable to split on the backtick character, and I don't know why. How can I make this work?
P.S. Notice the apostrophe at the end of the string and in the split argument. These are auto-inserted in the REPL.

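You have an extra ' in your regex, so the pattern only matches a backtick immediately followed by an apostrophe, and that sequence never occurs in your string.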
This works fine:
(str/split "1=/1`2=/2'" #"`")
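For illustration, here is what the corrected call returns, together with a character-class variant in case splitting on either a backtick or an apostrophe is actually wanted (that second requirement is an assumption, not something the question states):

```clojure
(require '[clojure.string :as str])

;; Split on the backtick only; the trailing apostrophe stays in the last piece.
(str/split "1=/1`2=/2'" #"`")
;; => ["1=/1" "2=/2'"]

;; A character class splits on either a backtick or an apostrophe.
;; The trailing empty string is dropped by split's default behavior.
(str/split "1=/1`2=/2'" #"[`']")
;; => ["1=/1" "2=/2"]
```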

how to split a string in clojure not in regular expression mode

split in both Clojure and Java takes a regular expression as the delimiter, but I just want to split on a plain character. The character passed in could be "|", ",", " ", etc. How do I split a line on that character?
I need a function like (split string a-char). It will be called at very high frequency, so it needs good performance. Is there a good solution?

The java.util.regex.Pattern class has a few features for treating a string as a literal rather than as a regular expression, which is useful in cases like this one. @cgrand already alluded to (Pattern/quote s) in a comment on another answer. Another such feature is the LITERAL flag, which can be passed when compiling a pattern. Remember that #"foo" in Clojure is essentially syntactic sugar for (Pattern/compile "foo"). Putting it all together, we have:
(import 'java.util.regex.Pattern)
(clojure.string/split "foo[]bar" (Pattern/compile "[]" Pattern/LITERAL))
;; ["foo" "bar"]

Just turn your character into a regex by properly escaping special characters, and use the default regex split (which is fastest by far).
This version builds a regex that quotes the whole character or string, so nothing in it is treated as a metacharacter:
(defn char-to-regex
[c]
(re-pattern (java.util.regex.Pattern/quote (str c))))
This version escapes a single character only if it is one of the regex special characters:
(defn char-to-regex
[c]
(if ((set "<([{\\^-=$!|]})?*+.>") c)
(re-pattern (str "\\" c))
(re-pattern (str c)))) ; re-pattern needs a string, not a character
Bind the regex once so that char-to-regex is not called again for every split:
(let [break (char-to-regex \|)]
(clojure.string/split "This is | the string | to | split" break))
=> ["This is " " the string " " to " " split"]
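For comparison, a split that avoids regular expressions entirely can be sketched with String.indexOf. The name split-on-char is hypothetical, and whether this is faster than a precompiled pattern is an assumption that would need to be measured for your workload.

```clojure
;; split-on-char is a hypothetical helper, not part of clojure.string.
(defn split-on-char
  "Splits s on every occurrence of the character c, keeping empty pieces."
  [^String s c]
  (let [ch (int c)]
    (loop [start 0
           acc (transient [])]
      ;; find the next delimiter at or after start
      (let [idx (.indexOf s ch (int start))]
        (if (neg? idx)
          ;; no more delimiters: the rest of the string is the last piece
          (persistent! (conj! acc (subs s start)))
          (recur (inc idx) (conj! acc (subs s start idx))))))))

(split-on-char "This is | the string | to | split" \|)
;; => ["This is " " the string " " to " " split"]
```

Unlike clojure.string/split, this sketch keeps trailing empty strings, so (split-on-char "a,,b," \,) returns ["a" "" "b" ""].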

Rotate a list-of-list matrix in Clojure

I'm new to Clojure and functional programming in general. I'm at a loss in how to handle this in a functional way.
I have the following matrix:
(def matrix [[\a \b \c]
[\d \e \f]
[\g \h \i]])
I want to transform it into something like this (rotate counterclockwise):
((\a \d \g)
(\b \e \h)
(\c \f \i ))
I've hacked up this bit that gives me the elements in the correct order. If I could collect the data in a string this way I could then split it up with partition. However I'm pretty sure doseq is the wrong path:
(doseq [i [0 1 2]]
(doseq [row matrix]
(println (get (vec row) i))))
I've dabbled with nested map calls but keep getting stuck with that. What's the correct way to build up a string in Clojure or handle this in an even better way?

What you're trying to achieve sounds like transpose. I'd suggest
(apply map list matrix)
; => ((\a \d \g) (\b \e \h) (\c \f \i))
What does it do?
(apply map list '((\a \b \c) (\d \e \f) (\g \h \i)))
is equivalent to
(map list '(\a \b \c) '(\d \e \f) '(\g \h \i))
which takes the first element of each of the three lists and calls list on them, then takes the second elements and calls list on them, and so on. It returns a sequence of all the lists generated this way.
A couple more examples of both apply and map can be found on ClojureDocs.
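Since the question mentions rotating, it may help to see how a true 90-degree rotation differs from a transpose. The sketch below uses mapv and vector so the results stay vectors; that choice is an assumption made here for readability and is not part of the answer above.

```clojure
(def matrix [[\a \b \c]
             [\d \e \f]
             [\g \h \i]])

;; Transpose: rows become columns.
(apply mapv vector matrix)
;; => [[\a \d \g] [\b \e \h] [\c \f \i]]

;; Rotate 90 degrees counterclockwise: transpose, then reverse the row order.
(vec (reverse (apply mapv vector matrix)))
;; => [[\c \f \i] [\b \e \h] [\a \d \g]]

;; Rotate 90 degrees clockwise: reverse the row order, then transpose.
(apply mapv vector (reverse matrix))
;; => [[\g \d \a] [\h \e \b] [\i \f \c]]
```

The output requested in the question matches the transpose, not either rotation.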

Taking the matrix transposition solution directly from rosettacode:
(vec (apply map vector matrix))
To see what is going on consider:
(map vector [\a \b \c] [\d \e \f] [\g \h \i])
This works with arbitrary matrix dimensions, although it is not suited to significant number crunching. For that, consider using a Java-based matrix manipulation library from Clojure.

You can use core.matrix to do this kind of matrix manipulation very easily. In particular, it already has a transpose function that does exactly what you want.
Example:
(use 'clojure.core.matrix)
(def matrix [[\a \b \c]
[\d \e \f]
[\g \h \i]])
(transpose matrix)
=> [[\a \d \g]
[\b \e \h]
[\c \f \i]]

Here's one way:
(def transposed-matrix (apply map list matrix))
;=> ((\a \d \g) (\b \e \h) (\c \f \i))
(doseq [row transposed-matrix]
(doall (map println row)))
That produces the same output as your original (printing the columns of matrix).

clojure equivalent for ruby's gsub

How do I do this in Clojure?
"text".gsub(/(\d)([ap]m|oclock)\b/, '\1 \2')

To add to Isaac's answer, this is how you would use clojure.string/replace in this particular occasion:
user> (str/replace "9oclock"
#"(\d)([ap]m|oclock)\b"
(fn [[_ a b]] (str a " " b)))
; ^- note the destructuring of the match result
;^- using an fn to produce the replacement
"9 oclock"
To add to sepp2k's answer, this is how you can take advantage of Clojure's regex literals while using the "$1 $2" gimmick (arguably simpler than a separate fn in this case):
user> (.replaceAll (re-matcher #"(\d)([ap]m|oclock)\b" "9oclock")
; ^- note the regex literal
"$1 $2")
"9 oclock"

You can use Java's replaceAll method. The call would look like:
(.replaceAll "text" "(\\d)([ap]m|oclock)\\b" "$1 $2")
Note that this returns a new string (as gsub without the bang does in Ruby). There is no equivalent of gsub! in Clojure because Java/Clojure strings are immutable.
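The same "$1 $2" replacement string also works with clojure.string/replace when the match is a regex pattern, which avoids the interop call. A short sketch follows; the second input string is made up for illustration.

```clojure
(require '[clojure.string :as str])

;; $1 and $2 refer to the first and second capture groups.
(str/replace "9oclock" #"(\d)([ap]m|oclock)\b" "$1 $2")
;; => "9 oclock"

;; Like gsub, replace rewrites every match, not just the first one.
(str/replace "meet at 9am or 10pm" #"(\d)([ap]m|oclock)\b" "$1 $2")
;; => "meet at 9 am or 10 pm"
```

To replace only the first match, clojure.string/replace-first takes the same arguments.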

That would be replace in the clojure.string namespace.
Use it like so:
(ns rep
(:require [clojure.string :as str])) ; an alias avoids shadowing clojure.core/replace
(str/replace "this is a testing string testing testing one two three" ;; string
"testing" ;; match
"Mort") ;; replacement
replace accepts three kinds of match/replacement pairs: string/string, char/char, or a regex pattern with either a replacement string or a function of the match.

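Clojure contrib has re-gsub as part of str-utils (the monolithic clojure-contrib library has since been deprecated, and clojure.string/replace covers the same use):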
user=> (def examplestr (str "jack and jill" \newline "went up the hill"))
#'user/examplestr
user=> (println examplestr)
jack and jill
went up the hill
nil
user=> (println (re-gsub #"\n" " " examplestr))
jack and jill went up the hill
nil
