How to get all trigrams of a string in clojure

Suppose I have a string "This is a string". The trigrams would be "Thi", "his", "is ", "s i", etc. I want to return a vector of all the trigrams. How can I do that?

You can use partition or partition-all, depending on whether you are also
interested in the trailing fragments that are shorter than three characters:
user=> (doc partition)
-------------------------
clojure.core/partition
([n coll] [n step coll] [n step pad coll])
Returns a lazy sequence of lists of n items each, at offsets step
apart. If step is not supplied, defaults to n, i.e. the partitions
do not overlap. If a pad collection is supplied, use its elements as
necessary to complete last partition upto n items. In case there are
not enough padding elements, return a partition with less than n items.
user=> (doc partition-all)
-------------------------
clojure.core/partition-all
([n] [n coll] [n step coll])
Returns a lazy sequence of lists like partition, but may include
partitions with fewer than n items at the end. Returns a stateful
transducer when no collection is provided.
E.g.
user=> (partition 3 1 "This is a string")
((\T \h \i)
(\h \i \s)
(\i \s \space)
(\s \space \i)
(\space \i \s)
(\i \s \space)
(\s \space \a)
(\space \a \space)
(\a \space \s)
(\space \s \t)
(\s \t \r)
(\t \r \i)
(\r \i \n)
(\i \n \g))
To get the strings back, join the chars:
user=> (map clojure.string/join (partition 3 1 "This is a string"))
("Thi"
"his"
"is "
"s i"
" is"
"is "
"s a"
" a "
"a s"
" st"
"str"
"tri"
"rin"
"ing")
Or use partition-all instead if you also want the trailing fragments:
user=> (map clojure.string/join (partition-all 3 1 "This is a string"))
("Thi"
; ...
"rin"
"ing"
"ng" ; shorter than 3 chars
"g") ; shorter than 3 chars

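The question asks for a vector, while map returns a lazy sequence. A minimal sketch, assuming a vector of strings is what is wanted: mapv builds the vector eagerly. The helper name n-grams is hypothetical, not part of clojure.core.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: all overlapping n-character substrings of s,
;; returned as a vector. Uses partition (not partition-all), so
;; fragments shorter than n are dropped.
(defn n-grams [n s]
  (mapv string/join (partition n 1 s)))

(n-grams 3 "This is a string")
;; => ["Thi" "his" "is " "s i" " is" "is " "s a" " a " "a s" " st" "str" "tri" "rin" "ing"]
```

A string shorter than n yields an empty vector, since partition produces no complete windows.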
Related

Comparing two strings and returning the number of matched words

I'm fairly new to Clojure, and to programming in general.
Is there a way I can compare two strings word by word and then return the number of matched words in both strings? Also, how can I count the words in a string?
For example:
comparing string1 "Hello Alan and Max" with string2 "Hello Alan and Bob" should return 3 (since "Hello", "Alan" and "and" are the words that match in both strings),
and finding the number of words in string1 should return 4.
Thank you

Let's break it down into some smaller problems:
compare two strings word by word
First we'll need a way to take a string and return its words. One way to do this is to assume any whitespace is separating words, so we can use a regular expression with clojure.string/split:
(defn string->words [s]
  (clojure.string/split s #"\s+"))
(string->words "Hello world, it's me, Clojure.")
=> ["Hello" "world," "it's" "me," "Clojure."]
return the number of matched words in both strings
The easiest way I can imagine doing this is to build two sets, one holding the words of each sentence, and then find the intersection of the two sets:
(set (string->words "a b c a b c d e f"))
=> #{"d" "f" "e" "a" "b" "c"} ;; #{} represents a set
And we can use the clojure.set/intersection function to find the intersection of two sets (clojure.set has to be required first):
(require 'clojure.set)
(defn common-words [a b]
  (let [a (set (string->words a))
        b (set (string->words b))]
    (clojure.set/intersection a b)))
(common-words "say you" "say me")
=> #{"say"}
To get the number of matching words, we can call count on the output of common-words:
(count (common-words "say you" "say me")) ;; => 1
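Putting the pieces together for the question's own example: a sketch that restates string->words and common-words from this answer so it runs on its own, and applies count to string->words for the second part of the question (the number of words). Note that the set-based approach counts distinct shared words, so a word repeated in both strings is counted once.

```clojure
(require '[clojure.string :as string]
         'clojure.set)

;; Split on runs of whitespace, as in the answer above.
(defn string->words [s]
  (string/split s #"\s+"))

;; Distinct words that appear in both strings.
(defn common-words [a b]
  (clojure.set/intersection (set (string->words a))
                            (set (string->words b))))

(count (common-words "Hello Alan and Max" "Hello Alan and Bob")) ;; => 3
(count (string->words "Hello Alan and Max"))                     ;; => 4
```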

What you need to do is compare the items of the two word sequences pairwise and count the number of items before the first mismatch. Here is an almost word-for-word translation of this:
(defn mismatch-idx [s1 s2]
  (let [w #"\S+"]
    (->> (map = (re-seq w s1) (re-seq w s2))
         (take-while true?)
         count)))
user> (mismatch-idx "a b c" "qq b c")
;;=> 0
user> (mismatch-idx "a b c" "a x c")
;;=> 1
user> (mismatch-idx "a b c" "a b x")
;;=> 2
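The two answers count different things. A sketch contrasting them on the question's strings: mismatch-idx (restated from this answer) counts the leading positions where the words agree, while a set intersection like the one in the previous answer counts shared words regardless of position.

```clojure
(require 'clojure.set)

;; mismatch-idx as defined in the answer above
(defn mismatch-idx [s1 s2]
  (let [w #"\S+"]
    (->> (map = (re-seq w s1) (re-seq w s2))
         (take-while true?)
         count)))

;; The question's example: the first three words line up.
(mismatch-idx "Hello Alan and Max" "Hello Alan and Bob") ;; => 3

;; Swapping the first two words changes the positional count but not
;; the set-based count of shared words.
(mismatch-idx "Alan Hello and Max" "Hello Alan and Bob") ;; => 0
(count (clojure.set/intersection (set (re-seq #"\S+" "Alan Hello and Max"))
                                 (set (re-seq #"\S+" "Hello Alan and Bob")))) ;; => 3
```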

take-while on clojure.string does not work

I'm wondering why Clojure does not treat a string as an array, like Scala or Haskell do.
I want a take-while function on strings, as in Scala:
scala> "chich and chong".takeWhile(_ != ' ')
res1: String = chich
But take-while in Clojure does not seem to work with strings:
user=> (take-while #(not= % " ") "chich and chong")
(\c \h \i \c \h \space \a \n \d \space \c \h \o \n \g)
Just to make sure string equality works in Clojure:
user=> (= " " " ")
true
user=> (not= 'A " ")
true
take-while seems to work only with vectors:
user=> (take-while #(< % 0) [-3 -2 -1 0 1 2 3])
(-3 -2 -1)
I tried converting the string to a vector as well, but it returns the same as the input:
user=> (vec "apple")
[\a \p \p \l \e]
user=> (take-while #(not= % "p") (vec "apple"))
(\a \p \p \l \e)
How can I use take-while with a string?

You should use a character literal instead of a string containing a space:
user=> (take-while #(not= % \space) "chich and chong")
=> (\c \h \i \c \h)
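That is because " " is a java.lang.String, while \space is a java.lang.Character, and a character is never equal to a string.
For more information, see the Clojure reader documentation for character literals (\).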
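take-while on a string yields a sequence of characters rather than a string. A minimal sketch of getting a Scala-like string result back with apply str, plus the question's apple example rewritten with a character literal:

```clojure
;; Keep characters up to (not including) the first space,
;; then concatenate them back into a string.
(apply str (take-while #(not= % \space) "chich and chong"))
;; => "chich"

;; The vector attempt from the question also works once \p is a char.
(take-while #(not= % \p) (vec "apple"))
;; => (\a)
```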

Just to point out: your original code would work in ClojureScript, because the host platform (JavaScript) has no character type, so characters are represented as one-character strings. On the JVM, though, characters are their own type.

Write a function to print (non-negative) integer numbers in full words in Clojure

(defn num-as-words [n]
  (let [words '("zero" "one" "two" "three" "four"
                "five" "six" "seven" "eight" "nine")]
    (clojure.string/join "-"
      (map (fn [x] (nth words (Integer. (re-find #"\d" (str x)))))
           (str n)))))
I've written this function, num-as-words, which takes an integer and returns it as full words; for example, (num-as-words 123) returns "one-two-three".
I've done it using map, but I was wondering if there was another way of doing it. I was also wondering if there was another way to connect the words rather than clojure.string/join. I was initially using interpose but didn't like its output, which looked like ("one" "-" "two" "-" "three").
Any help would be greatly appreciated, thank you.

user=> (clojure.pprint/cl-format ; formatted printing
nil ; ... to a string
"~{~R~^-~}" ; format (see below)
(map ; map over characters
(fn [x] (Integer. (str x))) ; convert char to integer
(str 123))) ; convert number to string
"one-two-three"
First, we take the input number (hard-coded as 123 in this example), convert it to a string, and iterate over the resulting string's characters with map. For each character, we build a string containing that character and parse it as an Integer. Thus, we obtain a list of digits.
More precisely, (fn [x] ...) is a function taking one argument. A better name for the argument would be char, because we iterate over characters. When we evaluate (str x), we obtain a string containing one character, namely x. For example, if the character is \2, the resulting string is "2". The (Integer. string) form (notice the dot!) calls the constructor of the Integer class, which parses a string as an integer. To continue with our example, (Integer. "2") yields the integer 2.
We use cl-format to print the list of digits into a fresh string (as requested by the nil destination argument). To do that, we specify the format as follows:
~{...~} iterates over a list and applies the format inside the braces to each element.
~R prints a number as English words (1 => one, etc.).
~^ exits the iteration made by ~{...~} when there are no remaining arguments, so when we print the last digit, the part that follows ~^ is not printed.
What follows ~^ is simply the character -, which separates the words. Thanks to ~^, no dash is printed after the last digit; otherwise the resulting string would end with a dash.
If any character cannot be parsed as an Integer (for example, the minus sign of a negative number), the function reports an error. You might want to check first that the input really is a non-negative integer before converting it to a string.
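The same format string can be wrapped in a function. A sketch, with the name num-as-words-cl chosen here (hypothetical) to avoid clashing with the question's function, and a precondition that rejects negative input as suggested above. It uses Integer/parseInt instead of the Integer constructor, which has been deprecated since Java 9.

```clojure
(require '[clojure.pprint :refer [cl-format]])

;; Hypothetical name: num-as-words-cl.
(defn num-as-words-cl [n]
  {:pre [(integer? n) (not (neg? n))]}        ; reject non-integers and negatives
  (cl-format nil "~{~R~^-~}"                   ; nil destination => return a string
             (map #(Integer/parseInt (str %))  ; digit char -> int
                  (str n))))

(num-as-words-cl 123) ;; => "one-two-three"
```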

I'd implement it like this:
(defn num-as-words [n]
  (let [words ["zero" "one" "two" "three" "four" "five" "six" "seven" "eight" "nine"]]
    (->> (str n)
         (map #(Character/getNumericValue %))
         (map words)
         (clojure.string/join "-"))))
Using a vector simplifies the implementation: a vector can be called as a function of its index, so (map words) looks up the word for each digit.
Instead of extracting each digit with a regular expression, you can treat the number string as a sequence of characters. In that case, use Character/getNumericValue to convert each char to an integer.
You can use the ->> threading macro.
Using clojure.string/join looks fine.
interpose returns a lazy sequence, which is why the result looks like ("one" "-" "two" ...). Apply str to the result, (apply str (interpose ...)), to convert it to a string.
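To see the difference described in the last point, compare interpose on its own with (apply str ...) around it; the word vector here is just sample data:

```clojure
;; interpose alone: a lazy seq of words and separators
(interpose "-" ["one" "two" "three"])
;; => ("one" "-" "two" "-" "three")

;; apply str concatenates the pieces into a single string
(apply str (interpose "-" ["one" "two" "three"]))
;; => "one-two-three"
```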
If you want to handle negative numbers, you can modify the code like this:
(defn num-as-words [n]
  (if (< n 0)
    (str "-" (num-as-words (- n)))
    (let [words ["zero" "one" "two" "three" "four" "five" "six" "seven" "eight" "nine"]]
      (->> (str n)
           (map #(Character/getNumericValue %))
           (map words)
           (clojure.string/join "-")))))
This prepends a minus sign to the result. If you would rather throw an error, you can use a precondition:
(defn num-as-words [n]
  {:pre [(<= 0 n)]}
  (let [words ["zero" "one" "two" "three" "four" "five" "six" "seven" "eight" "nine"]]
    ...
This throws an AssertionError when it receives a negative number.

Creating a string of blank characters

How would one write code that inserts a given number of blank characters between two input strings?
Concretely, I am asking about a function which, given
(newfn "AAA" "ZZZ" 10)
returns "AAA" and "ZZZ" with 10 blanks between them.

(defn wrap-spaces [h t n]
  (let [blanks (apply str (repeat n " "))]
    (str h blanks t)))
(wrap-spaces "AAA" "ZZZ" 10)
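A sketch of an equivalent variant, under the hypothetical name wrap-spaces*: it repeats the \space character instead of the one-character string " ", and concatenates the three parts with clojure.string/join. For 10 blanks the result is 3 + 10 + 3 = 16 characters long.

```clojure
(require '[clojure.string :as string])

;; Hypothetical variant of wrap-spaces: h, then n spaces, then t.
(defn wrap-spaces* [h t n]
  (string/join [h (apply str (repeat n \space)) t]))

(count (wrap-spaces* "AAA" "ZZZ" 10)) ;; => 16
(wrap-spaces* "A" "Z" 2)              ;; => "A  Z"
```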

How to convert a clojure string of numbers into separate integers?

I can read some data in like this in the REPL. For a real program I plan to bind it in a let form.
(def x1 (line-seq (BufferedReader. (StringReader. x1))))
If I enter 5 5, x1 is bound to ("5 5")
I would like to convert this one-element list into a list of two integers. How can I do that? I have been playing around with splitting the string on whitespace, but am having trouble converting the pieces to integers.

Does this help? In Clojure 1.3.0:
(use ['clojure.string :only '(split)])
(defn str-to-ints
  [string]
  (map #(Integer/parseInt %)
       (split string #" ")))
(str-to-ints "5 4")
; => (5 4)
(apply str-to-ints '("5 4"))
; => (5 4)
If the Clojure version you're using doesn't have the clojure.string namespace, you can skip the use call and define the function as follows:
(defn str-to-ints
  [string]
  (map #(Integer/parseInt %)
       (.split #" " string)))
You can also write (.split string " ") in the last line, which calls String.split directly. Note that String.split still interprets its argument as a regular expression.
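A sketch under the assumption that the input may contain runs of spaces or leading/trailing whitespace: it trims the line, splits on #"\s+", and parses each token with Long/parseLong. The helper name line->longs is hypothetical. Applied with mapcat, it turns the question's line-seq result directly into one flat sequence of numbers. An empty line would still fail, since splitting it produces a single empty token.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: parse every whitespace-separated integer in a line.
(defn line->longs [line]
  (mapv #(Long/parseLong %)
        (string/split (string/trim line) #"\s+")))

(line->longs "  5   4 ")            ;; => [5 4]
(mapcat line->longs '("5 5" "1 2")) ;; => (5 5 1 2)
```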

This works for integers and plain decimals (optionally negative) and returns nil if the string isn't such a number, so you can filter out the nils in the resulting seq:
(require '[clojure.string :as string])
(defn parse-number
  "Reads a number from a string. Returns nil if not a number."
  [s]
  (if (re-find #"^-?\d+\.?\d*$" s)
    (read-string s)))
E.g.
(map parse-number (string/split "1 2 3 78 90 -12 0.078" #"\s+"))
; => (1 2 3 78 90 -12 0.078)
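Since parse-number returns nil for tokens that are not numbers, keep (roughly (remove nil? (map f coll))) filters them out in one step. A sketch that restates parse-number with when instead of a one-armed if:

```clojure
(require '[clojure.string :as string])

(defn parse-number
  "Reads a number from a string. Returns nil if not a number."
  [s]
  (when (re-find #"^-?\d+\.?\d*$" s)
    (read-string s)))

;; keep calls parse-number on each token and drops the nil results.
(keep parse-number (string/split "1 two 3 -4.5 x" #"\s+"))
;; => (1 3 -4.5)
```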

You can also wrap the string in parentheses and read it as a Clojure list with the read-string function:
(def f #(read-string (str "(" % ")")))
(f "5 4")
; => (5 4)
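read-string uses the full Clojure reader, which can evaluate code at read time via #= while *read-eval* is true (the default), so it is risky on untrusted input. A sketch of the same trick with clojure.edn/read-string, wrapping the string in square brackets so the result is a vector. The name read-numbers is hypothetical, and the sketch does not check that every token is a number (a word would be read as a symbol).

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: read whitespace-separated values as an EDN vector.
(defn read-numbers [s]
  (edn/read-string (str "[" s "]")))

(read-numbers "5 4") ;; => [5 4]
```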
