Finding a substring within a list of strings in clojure

I'd like to find if there is a substring in any of the strings in my list of strings. I have a list like '("hi" "hey" "hello"). Using "some" I can find if the value "hi" is in this list. But how could I find if just "h" was in at least one of the strings in the list?

Clojure solutions:
includes? from clojure.string
(some #(clojure.string/includes? % "h") (list "hi" "hello" "hey"))
contains via Java Interop
(some #(.contains % "h") (list "hi" "hello" "hey"))
ClojureScript solutions:
includes? from clojure.string (require clojure.string in the ns form):
(ns my-app.core
  (:require [clojure.string :as string]))
(some #(string/includes? % "h") (list "hi" "hello" "hey"))
includes via JavaScript Interop
(some #(.includes % "h") (list "hi" "hello" "hey"))
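The some calls above answer yes or no: they return true or nil. If you need the matching string(s) themselves, a small variation works. This is a sketch for JVM Clojure 1.8+ (where clojure.string/includes? exists); the helper names first-containing and all-containing are hypothetical.

```clojure
(require '[clojure.string :as str])

(def words (list "hi" "hello" "hey"))

;; some returns the first logical-true value, so wrapping the test in
;; `when` makes it return the matching string instead of true.
(defn first-containing [substr coll]
  (some #(when (str/includes? % substr) %) coll))

;; filter lazily returns every string that contains the substring.
(defn all-containing [substr coll]
  (filter #(str/includes? % substr) coll))

(first-containing "h" words)  ;; => "hi"
(all-containing "ll" words)   ;; => ("hello")
(first-containing "z" words)  ;; => nil
```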

Related

What is the difference between interpose and clojure.string/join?

Is it fair to say that clojure.string/join is a specialized interpose function for strings? Are there any differences between these functions?

Conceptually, they're basically the same. They each take a collection* and a separator, and produce a result where the separator sits between each pair of adjacent elements of the original collection. The major differences between them are:
clojure.string/join calls toString on the separator and on each element of the collection (via str, so nil becomes an empty string), and uses a StringBuilder to construct the String.
interpose doesn't affect the separator or the collection elements, and returns a lazy list* instead of a fully realized String. It's defined in terms of interleave:
(drop 1 (interleave (repeat sep) coll))
Similar concept, but very different implementations.
*I'm ignoring the no-coll version of interpose that returns a transducer.
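The differences listed above can be checked at the REPL. A minimal sketch (the var names are illustrative only): for a collection of strings, applying str to the interposed seq gives the same String that join builds, but interpose leaves non-string elements untouched, while join stringifies them.

```clojure
(require '[clojure.string :as str])

(def abc ["a" "b" "c"])

;; For strings, both routes end up with the same String.
(def via-join      (str/join "-" abc))
(def via-interpose (apply str (interpose "-" abc)))

;; interpose does not touch its elements: numbers stay numbers.
(def raw-seq (interpose 0 [1 2 3]))     ;; => (1 0 2 0 3)

;; join calls str on everything, so nil becomes "" and :k becomes ":k".
(def joined (str/join "," [1 nil :k]))  ;; => "1,,:k"
```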

Interpose
Interpose takes a separator and a sequence, and returns a lazy sequence with the separator inserted between the sequence elements.
user=> (interpose "-" (list "a" "b" "c"))
("a" "-" "b" "-" "c")
user=> (type (interpose "-" (list "a" "b" "c")))
clojure.lang.LazySeq
Join
Join takes a separator and a sequence, and returns a String made by joining the elements with the separator.
user=> (clojure.string/join "-" (list "a" "b" "c"))
"a-b-c"
user=> (type (clojure.string/join "-" (list "a" "b" "c")))
java.lang.String

Your intuition is right. Note that interpose can also return a transducer and is lazy.
Just look at the docstrings or source code:
(doc interpose)
(source interpose)
etc.
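To see the transducer form mentioned above in action, here is a small sketch (it needs Clojure 1.7 or later, where interpose gained a no-collection arity; the var names are illustrative):

```clojure
(require '[clojure.string :as str])

;; Called without a collection, interpose returns a transducer.
(def dash-xf (interpose "-"))

;; into realizes the result eagerly, here into a vector.
(def as-vector (into [] dash-xf ["a" "b" "c"]))
;; => ["a" "-" "b" "-" "c"]

;; Transducers compose with comp: upper-case each element, then interpose.
(def composed (into [] (comp (map str/upper-case) dash-xf) ["a" "b"]))
;; => ["A" "-" "B"]
```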

replace multiple bad characters in clojure

I am trying to replace bad characters in an input string.
Characters should be valid UTF-8 characters (tabs, line breaks, etc. are OK).
However, I was unable to figure out how to replace all of the bad characters found.
My solution works only for the first bad character.
Usually there are no bad characters; in about 1 in 50 cases there is one. I'd just like to make my solution foolproof.
(defn filter-to-utf-8-string
  "Return only good utf-8 characters from the input."
  [input]
  (let [bad-characters (set (re-seq #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" input))
        filtered-string (clojure.string/replace input (apply str (first bad-characters)) "")]
    filtered-string))
How can I make replace work for all values in sequence not just for the first one?
A friend of mine helped me find a workaround for this problem:
I created a filter for replace using re-pattern.
The let bindings are currently:
filter (if (not (empty? bad-characters))
         (re-pattern (str "[" (clojure.string/join bad-characters) "]"))
         #"")
filtered-string (clojure.string/replace input filter "")
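A simpler option, sketched here under the assumption that the character classes from the question define "good" characters: when clojure.string/replace is given a regex as its match argument, it replaces every match, not just the first, so the same pattern can be passed straight to it and no set of found characters is needed. This also avoids building a character class from the found characters, which can misbehave when one of them (for example ^) is special inside [...].

```clojure
(require '[clojure.string :as str])

;; The question's own "bad character" pattern (assumed to be the desired rule).
(def bad-chars #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+")

(defn filter-to-utf-8-string
  "Return the input with every run of disallowed characters removed."
  [input]
  ;; With a Pattern as the match, str/replace removes ALL matches.
  (str/replace input bad-chars ""))

(filter-to-utf-8-string "a<b>c")  ;; => "abc" (< and > are math symbols, not punctuation)
```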

Here is a simple version:
(ns xxxxx
  (:require [clojure.string :as str]))
(def all-chars (str/join (map char (range 32 80))))
(println all-chars)
(def char-L (str/join (re-seq #"[\p{L}]" all-chars)))
(println char-L)
(def char-N (str/join (re-seq #"[\p{N}]" all-chars)))
(println char-N)
(def char-LN (str/join (re-seq #"[\p{L}\p{N}]" all-chars)))
(println char-LN)
all-chars => " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO"
char-L => "ABCDEFGHIJKLMNO"
char-N => "0123456789"
char-LN => "0123456789ABCDEFGHIJKLMNO"
So we start off with all ASCII characters with codes 32 through 79 ((range 32 80) excludes 80). We first print only the letters, then only the digits, then characters that are either letters or digits. This should work for your problem, except that instead of rejecting characters outside the desired set, we keep the characters inside it.
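Applied to the original question, the "keep the members" idea might look like this. It is a sketch that reuses the question's character classes as the allowed set (an assumption about what counts as valid); keep-allowed is a hypothetical name.

```clojure
(require '[clojure.string :as str])

;; Keep only characters in the allowed classes, then join them back.
;; re-seq returns nil when nothing matches, and (str/join nil) is "".
(defn keep-allowed [input]
  (str/join (re-seq #"[\p{L}\p{N}\s\p{P}\p{Sc}+]" input)))

(keep-allowed "a<b>c")  ;; => "abc"
```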

Unsure of clojure type

Can anybody explain what type is used in the code below, which I saw in the Clojure docs for string/replace?
(clojure.string/replace "The color is red" #"red" "blue")
I am talking specifically about the #"red" "blue" part.
Also, if I have an array-map like this:
{"red" "blue"}
How could I transform this array-map into this unknown type?
{"red" "blue"} ;=> #"red" "blue"???

If you have a map {"red" "blue"} and you'd like to use that to drive the replacement, you could do:
;; Generic form of your question - uses re-pattern to create a regex
(defn replace-with [s find replacement]
  (clojure.string/replace s (re-pattern find) replacement))
;; Walk through every [find replace] pair in replacements map
;; and repeatedly apply it to string
(defn replace-with-all [s replacements]
  (reduce (fn [s [f r]] (replace-with s f r))
          s
          replacements))
(replace-with-all "foo bar baz" {"foo" "blue" "baz" "red"})
;; "blue bar red"
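If the map keys are meant as plain text rather than regexes, re-pattern isn't needed: clojure.string/replace also accepts a String match and replaces every literal occurrence. That matters for keys containing regex metacharacters; with re-pattern, a key of "." matches any character, so (replace-with "1.5" "." ",") gives ",,,". A sketch (replace-all-literal is a hypothetical name):

```clojure
(require '[clojure.string :as str])

;; String match + String replacement: every literal occurrence is replaced,
;; with no regex interpretation of either argument.
(defn replace-all-literal [s replacements]
  (reduce (fn [acc [target replacement]]
            (str/replace acc target replacement))
          s
          replacements))

(replace-all-literal "foo bar baz" {"foo" "blue" "baz" "red"})  ;; => "blue bar red"
(replace-all-literal "1.5 + 2.5" {"." ","})                     ;; => "1,5 + 2,5"
```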

In Clojure, #"..." is a regular expression literal (it creates a java.util.regex.Pattern). So you are replacing red with blue.
(replace s match replacement)
Replaces all instance of match with replacement in s.
match/replacement can be:
string / string
char / char
pattern / (string or function of match).
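To make that concrete: #"red" and "blue" are two separate arguments (the match and the replacement), so a map is not converted into some new type. Instead, each map entry can supply the pair. A small sketch; replace-from-entry is a hypothetical helper:

```clojure
(require '[clojure.string :as str])

;; A regex literal compiles to a java.util.regex.Pattern.
(def pattern-type (type #"red"))  ;; => java.util.regex.Pattern

;; Destructure one [match replacement] map entry and turn the key into a regex.
(defn replace-from-entry [s [match replacement]]
  (str/replace s (re-pattern match) replacement))

(replace-from-entry "The color is red" (first {"red" "blue"}))
;; => "The color is blue"
```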
But I didn't understand what you mean by 'transform this array-map into this unknown type'.

How can I get the positions of regex matches in ClojureScript?

In Clojure I could use something like this solution: Compact Clojure code for regular expression matches and their position in string, i.e., create a re-matcher and extract the information from it, but re-matcher doesn't appear to be implemented in ClojureScript. What would be a good way to accomplish the same thing in ClojureScript?
Edit:
I ended up writing a supplementary function in order to preserve the modifiers of the regex as it is absorbed into re-pos:
(defn regex-modifiers
  "Returns the modifiers of a regex, concatenated as a string."
  [re]
  (str (if (.-multiline re) "m")
       (if (.-ignoreCase re) "i")))
(defn re-pos
  "Returns a vector of vectors, each subvector containing in order:
  the position of the match, the matched string, and any groups
  extracted from the match."
  [re s]
  (let [re (js/RegExp. (.-source re) (str "g" (regex-modifiers re)))]
    (loop [res []]
      (if-let [m (.exec re s)]
        (recur (conj res (vec (cons (.-index m) m))))
        res))))

You can use the .exec method of JS RegExp object. The returned match object contains an index property that corresponds to the index of the match in the string.
Currently clojurescript doesn't support constructing regex literals with the g mode flag (see CLJS-150), so you need to use the RegExp constructor. Here is a clojurescript implementation of the re-pos function from the linked page:
(defn re-pos [re s]
  (let [re (js/RegExp. (.-source re) "g")]
    (loop [res {}]
      (if-let [m (.exec re s)]
        (recur (assoc res (.-index m) (first m)))
        res))))
cljs.user> (re-pos #"\w+" "The quick brown fox")
{0 "The", 4 "quick", 10 "brown", 16 "fox"}
cljs.user> (re-pos #"[0-9]+" "3a1b2c1d")
{0 "3", 2 "1", 4 "2", 6 "1"}
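For comparison, the JVM Clojure approach the question refers to (driving a re-matcher) can be sketched as follows. This is one way to write it, not necessarily the code on the linked page. One difference worth knowing: Java's Matcher.find steps past empty matches, so a pattern that can match the empty string still terminates, whereas a plain .exec loop with the g flag in JavaScript can keep matching the same empty string forever.

```clojure
(defn re-pos
  "Map of match start index -> matched text, for every match of re in s."
  [re s]
  (let [^java.util.regex.Matcher m (re-matcher re s)]
    (loop [res {}]
      ;; .find advances to the next match; .start and .group describe it.
      (if (.find m)
        (recur (assoc res (.start m) (.group m)))
        res))))

(re-pos #"\w+" "The quick brown fox")
;; => {0 "The", 4 "quick", 10 "brown", 16 "fox"}
```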

Iterating through a map with doseq

I'm new to Clojure and I'm doing some basic stuff from labrepl, now I want to write a function that will replace certain letters with other letters, for example: elosska → elößkä.
I wrote this:
(ns student.dialect (:require [clojure.string :as str]))
(defn germanize
  [sentence]
  (def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
  (doseq [[original-letter new-letter] german-letters]
    (str/replace sentence original-letter new-letter)))
but it doesn't work as I expect. Could you help me, please?

Here is my take:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
(defn germanize [s]
  (reduce (fn [sentence [match replacement]]
            (str/replace sentence match replacement))
          s
          german-letters))
(germanize "elosska")

There are 2 problems here:
doseq is meant for side effects and always returns nil, so the values produced by its body are thrown away and you won't get any result
str/replace doesn't modify the string in place (strings are immutable), so each call works on the original sentence, producing 4 different results. You can check this by replacing doseq with for: you'll get a list with 4 entries.
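Both problems can be seen directly at the REPL. In this sketch the var names are illustrative, and "elosska" is hard-coded in place of the function argument:

```clojure
(require '[clojure.string :as str])

(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})

;; doseq evaluates its body for side effects only and returns nil.
(def doseq-result
  (doseq [[original new-letter] german-letters]
    (str/replace "elosska" original new-letter)))
;; => nil

;; for keeps one result per map entry, each computed from the ORIGINAL
;; sentence, so no single string has all replacements applied.
(def for-result
  (for [[original new-letter] german-letters]
    (str/replace "elosska" original new-letter)))
;; => ("elosskä" "elosska" "elösska" "eloßka")
```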
Your code could be rewritten the following way:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
(defn germanize [sentence]
  (loop [text sentence
         letters german-letters]
    (if (empty? letters)
      text
      (let [[original-letter new-letter] (first letters)]
        (recur (str/replace text original-letter new-letter)
               (rest letters))))))
In this case, each intermediate result is passed on to the next iteration, so all replacements are applied to the same string, producing the correct result:
user> (germanize "elosska")
"elößkä"
P.S. It's also not recommended to use def inside a function; it's better to use it only for top-level forms.

Alex has of course already correctly answered the question with respect to the original issue using doseq... but I found the question interesting and wanted to see what a more "functional" solution would look like. And by that I mean without using a loop.
I came up with this:
(ns student.dialect (:require [clojure.string :as str]))
(defn germanize [sentence]
  (let [letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"}
        regex (re-pattern (apply str (interpose \| (keys letters))))]
    (str/replace sentence regex letters)))
Which yields the same result:
student.dialect=> (germanize "elosska")
"elößkä"
The (re-pattern ...) line simply evaluates to #"a|u|o|ss" (a small map literal keeps its keys in insertion order), which would have been cleaner and simpler to read if entered as an explicit regex, but I thought it best to have only one definition of the German letters.
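One caveat with generating the alternation from the keys: regex alternation tries alternatives left to right, so if the map ever contained both "s" and "ss", a shorter key listed first could win. Keys containing regex metacharacters would also be misread. A hedged extension of this answer sorts keys longest-first and quotes each with java.util.regex.Pattern/quote; alternation-pattern is a hypothetical helper name.

```clojure
(require '[clojure.string :as str])

(def letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})

;; Longest keys first, so "ss" is tried before any single-letter key;
;; Pattern/quote makes each key match literally.
(defn alternation-pattern [m]
  (->> (keys m)
       (sort-by count >)
       (map #(java.util.regex.Pattern/quote %))
       (str/join "|")
       re-pattern))

(defn germanize [sentence]
  ;; A map is a function of its keys, so it works as the replacement fn.
  (str/replace sentence (alternation-pattern letters) letters))

(germanize "elosska")  ;; => "elößkä"
```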
