partition a lazy sequence - after - a predicate truth test changes - clojure

Consider sentences stored in a lazy sequence: Each word is one entry, punctuation however belongs to the words:
("It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!")
It should now be "partitioned" in sentences. I wrote a helper function last-punctuated?, which checks if the last character is a non-alphabet character. (no problem with this)
Desired result:
(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
Everything should stay lazy. Unfortunately I could not use partition-by: This function splits up before the result of a given predicate changes, meaning the punctuated entries are not interpreted as the last entry in the sub sequence.

i would propose using a lazy-seq. Can't think of anything better than this (maybe it's not really the best):
(defn parts [items pred]
(lazy-seq
(when (seq items)
(let [[l r] (split-with (complement pred) items)]
(cons (concat l (take 1 r))
(parts (rest r) pred))))))
in repl:
user> (let [items '("It's" "time" "when" "it's"
"time!" "What" "did" "you"
"say?" "Nothing!")]
(parts items (comp #{\? \! \. \,} last)))
(("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
user> (let [items '("what?" "It's" "time" "when" "it's"
"time!" "What" "did" "you"
"say?" "Nothing!")]
(parts items (comp #{\? \! \. \,} last)))
(("what?") ("It's" "time" "when" "it's" "time!") ("What" "did" "you" "say?") ("Nothing!"))
user> (let [items '("what?" "It's" "time" "when" "it's"
"time!" "What" "did" "you"
"say?" "Nothing!")]
(realized? (parts items (comp #{\? \! \. \,} last))))
false
update: probably the same approach with iterate would be better.
(defn parts [items pred]
(->> [nil items]
(iterate (fn [[_ items]]
(let [[l r] (split-with (complement pred) items)]
[(concat l (take 1 r)) (rest r)])))
rest
(map first)
(take-while seq)))

This problem can be actually expressed easily by generating a new sequence, containing "split tokens" and then doing partition-by basing on a different predicate.:
(def punctuation? #{\. \! \?})
(def words ["It's" "time" "when" "it's" "time!" "What" "did" "you" "say?" "Nothing!"])
(defn partition-sentences [ws]
(->> ws
(mapcat #(if (punctuation? (last %)) [% :br] [%]))
(partition-by #(= :br %))
(take-nth 2)))
(println (take 20 (partition-sentences (repeatedly #(rand-nth words))))

When the size of the input differs from the size of the output often the answer is to use reduce.
(defn last-word? [word]
(assert word)
(or (.endsWith word "!")
(.endsWith word "?")))
(defn make-sentence [in]
(reduce (fn [acc ele]
(let [up-to-current-sentence (vec (butlast acc))
last-word-last-sentence (-> acc last last)
new-sentence? (when last-word-last-sentence (last-word? last-word-last-sentence))
current-sentence (vec (last acc))]
(if new-sentence?
(conj acc [ele])
(conj up-to-current-sentence (conj current-sentence ele)))))
[] in))
Unfortunately reduce needs an end, so can't be made to work with lazy input. There's discussion here.

Related

finding index of vowels in a string clojure

Hi am learning clojure and trying to find the index of the vowels in a string here is what I tried
(def vowels [\a \e \i \o \u \y])
(let [word-index (interleave "aaded" (range))
indexs (for [ [x i] (vector word-index)
:when (some #{x} vowels)]
[i] )]
(seq indexs))
But this is giving me index "0" or nill what am doing wrong.
> (def vowels #{\a \e \i \o \u})
> (filter some? (map #(when (vowels %1) %2) "aaded" (range)))
(0 1 3)
You need to form the input correctly for the for comprehension:
(let [word-index (interleave "aaded" (range))
indexs (for [[x i] (partition 2 word-index)
:when (some #{x} vowels)]
i)]
(prn (seq indexs)))
;; => (0 1 3)
interleave will give a lazy sequence when we mapped that sequence to the vector of for loop, I think I missed the indexes. So changed the implementation as below.
(let [word-index (zipmap (range) "aaded")
indexs (for [ [i x] word-index
:when (some #{x} vowels)]
[i] )
]
(flatten indexs)
)
Which is working fine, if anyone has better implementation please share. It will be helpful for me thanks.
With every iteration of the for function, the same hash-set is formed repeatedly. So it's better to define it in the let block. Also, we can use the hash-set directly as a function and we don't need the some function for the same.
(let [word-index (zipmap (range) "aaded")
vowels-hash (into #{} [\a \e \i \o \u \y])
indexs (for [[i x] word-index
:when (vowels-hash x)]
[i])]
(flatten indexs))
a bit different approach with regex:
for all indices:
user> (let [m (re-matcher #"[aeiou]" "banedif")]
(take-while identity (repeatedly #(when (re-find m) (.start m)))))
;;=> (1 3 5)
for single index:
user> (let [m (re-matcher #"[aeiou]" "bfsendf")]
(when (re-find m) (.start m)))
;;=> 3
user> (let [m (re-matcher #"[aeiou]" "bndf")]
(when (re-find m) (.start m)))
;;=> nil
#jas has got this nailed down already. Adding my own to provide some comments on what happens in intermediary steps.
Use sets to check for membership. Then the question "is this a vowel?" will be fast.
(def vowels (set "aeiouy"))
vowels
;; => #{\a \e \i \o \u \y}
We can filter out the vowels, then get just the indexes
(defn vowel-indices-1 [word]
(->> (map vector (range) word) ; ([0 \h] [1 \e] [2 \l] ...)
(filter (fn [[_ character]] ; ([1 \e] [4 \o])
(contains? vowels character)))
(map first))) ; (1 4)
(vowel-indices-1 "hello!")
;; => (1 4)
... or we can go for a slightly more fancy with the :when keyword (didn't know about that, thanks!), in the style that you started!
(defn vowel-indices-2 [word]
(for [[i ch] (map vector (range) word)
:when (contains? vowels ch)]
i))
(vowel-indices-2 "hello!")
;; => (1 4)

Clojure for loop not returning updated values of atom

I'm trying to write a function that counts the number of vowels and consonants in a given string. The return value is a map with two keys, vowels and consonants. The values for each respective key are simply the counts.
The function that I have been able to develop so far is
(defn count-vowels-consenants [s]
(let [m (atom {"vowels" 0 "consenants" 0})
v #{"a" "e" "i" "o" "u"}]
(for [xs s]
(if
(contains? v (str xs))
(swap! m update-in ["vowels"] inc)
(swap! m update-in ["consenants"] inc)
))
#m))
however (count-vowels-consenants "sldkfjlskjwe") returns {"vowels":0 "consenants": 0}
What am I doing wrong?
EDIT: changed my input from str to s as str is a function in Clojure.
I think for is lazy so you're not going to actually do anything until you try to realize it. I added a first onto the for loop which realized the list and resulted in an error which you made by overwriting the str function with the str string. Ideally, you would just do this without the atom rigmarole.
(defn count-vowels-consonants [s]
(let [v #{\a \e \i \o \u}
vowels (filter v s)
consonants (remove v s)]
{:consonants (count consonants)
:vowels (count vowels)}))
if the atom is what you want, then use doseq instead of for and it will update the atom for everything in the string. also make sure you don't overwrite the str function by using it in your function binding.
if this side effecting scheme is inevitable (for sume educational reason, i suppose) just replace for with doseq which is a side effecting eager equivalent of for
(by the way: there is a mistake in your initial code: you use str as an input param name, and then try to use it as a function. So you are shadowing the def from the clojure.core, just try to avoid using params named like the core functions):
(defn count-vowels-consenants [input]
(let [m (atom {"vowels" 0 "consenants" 0})
v #{"a" "e" "i" "o" "u"}]
(doseq [s input]
(if (contains? v (str s))
(swap! m update-in ["vowels"] inc)
(swap! m update-in ["consenants"] inc)))
#m))
#'user/count-vowels-consenants
user> (count-vowels-consenants "asdfg")
;; {"vowels" 1, "consenants" 4}
otherwise you could do something like this:
user> (reduce #(update %1
(if (#{\a \e \i \o \u} %2)
"vowels" "consonants")
(fnil inc 0))
{} "qwertyui")
;;{"consonants" 5, "vowels" 3}
or
user> (frequencies (map #(if (#{\a \e \i \o \u} %)
"vowels" "consonants")
"qwertyui"))
;;{"consonants" 5, "vowels" 3}
or this (if you're good with having true/false instead of "vowels/consonants"):
user> (frequencies (map (comp some? #{\a \e \i \o \u}) "qwertyui"))
;;{false 5, true 3}
for is lazy as mentioned by #Brandon H. You can use loop recur if you want. Here I change for with loop-recur.
(defn count-vowels-consenants [input]
(let [m (atom {"vowels" 0 "consenants" 0})
v #{"a" "e" "i" "o" "u"}]
(loop [s input]
(when (> (count s) 0)
(if
(contains? v (first (str s) ))
(swap! m update-in ["vowels"] inc)
(swap! m update-in ["consenants"] inc)
))
(recur (apply str (rest s))))
#m))
The question, and every extant answer, assumes that every character is a vowel or a consonant: not so. And even in ASCII, there are lower and upper case letters. I'd do it as follows ...
(defn count-vowels-consonants [s]
(let [vowels #{\a \e \i \o \u
\A \E \I \O \U}
classify (fn [c]
(if (Character/isLetter c)
(if (vowels c) :vowel :consonant)))]
(map-v count (dissoc (group-by classify s) nil))))
... where map-v is a function that map's the values of a map:
(defn map-v [f m] (reduce (fn [a [k v]] (assoc a k (f v))) {} m))
For example,
(count-vowels-consonants "s2a Boo!")
;{:vowel 3, :consonant 2}
This traverses the string just once.

clojure find arbitrarily nested key

Is there an easy way in Clojure (maybe using specter) to filter collections depending on whether the an arbitrarily nested key with a known name contains an element ?
Ex. :
(def coll [{:res [{:a [{:thekey [
"the value I am looking for"
...
]
}
]}
{:res ...}
{:res ...}
]}])
Knowing that :a could have a different name, and that :thekey could be nested somewhere else.
Let's say I would like to do :
#(find-nested :thekey #{"the value I am looking for"} coll) ;; returns a vector containing the first element in coll (and maybe others)
use zippers.
in repl:
user> coll
[{:res [{:a [{:thekey ["the value I am looking for"]}]} {:res 1} {:res 1}]}]
user> (require '[clojure.zip :as z])
nil
user> (def cc (z/zipper coll? seq nil coll))
#'user/cc
user> (loop [x cc]
(if (= (z/node x) :thekey)
(z/node (z/next x))
(recur (z/next x))))
["the value I am looking for"]
update:
this version is flawed, since it doesn't care about :thekey being the key in a map, or just keyword in a vector, so it would give unneeded result for coll [[:thekey [1 2 3]]]. Here is an updated version:
(defn lookup-key [k coll]
(let [coll-zip (z/zipper coll? #(if (map? %) (vals %) %) nil coll)]
(loop [x coll-zip]
(when-not (z/end? x)
(if-let [v (-> x z/node k)] v (recur (z/next x)))))))
in repl:
user> (lookup-key :thekey coll)
["the value I am looking for"]
user> (lookup-key :absent coll)
nil
lets say we have the same keyword somewhere in a vector in a coll:
(def coll [{:res [:thekey
{:a [{:thekey ["the value I am looking for"]}]}
{:res 1} {:res 1}]}])
#'user/coll
user> (lookup-key :thekey coll)
["the value I am looking for"]
which is what we need.

clojure: Efficiently determining if a string begins with any prefix in a collection

I'm have a collection of prefix/value pairs, and wish to find any value in this connection associated with a prefix that my current target string begins with. (It is not important that behavior be defined in the case where more than one prefix matches, as the nature of my use case is such that this should never occur).
A naive (working) implementation follows:
(defn prefix-match [target-str pairs]
(some
(fn [[k v]]
(if (.startsWith target-str k)
v
false))
pairs))
Such that:
user=> (prefix-match "foobar" {"meh" :qux, "foo" :baz})
:baz
This works as intended, but is O(n) with the length of the pairs sequence. (Fast insertion into pairs is also desirable, but not as important as fast lookup).
The first thing that comes to mind is bisecting a sorted collection with efficient random access, but I'm not sure which data structures in Clojure are most appropriate to the task. Suggestions?
How about a trie?
(defn build-trie [seed & kvs]
(reduce
(fn [trie [k v]]
(assoc-in trie (concat k [:val]) v))
seed
(partition 2 kvs)))
(defn prefix-match [target trie]
(when (seq target)
(when-let [node (trie (first target))]
(or (:val node)
(recur (rest target) node)))))
Usage:
user> (def trie (build-trie {} "foo" :baz "meh" :qux))
#'user/trie
user> trie
{\m {\e {\h {:val :qux}}}, \f {\o {\o {:val :baz}}}}
user> (prefix-match "foobar" trie)
:baz
user> (prefix-match "foo" trie)
:baz
user> (prefix-match "f" trie)
nil
user> (prefix-match "abcd" trie)
nil
An efficient, terse approach is to take advantage of rsubseq, which works on any type implementing clojure.lang.Sorted -- which includes sorted-map.
(defn prefix-match [sorted-map target]
(let [[closest-match value] (first (rsubseq sorted-map <= target))]
(if closest-match
(if (.startsWith target closest-match)
value
nil)
nil)))
This passes the relevant tests in my suite:
(deftest prefix-match-success
(testing "prefix-match returns a successful match"
(is (prefix-match (sorted-map "foo" :one "bar" :two) "foobar") :one)
(is (prefix-match (sorted-map "foo" :one "bar" :two) "foo") :one)))
(deftest prefix-match-fail
(testing "prefix-match returns nil on no match"
(is (= nil (prefix-match (sorted-map "foo" :one, "bar" :two) "bazqux")))
(is (= nil (prefix-match (sorted-map "foo" :one, "bar" :two) "zzz")))
(is (= nil (prefix-match (sorted-map "foo" :one, "bar" :two) "aaa")))))
It seems simplest to just turn the list of prefixes into a regular expression, and feed those into a regex matcher, which is optimized for exactly this sort of task. Something like
(java.util.regex.Pattern/compile (str "^"
"(?:"
(clojure.string/join "|"
(map #(java.util.regex.Pattern/quote %)
prefixes))
")"))
Should get you a regex suitable for testing against a string (but I haven't tested it at all, so maybe I got some method names wrong or something).
The following solution finds the longest matching prefix and works surprisingly well when the map is huge and strings are relatively short. It tries to match e.g. "foobar", "fooba", "foob", "foo", "fo", "f" in order and returns the first match.
(defn prefix-match
[s m]
(->> (for [end (range (count s) 0 -1)] (.subSequence s 0 end)) ; "foo", "fo", "f"
(map m) ; match "foo", match "fo", ...
(remove nil?) ; ignore unmatched
(first))) ; Take first and longest match

How do I find the index of an item in a vector?

Any ideas what ???? should be? Is there a built in?
What would be the best way to accomplish this task?
(def v ["one" "two" "three" "two"])
(defn find-thing [ thing vectr ]
(????))
(find-thing "two" v) ; ? maybe 1, maybe '(1,3), actually probably a lazy-seq
Built-in:
user> (def v ["one" "two" "three" "two"])
#'user/v
user> (.indexOf v "two")
1
user> (.indexOf v "foo")
-1
If you want a lazy seq of the indices for all matches:
user> (map-indexed vector v)
([0 "one"] [1 "two"] [2 "three"] [3 "two"])
user> (filter #(= "two" (second %)) *1)
([1 "two"] [3 "two"])
user> (map first *1)
(1 3)
user> (map first
(filter #(= (second %) "two")
(map-indexed vector v)))
(1 3)
Stuart Halloway has given a really nice answer in this post http://www.mail-archive.com/clojure#googlegroups.com/msg34159.html.
(use '[clojure.contrib.seq :only (positions)])
(def v ["one" "two" "three" "two"])
(positions #{"two"} v) ; -> (1 3)
If you wish to grab the first value just use first on the result.
(first (positions #{"two"} v)) ; -> 1
EDIT: Because clojure.contrib.seq has vanished I updated my answer with an example of a simple implementation:
(defn positions
[pred coll]
(keep-indexed (fn [idx x]
(when (pred x)
idx))
coll))
(defn find-thing [needle haystack]
(keep-indexed #(when (= %2 needle) %1) haystack))
But I'd like to warn you against fiddling with indices: most often than not it's going to produce less idiomatic, awkward Clojure.
As of Clojure 1.4 clojure.contrib.seq (and thus the positions function) is not available as it's missing a maintainer:
http://dev.clojure.org/display/design/Where+Did+Clojure.Contrib+Go
The source for clojure.contrib.seq/positions and it's dependency clojure.contrib.seq/indexed is:
(defn indexed
"Returns a lazy sequence of [index, item] pairs, where items come
from 's' and indexes count up from zero.
(indexed '(a b c d)) => ([0 a] [1 b] [2 c] [3 d])"
[s]
(map vector (iterate inc 0) s))
(defn positions
"Returns a lazy sequence containing the positions at which pred
is true for items in coll."
[pred coll]
(for [[idx elt] (indexed coll) :when (pred elt)] idx))
(positions #{2} [1 2 3 4 1 2 3 4]) => (1 5)
Available here: http://clojuredocs.org/clojure_contrib/clojure.contrib.seq/positions
I was attempting to answer my own question, but Brian beat me to it with a better answer!
(defn indices-of [f coll]
(keep-indexed #(if (f %2) %1 nil) coll))
(defn first-index-of [f coll]
(first (indices-of f coll)))
(defn find-thing [value coll]
(first-index-of #(= % value) coll))
(find-thing "two" ["one" "two" "three" "two"]) ; 1
(find-thing "two" '("one" "two" "three")) ; 1
;; these answers are a bit silly
(find-thing "two" #{"one" "two" "three"}) ; 1
(find-thing "two" {"one" "two" "two" "three"}) ; nil
Here's my contribution, using a looping structure and returning nil on failure.
I try to avoid loops when I can, but it seems fitting for this problem.
(defn index-of [xs x]
(loop [a (first xs)
r (rest xs)
i 0]
(cond
(= a x) i
(empty? r) nil
:else (recur (first r) (rest r) (inc i)))))
I recently had to find indexes several times or rather I chose to since it was easier than figuring out another way of approaching the problem. Along the way I discovered that my Clojure lists didn't have the .indexOf(Object object, int start) method. I dealt with the problem like so:
(defn index-of
"Returns the index of item. If start is given indexes prior to
start are skipped."
([coll item] (.indexOf coll item))
([coll item start]
(let [unadjusted-index (.indexOf (drop start coll) item)]
(if (= -1 unadjusted-index)
unadjusted-index
(+ unadjusted-index start)))))
We don't need to loop the whole collection if we need the first index. The some function will short circuit after the first match.
(defn index-of [x coll]
(let [idx? (fn [i a] (when (= x a) i))]
(first (keep-indexed idx? coll))))
I'd go with reduce-kv
(defn find-index [pred vec]
(reduce-kv
(fn [_ k v]
(if (pred v)
(reduced k)))
nil
vec))