Get string indices from the result of re-seq - clojure

I'm currently using re-seq to find the matches of comments inside a piece of java source code.
(re-seq #"(?:/\*(?:[^*]|(?:\*+[^*/]))*\*+/)|(?://.*)" code)
How can I get the indices of the matches in the original string code, i.e. the starting (and ending) position of each match?

You can modify re-seq with the requisite Java interop:
(defn re-seq-pos [pattern string]
(let [m (re-matcher pattern string)]
((fn step []
(when (. m find)
(cons {:start (. m start) :end (. m end) :group (. m group)}
(lazy-seq (step))))))))
Example
(re-seq-pos #"\w+" "foo bar baz") ;=>
({:start 0, :end 3, :group "foo"}
{:start 4, :end 7, :group "bar"}
{:start 8, :end 11, :group "baz"})
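To see what :start and :end mean in practice, here is a small sketch that repeats re-seq-pos so it runs on its own and applies it to the comment pattern from the question. The sample Java snippet in code is made up for illustration. Since :end is exclusive, subs with the two indices should give back the matched text.

```clojure
(defn re-seq-pos [pattern string]
  (let [m (re-matcher pattern string)]
    ((fn step []
       (when (.find m)
         (cons {:start (.start m) :end (.end m) :group (.group m)}
               (lazy-seq (step))))))))

;; Hypothetical input, only for illustration.
(def code "int x; // counter\n/* block */ int y;")

;; The comment pattern from the question.
(def comment-pattern #"(?:/\*(?:[^*]|(?:\*+[^*/]))*\*+/)|(?://.*)")

(def matches (re-seq-pos comment-pattern code))

;; The matched comments, in order of appearance.
(map :group matches)

;; :end is exclusive, so (subs code start end) returns the match itself.
(every? (fn [{:keys [start end group]}]
          (= group (subs code start end)))
        matches)
```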

Related

Clojure return value from a loop

I want to calculate intersection points. This works well, but I want to store the points in a vector that the function should return.
Here is my code:
(defn intersections
[polygon line]
(let [[p1 p2] line
polypoints (conj polygon (first polygon))]
(doseq [x (range (- (count polypoints) 1))]
(println (intersect p1 p2 (nth polypoints x) (nth polypoints (+ x 1))))
)))
Instead of println I want to add the result to a new vector that should be returned. How can I change it?
You need to use a for loop. The doseq function is meant for side-effects only and always returns nil. An example:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn intersect-1
[numbers]
(let [data-vec (vec numbers)]
(vec
(for [i (range (dec (count numbers)))]
{:start (nth data-vec i)
:stop (nth data-vec (inc i))}))))
The above way works, as seen by the unit test:
(dotest
(is= (intersect-1 (range 5))
[{:start 0, :stop 1}
{:start 1, :stop 2}
{:start 2, :stop 3}
{:start 3, :stop 4}])
However, it is more natural to write it like so in Clojure:
(defn intersect-2
[numbers]
(let [pairs (partition 2 1 numbers)]
(vec
(for [[start stop] pairs]
{:start start :stop stop} ))))
With the same result
(is= (intersect-2 (range 5))
[{:start 0, :stop 1}
{:start 1, :stop 2}
{:start 2, :stop 3}
{:start 3, :stop 4}]))
You can get more details on my favorite template project (including a big documentation list!). See especially the Clojure CheatSheet!
Side note: The vec is optional in both versions. This just forces the answer into a Clojure vector (instead of a "lazy seq"), which is easier to cut and paste in examples and unit tests.
Instead of a for loop, map would be more idiomatic.
(defn intersections
[polygon line]
(let [[p1 p2] line]
(vec (map (fn [pp1 pp2] (intersect p1 p2 pp1 pp2)) polygon (rest polygon)))))
or:
(defn intersections
[polygon line]
(let [[p1 p2] line]
(vec (map #(intersect p1 p2 %1 %2) polygon (rest polygon)))))
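Note that the question's code appends the first point to the polygon, so the last edge runs from the final vertex back to the first. Mapping over the polygon and the rest of it leaves that closing edge out. A minimal sketch of pairing each vertex with its successor, closing edge included, might look like this; edges is a name made up here, and intersect is not needed to see the pairing:

```clojure
(defn edges
  "Pairs each vertex with the next one, wrapping around to the first."
  [polygon]
  (map vector polygon (concat (rest polygon) [(first polygon)])))

(edges [[0 0] [1 0] [1 1]])
;; => ([[0 0] [1 0]] [[1 0] [1 1]] [[1 1] [0 0]])
```

With that helper, the body of intersections could become (mapv (fn [[a b]] (intersect p1 p2 a b)) (edges polygon)).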

clojure - contains?, conj and recur

I'm trying to write a function with recur that cuts the sequence as soon as it encounters a repetition ([1 2 3 1 4] should return [1 2 3]). This is my function:
(defn cut-at-repetition [a-seq]
(loop[[head & tail] a-seq, coll '()]
(if (empty? head)
coll
(if (contains? coll head)
coll
(recur (rest tail) (conj coll head))))))
The first problem is with contains?, which throws an exception; I tried replacing it with some, but with no success. The second problem is in the recur part, which also throws an exception.
You've made several mistakes:
You've used contains? on a sequence. It only works on associative collections. Use some instead.
You've tested the first element of the sequence (head) for empty?. Test the whole sequence.
Use a vector to accumulate the answer. conj adds elements to the front of a list, reversing the answer.
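The points about contains? and conj are easy to check at the REPL. The snippet below is only an illustration with made-up values:

```clojure
;; contains? on a vector tests for an index, not a value
(contains? [10 20 30] 0)   ;; => true  (index 0 exists)
(contains? [10 20 30] 30)  ;; => false (there is no index 30)

;; some with a set as the predicate tests for a value
(some #{30} [10 20 30])    ;; => 30
(some #{99} [10 20 30])    ;; => nil

;; conj adds to the front of a list but to the end of a vector
(conj '(1 2) 3)            ;; => (3 1 2)
(conj [1 2] 3)             ;; => [1 2 3]
```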
Correcting these, we get
(defn cut-at-repetition [a-seq]
(loop [[head & tail :as all] a-seq, coll []]
(if (empty? all)
coll
(if (some #(= head %) coll)
coll
(recur tail (conj coll head))))))
(cut-at-repetition [1 2 3 1 4])
=> [1 2 3]
The above works, but it's slow, since it scans the whole accumulated result for every new element. It is better to use a set.
Let's call the function take-distinct, since it is similar to take-while. If we follow that precedent and make it lazy, we can do it thus:
(defn take-distinct [coll]
(letfn [(td [seen unseen]
(lazy-seq
(when-let [[x & xs] (seq unseen)]
(when-not (contains? seen x)
(cons x (td (conj seen x) xs))))))]
(td #{} coll)))
We get the expected results for finite sequences:
(map (juxt identity take-distinct) [[] (range 5) [2 3 2]])
=> ([[] ()] [(0 1 2 3 4) (0 1 2 3 4)] [[2 3 2] (2 3)])
And we can take as much as we need from an endless result:
(take 10 (take-distinct (range)))
=> (0 1 2 3 4 5 6 7 8 9)
I would call your eager version take-distinctv, on the map -> mapv precedent. And I'd do it this way:
(defn take-distinctv [coll]
(loop [seen-vec [], seen-set #{}, unseen coll]
(if-let [[x & xs] (seq unseen)]
(if (contains? seen-set x)
seen-vec
(recur (conj seen-vec x) (conj seen-set x) xs))
seen-vec)))
Notice that we carry the seen elements twice:
as a vector, to return as the solution; and
as a set, to test for membership of.
Two of the three mistakes were commented on by @cfrick.
There is a tradeoff between saving a line or two and making the logic as simple & explicit as possible. To make it as obvious as possible, I would do it something like this:
(defn cut-at-repetition
[values]
(loop [remaining-values values
result []]
(if (empty? remaining-values)
result
(let [found-values (into #{} result)
new-value (first remaining-values)]
(if (contains? found-values new-value)
result
(recur
(rest remaining-values)
(conj result new-value)))))))
(cut-at-repetition [1 2 3 1 4]) => [1 2 3]
Also, be sure to bookmark The Clojure Cheatsheet and always keep a browser tab open to it.
I'd like to hear feedback on this utility function which I wrote for myself (uses filter with stateful pred instead of a loop):
(defn my-distinct
"Returns distinct values from a seq, as defined by id-getter."
[id-getter coll]
(let [seen-ids (volatile! #{})
seen? (fn [id] (if-not (contains? @seen-ids id)
(vswap! seen-ids conj id)))]
(filter (comp seen? id-getter) coll)))
(my-distinct identity "abracadabra")
; (\a \b \r \c \d)
(->> (for [i (range 50)] {:id (mod (* i i) 21) :value i})
(my-distinct :id)
pprint)
; ({:id 0, :value 0}
; {:id 1, :value 1}
; {:id 4, :value 2}
; {:id 9, :value 3}
; {:id 16, :value 4}
; {:id 15, :value 6}
; {:id 7, :value 7}
; {:id 18, :value 9})
The docs for filter say "pred must be free of side-effects", but I'm not sure whether that matters in this case. Is filter guaranteed to iterate over the sequence in order, and not, for example, skip ahead?
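One way to avoid relying on the order in which filter calls a stateful predicate is to thread the set of seen ids through a lazy sequence explicitly, much as take-distinct above does. The following is only a sketch, and distinct-by is a name chosen here for illustration:

```clojure
(defn distinct-by
  "Lazily returns the items of coll whose (id-getter item) has not been seen before."
  [id-getter coll]
  (letfn [(step [seen xs]
            (lazy-seq
              (when-let [[x & more] (seq xs)]
                (let [id (id-getter x)]
                  (if (contains? seen id)
                    ;; already seen: skip x and keep the same set
                    (step seen more)
                    ;; first occurrence: emit x and remember its id
                    (cons x (step (conj seen id) more)))))))]
    (step #{} coll)))

(distinct-by identity "abracadabra")
;; => (\a \b \r \c \d)
```

Because the seen set is passed along as an argument, no mutable state is shared between calls, and each call produces its own independent sequence.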

Clojure for loop not returning updated values of atom

I'm trying to write a function that counts the number of vowels and consonants in a given string. The return value is a map with two keys, vowels and consonants. The values for each respective key are simply the counts.
The function that I have been able to develop so far is
(defn count-vowels-consenants [s]
(let [m (atom {"vowels" 0 "consenants" 0})
v #{"a" "e" "i" "o" "u"}]
(for [xs s]
(if
(contains? v (str xs))
(swap! m update-in ["vowels"] inc)
(swap! m update-in ["consenants"] inc)
))
@m))
However, (count-vowels-consenants "sldkfjlskjwe") returns {"vowels" 0, "consenants" 0}.
What am I doing wrong?
EDIT: changed my input from str to s as str is a function in Clojure.
I think for is lazy, so nothing actually happens until you try to realize it. I added a first around the for loop, which realized the list and resulted in an error caused by naming your parameter str, which shadows the str function. Ideally, you would just do this without the atom rigmarole.
(defn count-vowels-consonants [s]
(let [v #{\a \e \i \o \u}
vowels (filter v s)
consonants (remove v s)]
{:consonants (count consonants)
:vowels (count vowels)}))
If the atom is what you want, then use doseq instead of for, and it will update the atom for every character in the string. Also make sure you don't shadow the str function by using it as a parameter name.
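The difference between the two can be seen in isolation. In this small sketch (counter is a made-up name), the for expression is discarded without ever being realized, so its body never runs, while doseq runs its body immediately:

```clojure
(def counter (atom 0))

;; for only builds a lazy seq; it is never realized here, so nothing happens
(def after-for
  (do (for [_ [1 2 3]] (swap! counter inc))
      @counter))
;; after-for => 0

;; doseq is eager and runs the body once per element
(def after-doseq
  (do (doseq [_ [1 2 3]] (swap! counter inc))
      @counter))
;; after-doseq => 3
```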
If this side-effecting scheme is unavoidable (for some educational reason, I suppose), just replace for with doseq, which is the side-effecting, eager equivalent of for.
(By the way, there is a mistake in your initial code: you use str as an input parameter name and then try to use it as a function. That shadows the definition from clojure.core, so try to avoid naming parameters after core functions.) With doseq it looks like this:
(defn count-vowels-consenants [input]
(let [m (atom {"vowels" 0 "consenants" 0})
v #{"a" "e" "i" "o" "u"}]
(doseq [s input]
(if (contains? v (str s))
(swap! m update-in ["vowels"] inc)
(swap! m update-in ["consenants"] inc)))
@m))
#'user/count-vowels-consenants
user> (count-vowels-consenants "asdfg")
;; {"vowels" 1, "consenants" 4}
Otherwise, you could do something like this:
user> (reduce #(update %1
(if (#{\a \e \i \o \u} %2)
"vowels" "consonants")
(fnil inc 0))
{} "qwertyui")
;;{"consonants" 5, "vowels" 3}
or
user> (frequencies (map #(if (#{\a \e \i \o \u} %)
"vowels" "consonants")
"qwertyui"))
;;{"consonants" 5, "vowels" 3}
or this (if you're good with having true/false instead of "vowels/consonants"):
user> (frequencies (map (comp some? #{\a \e \i \o \u}) "qwertyui"))
;;{false 5, true 3}
for is lazy, as mentioned by @Brandon H. You can use loop/recur if you want. Here I replace for with loop/recur.
(defn count-vowels-consenants [input]
  (let [m (atom {"vowels" 0 "consenants" 0})
        v #{"a" "e" "i" "o" "u"}]
    (loop [s input]
      (when (> (count s) 0)
        ;; v holds strings, so convert the first character to a string
        (if (contains? v (str (first s)))
          (swap! m update-in ["vowels"] inc)
          (swap! m update-in ["consenants"] inc))
        ;; recur inside the when, so the loop stops on the empty string
        (recur (apply str (rest s)))))
    @m))
The question, and every extant answer, assumes that every character is a vowel or a consonant: not so. And even in ASCII, there are lower and upper case letters. I'd do it as follows ...
(defn count-vowels-consonants [s]
(let [vowels #{\a \e \i \o \u
\A \E \I \O \U}
classify (fn [c]
(if (Character/isLetter c)
(if (vowels c) :vowel :consonant)))]
(map-v count (dissoc (group-by classify s) nil))))
... where map-v is a function that maps over the values of a map:
(defn map-v [f m] (reduce (fn [a [k v]] (assoc a k (f v))) {} m))
For example,
(count-vowels-consonants "s2a Boo!")
;{:vowel 3, :consonant 2}
This traverses the string just once.

Clojure: idiomatic code for frequency map

Let's make frequency map:
(reduce #(update-in %1 [%2] (fnil inc 0)) {} ["a" "b" "a" "c" "c" "a"])
My concern is the expression inside the lambda #(...): is it the canonical way to do it? Can I do it better or shorter?
EDIT: Another way I found:
(reduce #(assoc %1 %2 (inc (%1 %2 0))) {} ["a" "b" "a" "c" "c" "a"])
These seem very similar; what are the pros and cons? Performance?
Since Clojure 1.2, there is a frequencies function in clojure.core:
user=> (doc frequencies)
-------------------------
clojure.core/frequencies
([coll])
Returns a map from distinct items in coll to the number of times
they appear.
Example:
user=> (frequencies ["a" "b" "a" "c" "c" "a"])
{"a" 3, "b" 1, "c" 2}
It happens to use transients and ternary get; see (source frequencies) for the code, which is as idiomatic as it gets while being highly performance-aware.
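For readers who have not met transients, here is a sketch of what a transient-based frequency count with a three-argument get can look like. The name freqs is made up here, and this is not claimed to be the exact code in core; (source frequencies) shows the real thing.

```clojure
(defn freqs
  "Counts occurrences of each item in coll, using a transient map internally."
  [coll]
  (persistent!
    (reduce (fn [counts x]
              ;; three-argument get supplies 0 when x has not been seen yet
              (assoc! counts x (inc (get counts x 0))))
            (transient {})
            coll)))

(freqs ["a" "b" "a" "c" "c" "a"])
;; => {"a" 3, "b" 1, "c" 2}
```

The transient map is mutated in place during the reduce and only turned back into an ordinary persistent map at the end, which avoids allocating an intermediate map for every element.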
There is no need to use update-in. My way would be:
(defn frequencies [coll]
(reduce (fn [m e]
(assoc m e (inc (m e 0))))
{} coll))
Update: I assumed you knew frequencies was in core also and this was just an exercise.
I did a guest lecture a while ago in which I explained how you can get to this solution step by step. Won't be much new for you, since you were already close to the core solution, but maybe it is of value to someone else reading this question. The slides are in Dutch. If you change .html to .org it's easier to get the source code:
http://michielborkent.nl/gastcollege-han-20-06-2013/gastcollege.html
http://michielborkent.nl/gastcollege-han-20-06-2013/gastcollege.org
Another approach using only 'assoc' and recursion:
(defn my-frequencies-helper [freqs a-seq]
(if (empty? a-seq)
freqs
(let [fst (first a-seq)
new_set (if (contains? freqs fst)
(assoc freqs fst (inc (get freqs fst)))
(assoc freqs fst 1))]
(my-frequencies-helper new_set (rest a-seq)))))
(defn my-frequencies [a-seq]
(my-frequencies-helper {} a-seq))
(my-frequencies [1 1 2 2 :D :D :D])
=> {1 2, 2 2, :D 3}
(my-frequencies [:a "moi" :a "moi" "moi" :a 1])
=> {:a 3, "moi" 3, 1 1}

Compact Clojure code for regular expression matches and their position in string

Stuart Halloway gives the example
(re-seq #"\w+" "The quick brown fox")
as the natural method for finding regex matches in Clojure. In his book this construction is contrasted with iteration over a matcher. If all one cared about were a list of matches, this would be great. However, what if I wanted the matches and their positions within the string? Is there a better way of doing this that lets me leverage the existing functionality in java.util.regex without resorting to something like a sequence comprehension over each index in the original string? In other words, one would like to type something like
(re-seq-map #"[0-9]+" "3a1b2c1d")
which would return a map with keys as the position and values as the matches, e.g.
{0 "3", 2 "1", 4 "2", 6 "1"}
Is there some implementation of this in an extant library already, or shall I write it (it shouldn't be too many lines of code)?
You can fetch the data you want out of a java.util.regex.Matcher object.
user> (defn re-pos [re s]
(loop [m (re-matcher re s)
res {}]
(if (.find m)
(recur m (assoc res (.start m) (.group m)))
res)))
#'user/re-pos
user> (re-pos #"\w+" "The quick brown fox")
{16 "fox", 10 "brown", 4 "quick", 0 "The"}
user> (re-pos #"[0-9]+" "3a1b2c1d")
{6 "1", 4 "2", 2 "1", 0 "3"}
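The printed order of an ordinary map is not something to rely on. If the positions should come out in ascending order, one option is to accumulate into a sorted map instead. This is a sketch along the lines of re-pos above, and re-pos-sorted is a name made up here:

```clojure
(defn re-pos-sorted
  "Like re-pos, but returns a sorted map keyed by match start position."
  [re s]
  (let [m (re-matcher re s)]
    (loop [res (sorted-map)]
      (if (.find m)
        (recur (assoc res (.start m) (.group m)))
        res))))

(re-pos-sorted #"[0-9]+" "3a1b2c1d")
;; => {0 "3", 2 "1", 4 "2", 6 "1"}
```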
You can apply any function to the java.util.regex.Matcher object and return its results (similar to Brian's solution, but without an explicit loop):
user=> (defn re-fun
[re s fun]
(let [matcher (re-matcher re s)]
(take-while some? (repeatedly #(if (.find matcher) (fun matcher) nil)))))
#'user/re-fun
user=> (defn fun1 [m] (vector (.start m) (.end m)))
#'user/fun1
user=> (re-fun #"[0-9]+" "3a1b2c1d" fun1)
([0 1] [2 3] [4 5] [6 7])
user=> (defn re-seq-map
[re s]
(into {} (re-fun re s #(vector (.start %) (.group %)))))
user=> (re-seq-map #"[0-9]+" "3a1b2c1d")
{0 "3", 2 "1", 4 "2", 6 "1"}