I would like to parallelize my Clojure implementation

OK, so I have an algorithm that loops through a file line by line and looks for a given word in each line. Not only does it return the given word, it also returns a number (also given as a parameter) of words that come before and after that word.
E.g. line = "I am overflowing with blessings and you also are"
parameters = ("you" 2)
output = (blessings and you also are)
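A minimal sketch of the behaviour described above, independent of the asker's own functions: `word-windows` and the `example.word-window` namespace are hypothetical names, and it assumes words are separated by spaces, commas or dots. It returns one window per occurrence of the word:

```clojure
(ns example.word-window
  (:require [clojure.string :as str]))

(defn word-windows
  "Hypothetical helper (not the asker's code): for every occurrence of `word`
  in `line`, returns that word plus up to `n` words on each side."
  [line word n]
  (let [words    (vec (str/split line #"[ ,.]+")) ; split on spaces, commas, dots
        last-idx (dec (count words))]
    ;; one window per matching index, clamped to the bounds of the vector
    (for [i (range (count words))
          :when (= word (words i))]
      (subvec words (max 0 (- i n)) (inc (min last-idx (+ i n)))))))
```

For the example line, (word-windows "I am overflowing with blessings and you also are" "you" 2) gives (["blessings" "and" "you" "also" "are"]).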
(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (let [x (topMostLoop l "good" 2)]
      (if (not (empty? x))
        (println x)))))
The above code works fine. But I would like to parallelize it, so I did this:
(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (future
      (let [x (topMostLoop l "good" 2)]
        (if (not (empty? x))
          (println x))))))
But then the output comes out all messy. I know I need to lock somewhere, but I don't know where.
(require '[clojure.string :as str]) ; topMostLoop uses str/split
(defn topMostLoop [contents word next]
  (let [mywords (str/split contents #"[ ,\\.]+")]
    (map (fn [element]
           (return-lines (max 0 (- element next))
                         (min (+ element next) (- (count mywords) 1))
                         mywords))
         (vec ((indexHashMap mywords) word)))))
I would be glad if someone could help me; this is the last thing I have left to do.
NB: let me know if I need to post the other functions as well.
I have added the other functions for more clarity:
(defn return-lines [firstItem lastItem contentArray]
  (take (+ (- lastItem firstItem) 1)
        (map (fn [element] (str element))
             (vec (drop firstItem contentArray)))))
(defn indexHashMap [mywords]
  (->> (zipmap (range) mywords) ; mywords is a sequence of words
       (reduce (fn [index [location word]]
                 (merge-with concat index {word (list location)}))
               {})))

First, use map in the serial version:
(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (map #(topMostLoop %1 "good" 2) (line-seq r))]
    (if (not (empty? l))
      (println l))))
With this approach, topMostLoop is applied to each line and a lazy seq of results is returned. In the body of doseq, each result is printed if it is not empty.
After that, replace map with pmap, which runs the mapping in parallel; the results still come out in the same order as the input lines:
(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (pmap #(topMostLoop %1 "good" 2) (line-seq r))]
    (if (not (empty? l))
      (println l))))
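A variant of the pmap version that returns the non-empty results as a value instead of printing them; `parallel-matches` is a hypothetical name. Per its docstring, pmap only pays off when f is computationally intensive enough to dominate the coordination overhead, so it is worth timing against plain map:

```clojure
(ns example.pmap-lines)

(defn parallel-matches
  "Sketch: applies f to every line with pmap, which keeps results in input
  order, and drops empty results. doall forces all the work before returning,
  i.e. while the caller still holds the reader open (e.g. inside with-open)."
  [f lines]
  (->> lines
       (pmap f)
       (remove empty?)
       doall))
```

Inside the with-open form you could then write (run! println (parallel-matches #(topMostLoop % "good" 2) (line-seq r))); the doall matters because the reader is closed as soon as with-open exits.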
In your version with futures, the results will usually come out of order (some later futures finish before earlier ones).
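If you do want to keep one future per line, the lock belongs around the println, since the shared output stream is what the threads compete for (println writes the value and the newline in separate steps). A minimal sketch, assuming a hypothetical `print-lock` object and function name; it also waits for every future, and the printed order is still not the input order:

```clojure
(ns example.locked-futures)

;; Hypothetical shared lock object; any single object can serve as a monitor.
(def print-lock (Object.))

(defn print-matches-with-futures
  "Sketch: one future per line, as in the question. Each non-empty result is
  printed while holding print-lock, so two outputs can never interleave.
  Output order is still nondeterministic. Blocks until all futures finish."
  [f lines]
  (let [futs (mapv (fn [line]
                     (future
                       (let [x (f line)]
                         (when (seq x)
                           (locking print-lock
                             (println x))))))
                   lines)]
    ;; wait for every future before returning
    (run! deref futs)))
```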
I tested this with the following modifications (instead of reading a text file, I create a lazy sequence of vectors of numbers, search for a value in each vector and return the values around it):
(def lines (repeatedly #(shuffle (range 1 11))))
(def lines-10 (take 10 lines))
lines-10
([5 8 3 10 6 9 7 2 1 4]
[6 8 9 7 2 5 10 4 1 3]
[2 7 8 9 1 5 10 3 4 6]
[10 8 3 5 7 2 4 9 6 1]
[8 6 10 1 9 4 3 7 2 5]
[9 6 8 1 5 10 3 4 2 7]
[10 9 3 7 1 8 4 6 5 2]
[6 1 4 10 3 7 8 9 5 2]
[9 6 7 5 8 3 10 4 2 1]
[4 1 5 2 7 3 6 9 8 10])
(defn surrounding
  [v value size]
  (let [i (.indexOf v value)]
    (if (= i -1)
      nil
      (subvec v (max (- i size) 0) (inc (min (+ i size) (dec (count v))))))))
(doseq [l (map #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil
(doseq [l (pmap #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil

Related

Split a sequence by delimiter in Clojure?

Say I have a sequence in Clojure like
'(1 2 3 6 7 8)
and I want to split it up so that the list splits whenever an element divisible by 3 is encountered, so that the result looks like
'((1 2) (3) (6 7 8))
(EDIT: What I actually need is [[1 2] [3] [6 7 8]], but I'll take the sequence version too.)
What is the best way to do this in Clojure?
partition-by is no help:
(partition-by #(= (rem % 3) 0) '(1 2 3 6 7 8))
; => ((1 2) (3 6) (7 8))
split-with is close:
(split-with #(not (= (rem % 3) 0)) '(1 2 3 6 7 8))
; => [(1 2) (3 6 7 8)]
Something like this?
(defn partition-with
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [run (cons (first s) (take-while (complement f) (next s)))]
        (cons run (partition-with f (seq (drop (count run) s))))))))
(partition-with #(= (rem % 3) 0) [1 2 3 6 7 8 9 12 13 15 16 17 18])
=> ((1 2) (3) (6 7 8) (9) (12 13) (15 16 17) (18))
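Since the question's EDIT asks for vectors, a small sketch: wrap the lazy partition-with above in mapv vec. `partition-with-v` is a hypothetical name, and partition-with is repeated so the block runs on its own:

```clojure
(ns example.partition-with)

(defn partition-with
  "Lazy partition-with as in the answer: a new run starts at each element
  for which f is truthy."
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [run (cons (first s) (take-while (complement f) (next s)))]
        (cons run (partition-with f (seq (drop (count run) s))))))))

(defn partition-with-v
  "Hypothetical eager wrapper returning a vector of vectors."
  [f coll]
  (mapv vec (partition-with f coll)))
```

Note that mapv realizes the whole result, so this wrapper gives up the laziness of partition-with.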
This is an interesting problem. I recently added a function split-using to the Tupelo library, which seems like a good fit here. I left the spyx debug statements in the code below so you can see how things progress:
(ns tst.clj.core
  (:use clojure.test tupelo.test)
  (:require
    [tupelo.core :as t]))
(t/refer-tupelo)
(defn start-segment? [vals]
  (zero? (rem (first vals) 3)))
(defn partition-using [pred vals-in]
  (loop [vals   vals-in
         result []]
    (if (empty? vals)
      result
      (t/spy-let [
        out-first (take 1 vals)
        [out-rest unprocessed] (split-using pred (spyx (next vals)))
        out-vals (glue out-first out-rest)
        new-result (append result out-vals)]
        (recur unprocessed new-result)))))
Which gives us output like:
out-first => (1)
(next vals) => (2 3 6 7 8)
[out-rest unprocessed] => [[2] (3 6 7 8)]
out-vals => [1 2]
new-result => [[1 2]]
out-first => (3)
(next vals) => (6 7 8)
[out-rest unprocessed] => [[] [6 7 8]]
out-vals => [3]
new-result => [[1 2] [3]]
out-first => (6)
(next vals) => (7 8)
[out-rest unprocessed] => [[7 8] ()]
out-vals => [6 7 8]
new-result => [[1 2] [3] [6 7 8]]
(partition-using start-segment? [1 2 3 6 7 8]) => [[1 2] [3] [6 7 8]]
or for a larger input vector:
(partition-using start-segment? [1 2 3 6 7 8 9 12 13 15 16 17 18 18 18 3 4 5])
=> [[1 2] [3] [6 7 8] [9] [12 13] [15 16 17] [18] [18] [18] [3 4 5]]
You could also create a solution using nested loop/recur, but that is already coded up in the split-using function:
(defn split-using
  "Splits a collection based on a predicate with a collection argument.
  Finds the first index N such that (pred (drop N coll)) is true. Returns a length-2 vector
  of [ (take N coll) (drop N coll) ]. If pred is never satisfied, [ coll [] ] is returned."
  [pred coll]
  (loop [left  []
         right (vec coll)]
    (if (or (empty? right) ; don't call pred if no more data
            (pred right))
      [left right]
      (recur (append left (first right))
             (rest right)))))
Actually, partition-using seemed like it would be useful in the future as well, so it has now been added to the Tupelo library.
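For readers who don't want the Tupelo dependency, here is a core-only sketch of the same two-step idea. `split-using*` and `partition-using*` are hypothetical names; this version walks a seq with next and conj instead of Tupelo's append, and returns a vector of vectors:

```clojure
(ns example.partition-using-core)

(defn split-using*
  "Core-only sketch: returns [prefix suffix], where suffix is the first tail of
  coll for which (pred tail) is truthy. If pred is never satisfied, suffix is
  empty."
  [pred coll]
  (loop [left  []
         right (seq coll)]
    (if (or (nil? right)      ; don't call pred once the data is exhausted
            (pred right))
      [left (or right ())]
      (recur (conj left (first right)) (next right)))))

(defn partition-using*
  "Core-only sketch: each segment takes its first element unconditionally,
  then extends until (pred tail) holds again."
  [pred coll]
  (loop [vals   (seq coll)
         result []]
    (if (nil? vals)
      result
      (let [[out-rest unprocessed] (split-using* pred (next vals))]
        (recur (seq unprocessed)
               (conj result (into [(first vals)] out-rest)))))))

(defn start-segment?
  "Same predicate as in the answer: a segment starts at a multiple of 3."
  [vals]
  (zero? (rem (first vals) 3)))
```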
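And one more old-school, reduce-based solution:
user> (defn split-all [pred items]
        (when (seq items)
          (apply conj (reduce (fn [[acc curr] x]
                                ;; start a new group only when the current one is non-empty,
                                ;; so a match on the first item doesn't emit an empty group
                                (if (and (pred x) (seq curr))
                                  [(conj acc curr) [x]]
                                  [acc (conj curr x)]))
                              [[] []] items))))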
#'user/split-all
user> (split-all #(zero? (rem % 3)) '(1 2 3 6 7 8 10 11 12))
;;=> [[1 2] [3] [6 7 8 10 11] [12]]

Changing 1-3 random indices in a sequence to a random value

I would also like the changed value to be random. For example, for
'(1 2 3 4 5)
one possible output is
'(1 3 3 4 5)
and another is
'(1 5 5 4 5)
There are more idiomatic ways to do this in Clojure. For example:
you can generate an infinite lazy sequence of random changes to the initial collection, and then just take a random item from it.
(defn random-changes [items limit]
  (rest (reductions #(assoc %1 (rand-int (count items)) %2)
                    (vec items)
                    (repeatedly #(rand-int limit)))))
In the REPL:
user> (take 5 (random-changes '(1 2 3 4 5 6 7 8) 100))
([1 2 3 4 5 64 7 8] [1 2 3 4 5 64 58 8] [1 2 3 4 5 64 58 80]
[1 2 3 4 5 28 58 80] [1 2 3 71 5 28 58 80])
user> (nth (random-changes '(1 2 3 4 5 6 7 8) 100) 0)
[1 2 3 64 5 6 7 8]
And you can just take the item at the index you want (the item at index n is the collection after n + 1 changes, although two changes may hit the same index):
user> (nth (random-changes '(1 2 3 4 5 6 7 8) 100) (rand-int 3))
[1 46 3 44 86 6 7 8]
Or just use reduce to get the collection changed n times at once:
(defn random-changes [items limit changes-count]
  (reduce #(assoc %1 (rand-int (count items)) %2)
          (vec items)
          (repeatedly changes-count #(rand-int limit))))
In the REPL:
user> (random-changes [1 2 3 4 5 6] 100 3)
[27 2 33 4 76 6]
You can also assoc all the changes into a vector at once, since assoc accepts several key/value pairs:
(assoc items 0 100 1 200 2 300), so you can do it like this:
(defn random-changes [items limit changes-count]
  (let [items (vec items)
        rands #(repeatedly changes-count (partial rand-int %))]
    (apply assoc items
           (interleave (rands (count items))
                       (rands limit)))))
In the REPL:
user> (random-changes [1 2 3 4 5 6] 100 3)
[1 65 61 44 5 6]
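The variants above choose every index independently, so two changes can land on the same position. If you need 1 to 3 distinct positions, one option (my own assumption, not taken from the answers) is to pick the indices with shuffle. `change-1-to-3` is a hypothetical name:

```clojure
(ns example.random-changes)

(defn change-1-to-3
  "Sketch: replaces the values at 1 to 3 distinct random indices of items
  with random ints in [0, limit). Distinct indices come from shuffle."
  [items limit]
  (let [v    (vec items)
        n    (min (count v) (inc (rand-int 3)))      ; 1, 2 or 3 changes
        idxs (take n (shuffle (range (count v))))]   ; distinct positions
    (reduce (fn [acc i] (assoc acc i (rand-int limit))) v idxs)))
```

A new random value can still coincide with the old one, so the visible number of changed positions may be lower than n.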
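Figured it out. I decided to go the longer route and make a function.
(defn changeSequence
  [sequ x limit] ; limit replaces the undefined `foo`: exclusive upper bound of the random values
  (let [transsequ (into [] sequ)] ; local binding instead of a def inside the function
    (if (> x 0)
      (changeSequence (assoc transsequ (rand-int (count transsequ)) (rand-int limit)) (dec x) limit)
      (seq sequ))))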

What is an idiomatic way to implement double loop over a vector in Clojure?

I am new to Clojure and it's hard for me to idiomatically implement basic manipulations with data structures.
What would be an idiomatic way to implement the following code in Clojure?
l = [...]
for i in range(len(l)):
    for j in range(i + 1, len(l)):
        print l[i], l[j]
The simplest (but not the most FP-ish) approach is almost identical to your example:
(let [v [1 2 3 4 5 6 7]]
  (doseq [i (range (count v))
          j (range (inc i) (count v))]
    (println (v i) (v j))))
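The same index-based double loop can also be written with for when you want the pairs as a value rather than printed. A minimal sketch with a hypothetical `index-pairs` function, which assumes v is a vector so it can be called with an index:

```clojure
(ns example.index-pairs)

(defn index-pairs
  "Sketch: the doseq double loop rewritten with for, returning a lazy seq of
  [(v i) (v j)] pairs with i < j instead of printing them."
  [v]
  (for [i (range (count v))
        j (range (inc i) (count v))]
    [(v i) (v j)]))
```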
And here is a more functional variant that generates all these pairs (it isn't based on length or indices, but on iterating over the tails):
(let [v [1 2 3 4 5 6 7]]
  (mapcat #(map (partial vector (first %)) (rest %))
          (take-while not-empty (iterate rest v))))
Output:
([1 2] [1 3] [1 4] [1 5] [1 6] [1 7] [2 3] [2 4]
[2 5] [2 6] [2 7] [3 4] [3 5] [3 6] [3 7] [4 5]
[4 6] [4 7] [5 6] [5 7] [6 7])
Then just use these pairs in doseq for any side effect:
(let [v [1 2 3 4 5 6 7]
      pairs (fn [items-seq]
              (mapcat #(map (partial vector (first %)) (rest %))
                      (take-while not-empty (iterate rest items-seq))))]
  (doseq [[i1 i2] (pairs v)] (println i1 i2)))
Update: following @dg123's answer. It is nice, but you can make it even better using the destructuring and guard features of doseq and for:
(let [v [1 2 3 4 5 6 7]]
  (doseq [[x & xs] (iterate rest v)
          :while xs
          y xs]
    (println "x:" x "y:" y)))
You iterate through the tails of the collection, but remember that iterate produces an infinite seq:
user> (take 10 (iterate rest [1 2 3 4 5 6 7]))
([1 2 3 4 5 6 7] (2 3 4 5 6 7) (3 4 5 6 7)
(4 5 6 7) (5 6 7) (6 7) (7) () () ())
so you have to limit it somehow to include just the non-empty collections.
The destructuring form [x & xs] splits the argument into its first item and a sequence of the remaining items:
user> (let [[x & xs] [1 2 3 4 5 6]]
        (println x xs))
1 (2 3 4 5 6)
nil
And when the bound collection is empty or has a single element, xs is nil:
user> (let [[x & xs] [1]]
        (println x xs))
1 nil
nil
So you just make use of this feature with a :while guard in the list comprehension.
In the end you just construct pairs (or, in this case, perform a side effect) for x and every item in xs.
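The same destructuring-plus-:while comprehension also works with for, which returns the pairs lazily instead of printing them. A sketch with a hypothetical `tail-pairs` name; because it walks tails rather than indices, it accepts lists and lazy seqs as well as vectors:

```clojure
(ns example.tail-pairs)

(defn tail-pairs
  "Sketch: every pair [x y] where y comes after x in coll, built from the
  tails of coll with the [x & xs] destructuring and a :while guard."
  [coll]
  (for [[x & xs] (iterate rest coll)
        :while xs   ; stop at the first tail with fewer than two items
        y xs]
    [x y]))
```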
How about using (map vector ...) together with iterate:
user=> (def l [1 2 3 4 5])
#'user/l
user=> (map vector l (iterate rest (drop 1 l)))
([1 (2 3 4 5)] [2 (3 4 5)] [3 (4 5)] [4 (5)] [5 ()])
which produces a lazy sequence pairing the value at each index i with all of its js.
You can then iterate over all of the pairs of values you need using for like so:
user=> (for [[i js] (map vector l (iterate rest (drop 1 l)))
             j js]
         [i j])
([1 2] [1 3] [1 4] [1 5] [2 3] [2 4] [2 5] [3 4] [3 5] [4 5])
Use doseq if you would like to perform IO instead of producing a lazy sequence:
user=> (doseq [[i js] (map vector l (iterate rest (drop 1 l)))
               j js]
         (println (str "i: " i " j: " j)))
i: 1 j: 2
i: 1 j: 3
i: 1 j: 4
i: 1 j: 5
i: 2 j: 3
i: 2 j: 4
i: 2 j: 5
i: 3 j: 4
i: 3 j: 5
i: 4 j: 5
nil

Clojure: partition a seq based on a seq of values

I would like to partition a seq based on a seq of values:
(partition-by-seq [3 5] [1 2 3 4 5 6])
((1 2 3)(4 5)(6))
The first input is a seq of split points.
The second input is a seq i would like to partition.
So the first partition ends at the value 3, giving (1 2 3), and the second partition is (4 5), where 5 is the next split point.
Another example:
(partition-by-seq [3] [2 3 4 5])
result: ((2 3)(4 5))
(partition-by-seq [2 5] [2 3 5 6])
result: ((2)(3 5)(6))
Given: the first seq (the split points) is always a subset of the second input seq.
I came up with this solution, which is lazy and (IMO) quite straightforward.
(defn part-seq [splitters coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (if-let [split-point (first splitters)]
        ; build seq until first splitter
        (let [run (cons (first s) (take-while #(<= % split-point) (next s)))]
          ; build the lazy seq of partitions recursively
          (cons run
                (part-seq (rest splitters) (drop (count run) s))))
        ; just return one partition if there is no splitter
        (list coll)))))
If the split points are all in the sequence:
(part-seq [3 5 8] [0 1 2 3 4 5 6 7 8 9])
;;=> ((0 1 2 3) (4 5) (6 7 8) (9))
If some split points are not in the sequence:
(part-seq [3 5 8] [0 1 2 4 5 6 8 9])
;;=> ((0 1 2) (4 5) (6 8) (9))
An example with infinite sequences for both the splitters and the sequence to split:
(take 5 (part-seq (iterate (partial + 3) 5) (range)))
;;=> ((0 1 2 3 4 5) (6 7 8) (9 10 11) (12 13 14) (15 16 17))
The sequence to be partitioned is the splittee, and each element of split-points (aka the splitter) marks the last element of a partition.
From your example:
splittee: [1 2 3 4 5 6]
splitter: [3 5]
result: ((1 2 3)(4 5)(6))
Because each resulting partition is an increasing integer sequence, and an increasing integer sequence of x can be described by start <= x < end, each splitter element can be turned into the (exclusive) end of a range according to this definition.
So from [3 5], we want to find the subsequences that end before 4 and before 6.
Then, by adding the starts, the splitter can be turned into a sequence of [start end] pairs. The first element of the splittee serves as the first start, and its last element plus one as the final end.
So the splitter [3 5] becomes:
[[1 4] [4 6] [6 7]]
The splitter transformation could be done like this:
(->> (concat [(first splittee)]
             (mapcat (juxt inc inc) splitter)
             [(inc (last splittee))])
     (partition 2))
There is a nice symmetry between the transformed splitter and the desired result:
[[1 4] [4 6] [6 7]]
((1 2 3) (4 5) (6))
Then the problem becomes how to extract the subsequences of the splittee that fall within each [start end] range of the transformed splitter.
Clojure has the subseq function, which returns the part of a sorted collection between a start test and an end test. I can just map subseq over the splittee (as a sorted set) for each element of the transformed splitter:
(map (fn [[x y]]
       ;; subseq compares each element to the key: >= x < y keeps x <= element < y
       (subseq (apply sorted-set splittee) >= x < y))
     transformed-splitter)
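One detail of subseq matters here: its tests compare each element to the key, so "elements at or above x" is written with the >= start test, and the start key does not have to be a member of the set. A small sketch with a hypothetical `range-of` helper over a set that lacks the value 4:

```clojure
(ns example.subseq-demo)

(def splittee-set
  "A sorted set built from a splittee that does not contain the value 4."
  (sorted-set 1 2 3 5 6))

(defn range-of
  "Hypothetical helper: elements e of sorted collection sc with
  start <= e < end. subseq applies each test to (compare e key), so the
  start bound is written as >= start."
  [sc start end]
  (subseq sc >= start < end))
```

Here (range-of splittee-set 4 7) returns (5 6). With <= as the start test, (subseq splittee-set <= 4 < 7) returns only (6): the first element 5 fails (<= (compare 5 4) 0) and is dropped.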
By combining the steps above, my answer is:
(defn partition-by-seq
  [splitter splittee]
  (let [sorted (apply sorted-set splittee)] ; build the sorted set once
    (->> (concat [(first splittee)]
                 (mapcat (juxt inc inc) splitter)
                 [(inc (last splittee))])
         (partition 2)
         (map (fn [[x y]]
                ;; >= x < y selects x <= element < y
                (subseq sorted >= x < y))))))
This is the solution I came up with.
(def a [1 2 3 4 5 6])
(def p [2 4 5])
(defn partition-by-seq [s input]
  (loop [i 0
         t input
         v (transient [])]
    (if (< i (count s))
      (let [x (split-with #(<= % (nth s i)) t)]
        (recur (inc i) (second x) (conj! v (first x))))
      ;; always use the value returned by conj!; transients must not be bashed in place
      (filter #(not= (count %) 0) (persistent! (conj! v t))))))
(partition-by-seq p a)

How to check if current value is greater than the next value in Clojure?

I am looking for a way to check whether the current value in a collection is greater than the next value and, if so, add that pair of items to a collection, e.g.:
[9 2 3 7 11 8 3 7 1] => [9 2 11 8 8 3 7 1] ; Checking each item against the next
I initially thought I could do something like:
(filter (fn [[x y]] (> x y)) [9 2 3 7 11 8 3 7 1])
But something like this seemed to work only with associative types. So then I tried something like this:
(defn get-next [col index] ; Returns next item from a collection
  (get col (inc (.indexOf col index))))
(filter (fn [[x]] (> x (get-next [9 2 3 7 11 8 3 7 1] x))) [9 2 3 7 11 8 3 7 1])
But I still got the same error. Any help would be appreciated.
Use the partition function to pair each item with the next item in the collection:
user=> (partition 2 1 [9 2 3 7 11 8 3 7 1])
((9 2) (2 3) (3 7) (7 11) (11 8) (8 3) (3 7) (7 1))
Now you have pairs of each item and the one after it. You can compare the items in each pair and concatenate the results with mapcat.
user=> (->> [9 2 3 7 11 8 3 7 1]
#_=> (partition 2 1)
#_=> (mapcat (fn [[a b]] (if (> a b) [a b]))))
(9 2 11 8 8 3 7 1)
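Since the question's expected output is a vector, here is a sketch of the same partition-plus-mapcat idea collected with into; `descending-pairs` is a hypothetical name, and the one-argument (mapcat f) transducer requires Clojure 1.7 or later:

```clojure
(ns example.descending-pairs)

(defn descending-pairs
  "Sketch: flattens every adjacent pair [a b] with a > b into a vector,
  using (mapcat f) as a transducer over (partition 2 1 coll)."
  [coll]
  (into []
        (mapcat (fn [[a b]] (when (> a b) [a b])))
        (partition 2 1 coll)))
```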
Another way is to use reduce:
(defn pairs [data]
  ((reduce (fn [res item]
             (if (and (:prev res) (< item (:prev res)))
               (assoc res
                      :prev item
                      :res (conj (:res res) (:prev res) item))
               (assoc res :prev item)))
           {:prev nil :res []} data)
   :res))
(pairs [9 2 3 7 11 8 3 7 1])
;; [9 2 11 8 8 3 7 1]