Idiomatic way to use for, while still maintaining high performance - clojure

I have a map that is sorted by its keys which contains data like this:
(def h (sorted-map 50 "Text1"
                   70 "Text2"
                   372 "Text1"
                   391 "Text2"
                   759 "Text1"
                   778 "Text2"))
The map is sorted by Keys. The key (the number) can be interpreted as the position where the corresponding value was found in a large block of text. In the above example, "Text1" was found at position 50 in the text.
Now, I want to find all Texts that were found within k positions of each other. I define a function like this:
(defn nearest [m k]
(for [m1 (keys m) m2 (keys m)
:when (and (> m2 m1) (not= (m m1) (m m2)) (< (- m2 m1) k))]
[m1 (get m m1) m2 (get m m2)]))
(nearest h 50)
; [[50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"]]
This works, but it is too slow when the map m has hundreds of thousands of elements, because the for loop looks at every pair of elements in the map. Since the map is sorted, for each element it is not necessary to check further elements once the next element is already k or more positions away. I was able to write a version using loop and recur, but it is rather unreadable. Is there a more natural way to do this using for? I assume for's :while modifier should do the trick, but I was not able to find a way.
(defn nearest-quick [m k]
(let [m1 (keys m) m2 (keys m)]
(loop [inp m res [] i (first m1) m1 (rest m1) j (first m2) m2 (rest m2)]
(cond
(nil? i) res
(nil? j)(recur inp res (first m1) (rest m1) j m2)
(= i j) (recur inp res i m1 (first m2) (rest m2))
(< j i) (recur inp res i m1 (first m2) (rest m2))
(= (inp i) (inp j)) (recur inp res i m1 (first m2) (rest m2))
(< (- j i) k) (recur inp (conj res [i (inp i) j (inp j)]) i m1 (first m2) (rest m2))
(>= (- j i) k) (recur inp res (first m1) (rest m1) (first (rest m1)) (rest (rest m1)))))))
Note: with a map with 42K elements, the first version takes 90 mins and the second version takes 3 mins.

One could probably exploit subseq when the map is a sorted-map.
(defn nearest
  [m n]
  (for [[k v] m
        ;; entries whose key lies strictly between k and (+ k n)
        [nk nv] (subseq m > k < (+ k n))
        :when (not= v nv)]
    [k v nk nv]))
Code not benchmarked.

Clojure's for also has a :while modifier, so you can stop the iteration with a condition.
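A minimal sketch of how :while could apply to the question, assuming m is a sorted map; the name nearest-while and the sample data are made up for illustration. The outer binding walks successive tails of the entry sequence, and :while cuts off the inner binding as soon as the next key is k or more positions away, so only nearby pairs are examined.

```clojure
(defn nearest-while
  "Pairs of entries with different values whose keys are less than k apart.
  Assumes m is sorted by key (e.g. a sorted-map)."
  [m k]
  (for [[[k1 v1] & more] (take-while some? (iterate next (seq m)))
        [k2 v2] more
        :while (< (- k2 k1) k)   ; stop scanning once keys are too far apart
        :when (not= v1 v2)]
    [k1 v1 k2 v2]))

(def sample
  (sorted-map 50 "Text1" 70 "Text2" 372 "Text1"
              391 "Text2" 759 "Text1" 778 "Text2"))

(nearest-while sample 50)
;=> ([50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"])
```

Not benchmarked; the work per entry is proportional to the number of neighbours inside the window rather than to the size of the map.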

From what I have understood of your example:
(def h (sorted-map 50 "Text1"
70 "Text2"
372 "Text1"
391 "Text2"
759 "Text1"
778 "Text2"))
(->> (map #(-> [%1 %2]) h (rest h))
(filter (fn [[[a b] [x y]]] (< (- x a) 50)))
(map flatten))

Related

How to return a lazy sequence from a loop recur with a conditional in Clojure?

Still very new to Clojure and programming in general so forgive the stupid question.
The problem is:
Find n and k such that the sum of numbers up to n (exclusive) is equal to the sum of numbers from n+1 to k (inclusive).
My solution (which works fine) is to define the following functions:
(defn addd [x] (/ (* x (+ x 1)) 2))
(defn sum-to-n [n] (addd(- n 1)))
(defn sum-to-k [n k] (- (addd k) (addd n)))
(defn is-right[n k]
(= (addd (- n 1)) (sum-to-k n k)))
And then run the following loop:
(loop [n 1 k 2]
(cond
(is-right n k) [n k]
(> (sum-to-k n k) (sum-to-n n) )(recur (inc n) k)
:else (recur n (inc k))))
This only returns one answer but if I manually set n and k I can get different values. However, I would like to define a function which returns a lazy sequence of all values so that:
(= [6 8] (take 1 make-seq))
How do I do this as efficiently as possible? I have tried various things but haven't had much luck.
Thanks
:Edit:
I think I came up with a better way of doing it, but it's returning 'let should be a vector'. The Clojure docs aren't much help...
Here's the new code:
(defn calc-n [n k]
(inc (+ (* 2 k) (* 3 n))))
(defn calc-k [n k]
(inc (+ (* 3 k)(* 4 n))))
(defn f
(let [n 4 k 6]
(recur (calc-n n k) (calc-k n k))))
(take 4 (f))
Yes, you can create a lazy-seq, so that the next iteration will take result of the previous iteration. Here is my suggestion:
(defn cal [n k]
(loop [n n k k]
(cond
(is-right n k) [n k]
(> (sum-to-k n k) (sum-to-n n) )(recur (inc n) k)
:else (recur n (inc k)))))
(defn make-seq [n k]
(if-let [[n1 k1] (cal n k)]
(cons [n1 k1]
(lazy-seq (make-seq (inc n1) (inc k1))))))
(take 5 (make-seq 1 2))
;;=> ([6 8] [35 49] [204 288] [1189 1681] [6930 9800])
Just generating a lazy seq of candidates with iterate and then filtering them is probably what you need:
(def pairs
  (->> [1 2]
       (iterate (fn [[n k]]
                  (if (< (sum-to-n n) (sum-to-k n k)) ; sum-to-k as defined in the question
                    [(inc n) k]
                    [n (inc k)])))
       (filter (partial apply is-right))))
user> (take 5 pairs)
;;=> ([6 8] [35 49] [204 288] [1189 1681] [6930 9800])
semantically it is just like manually generating a lazy-seq, and should be as efficient, but this one is probably more idiomatic
If you don't feel like "rolling your own", here is an alternate solution. I also cleaned up the algorithm a bit through renaming/reformatting.
The main difference is that you treat your loop-recur as an infinite loop inside of the t/lazy-gen form. When you find a value you want to keep, you use the t/yield expression to create a lazy-sequence of outputs. This structure is the Clojure version of a generator function, just like in Python.
(ns tst.demo.core
(:use tupelo.test )
(:require [tupelo.core :as t] ))
(defn integrate-to [x]
(/ (* x (+ x 1)) 2))
(defn sum-to-n [n]
(integrate-to (- n 1)))
(defn sum-n-to-k [n k]
(- (integrate-to k) (integrate-to n)))
(defn sums-match[n k]
(= (sum-to-n n) (sum-n-to-k n k)))
(defn recur-gen []
(t/lazy-gen
(loop [n 1 k 2]
(when (sums-match n k)
(t/yield [n k]))
(if (< (sum-to-n n) (sum-n-to-k n k))
(recur (inc n) k)
(recur n (inc k))))))
with results:
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
(take 5 (recur-gen)) => ([6 8] [35 49] [204 288] [1189 1681] [6930 9800])
You can find all of the details in the Tupelo Library.
This first function probably has a better name from math, but I don't know math very well. I'd use inc (increment) instead of (+ ,,, 1), but that's just personal preference.
(defn addd [x]
(/ (* x (inc x)) 2))
I'll slightly clean up the spacing here and use the dec (decrement) function.
(defn sum-to-n [n]
(addd (dec n)))
(defn sum-n-to-k [n k]
(- (addd k) (addd n)))
In some languages predicates, functions that return booleans,
have names like is-odd or is-whatever. In clojure they're usually
called odd? or whatever?.
The question-mark is not syntax, it's just part of the name.
(defn matching-sums? [n k]
(= (addd (dec n)) (sum-n-to-k n k)))
The loop special form is kind of like an anonymous function
for recur to jump back to. If there's no loop form, recur jumps back
to the enclosing function.
Also, dunno what to call this so I'll just call it f.
(defn f [n k]
(cond
(matching-sums? n k) [n k]
(> (sum-n-to-k n k) (sum-to-n n)) (recur (inc n) k)
:else (recur n (inc k))))
(comment
(f 1 2) ;=> [6 8]
(f 7 9) ;=> [35 49]
)
Now, for your actual question. How to make a lazy sequence. You can use the lazy-seq macro, like in minhtuannguyen's answer, but there's an easier, higher level way. Use the iterate function. iterate takes a function and a value and returns an infinite sequence of the value followed by calling the function with the value, followed by calling the function on that value etc.
(defn make-seq [init]
(iterate (fn [n-and-k]
(let [n (first n-and-k)
k (second n-and-k)]
(f (inc n) (inc k))))
init))
(comment
(take 4 (make-seq [1 2])) ;=> ([1 2] [6 8] [35 49] [204 288])
)
That can be simplified a bit by using destructuring in the argument-vector of the anonymous function.
(defn make-seq [init]
(iterate (fn [[n k]]
(f (inc n) (inc k)))
init))
Edit:
About the repeated calculations in f.
By saving the result of the calculations using a let, you can avoid calculating addd multiple times for each number.
(defn f [n k]
(let [to-n (sum-to-n n)
n-to-k (sum-n-to-k n k)]
(cond
(= to-n n-to-k) [n k]
(> n-to-k to-n) (recur (inc n) k)
:else (recur n (inc k)))))
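The recurrence from the question's edit can also be fed to iterate. The error there comes from defn f having no parameter vector, so the let form is read where the parameters are expected. A sketch, assuming calc-n and calc-k as defined in the question and seeding with the known solution [6 8] (the seed is an assumption; the question's let used 4 and 6, which is not itself a solution):

```clojure
(defn calc-n [n k] (inc (+ (* 2 k) (* 3 n))))
(defn calc-k [n k] (inc (+ (* 3 k) (* 4 n))))

(def solutions
  ;; each step maps one [n k] pair to the next one
  (iterate (fn [[n k]] [(calc-n n k) (calc-k n k)])
           [6 8]))

(take 4 solutions)
;=> ([6 8] [35 49] [204 288] [1189 1681])
```

This skips the search entirely, so it is only as trustworthy as the recurrence itself; the first few pairs agree with the results shown in the other answers.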

Save repeated computation in conditional expression in Clojure?

In the following code, I need to compute the maximum sum of one element from keyboards and one from drives, subject to the constraint that the sum is less than or equal to s.
(def s 10)
(def keyboards '(3 1))
(def drives '(5 2 8))
(let [k (sort (fn [x y] (> x y)) keyboards) ; sort into decreasing
d (sort (fn [x y] (> x y)) drives) ; sort into decreasing
]
(loop [k1 (first k) ks (rest k) d1 (first d) ds (rest d)]
(cond
(or (nil? k1) (nil? d1)) -1 ; when one of the list is empty
(< (+ k1 d1) s) (+ k1 d1) ; whether (+ k1 d1) can be saved to compute once?
(and (empty? ks) (empty? ds)) -1
(empty? ks) (if (< (+ k1 (first ds)) s) (+ k1 (first ds)) -1) ; whether (+ k1 (first ds)) can be saved once?
(empty? ds) (if (< (+ d1 (first ks)) s) (+ d1 (first ks)) -1) ; whether (+ d1 (first ks)) can be saved once?
:else (let [bs (take-while #(< % s) [ (+ k1 (first ds)) (+ (first ks) d1) ])]
(if (empty? bs) (recur (first ks) (rest ks) (first ds) (rest ds))
(apply max bs))))))
As indicated in the comments, I wonder if there is any way to further optimize the repeated add operation in the conditional expressions.
It may not be optimal to use let bindings to compute them all before checking the conditions, since only one of the conditions will be true, so the computations for the other conditions would be wasted.
I wonder whether the Clojure compiler is smart enough to optimize the repeated computation for me, or whether there is a clever expression that performs the operation only once for both the check and the return value.
Any suggestion to make the code more idiomatic would be appreciated.
This sounds kind of like the knapsack problem. There are more computationally efficient ways to compute it, but if you are dealing with two or three small lists which are less than a few hundred, and if it is not a critical piece of code that is running in a hot loop, consider the much simpler:
(let [upper-limit 10
keyboards [3 1]
drives [5 2 8]]
(apply max
(for [k keyboards
d drives
:let [sum (+ k d)]
:when (<= sum upper-limit)]
sum)))
You perform your (potentially expensive) computation only once (in the :let binding), which is what you were really asking for. This is O(n^2), but if it meets the criteria above, it is a solution that the reader can understand easily; thus, it is maintainable. If it's critical that it be as efficient as possible, consider more algorithmically efficient solutions.
Edited by Yu Shen:
There is a slight problem when there is no eligible sum: apply max then fails on the empty sequence. It may be improved as follows:
(let [upper-limit 10
keyboards [3 1]
drives [5 2 8]
eligbles (for [k keyboards
d drives
:let [sum (+ k d)]
:when (<= sum upper-limit)]
sum)]
(if (empty? eligbles)
nil
(apply max eligbles)))
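If the quadratic scan ever becomes a bottleneck, one possible direction is to keep the drives in a sorted set and look up, for each keyboard, the largest drive that still fits. This is only a sketch: best-sum is a hypothetical name, and it treats the limit as inclusive, like the for version above. rsubseq is the core function for reverse range queries on sorted collections.

```clojure
(defn best-sum
  "Largest keyboard + drive sum that is <= limit, or nil if none fits."
  [limit keyboards drives]
  (let [sorted-drives (apply sorted-set drives)
        sums (for [k keyboards
                   ;; largest drive price that still fits next to this keyboard
                   :let [d (first (rsubseq sorted-drives <= (- limit k)))]
                   :when d]
               (+ k d))]
    (when (seq sums)
      (apply max sums))))

(best-sum 10 [3 1] [5 2 8]) ;=> 9
(best-sum 2 [3 1] [5 2 8])  ;=> nil
```

Each keyboard costs one logarithmic lookup instead of a pass over every drive; duplicate drive prices collapse in the set, which does not change the maximum.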
If you want to keep the structure of your current code, you can use Mark Engelberg's better-cond library:
(require '[better-cond.core :as b])
(def s 10)
(def keyboards '(3 1))
(def drives '(5 2 8))
(let [k (sort (fn [x y] (> x y)) keyboards) ; sort into decreasing
d (sort (fn [x y] (> x y)) drives)] ; sort into decreasing
(loop [k1 (first k) ks (rest k) d1 (first d) ds (rest d)]
(b/cond
(or (nil? k1) (nil? d1)) -1 ; when one of the list is empty
:let [x (+ k1 d1)]
(< x s) x
(and (empty? ks) (empty? ds)) -1
(empty? ks) (let [y (+ k1 (first ds))] (if (< y s) y -1)) ; sum computed once; ds is known to be non-empty here
(empty? ds) (let [z (+ d1 (first ks))] (if (< z s) z -1)) ; likewise, ks is known to be non-empty here
:else (let [bs (take-while #(< % s) [(+ k1 (first ds)) (+ (first ks) d1)])]
(if (empty? bs) (recur (first ks) (rest ks) (first ds) (rest ds))
(apply max bs))))))

Append to a vector in a function

I have two columns (vectors) of different length and want to create a new vector of rows (if the column has enough elements). I'm trying to create a new vector (see failed attempt below). In Java this would involve the steps: iterate vector, check condition, append to vector, return vector. Do I need recursion here? I'm sure this is not difficult to solve, but it's very different than procedural code.
(defn rowmaker [colA colB]
"create a row of two columns of possibly different length"
(let [mia (map-indexed vector colA)
rows []]
(doseq [[i elA] mia]
;append if col has enough elements
(if (< i (count colA)) (vec (concat rows elA))) ; ! can't append to rows
(if (< i (count colB)) (vec (concat rows (nth colB i)))
;return rows
rows)))
Expected example input/output
(rowMaker ["A1"] ["B1" "B2"])
; => [["A1" "B1"] ["" "B2"]]
(defn rowMaker
  "create a row from two columns"
  [colA colB]
  (let [ca (count colA) cb (count colB)
        c (max ca cb)
        colA (concat colA (repeat (- c ca) ""))
        colB (concat colB (repeat (- c cb) ""))]
    (map vector colA colB)))
(defn rowmaker
[cols]
(->> cols
(map #(concat % (repeat "")))
(apply map vector)
(take (->> cols
(map count)
(apply max)))))
I prefer recursion to counting the number of items in collections. Here is my solution.
(defn row-maker
[col-a col-b]
(loop [acc []
as (seq col-a)
bs (seq col-b)]
(if (or as bs)
(recur (conj acc [(or (first as) "") (or (first bs) "")])
(next as)
(next bs))
acc)))
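The "append to a vector" step from the question can also be written with reduce and conj: conj returns a new vector and reduce threads it through as the accumulator, so nothing is mutated. A minimal sketch, assuming both columns are vectors (so nth with a default value is cheap); row-maker-reduce is a made-up name.

```clojure
(defn row-maker-reduce
  "Pairs up the elements of two vectors, padding the shorter one with \"\"."
  [col-a col-b]
  (let [n (max (count col-a) (count col-b))]
    (reduce (fn [rows i]
              ;; nth with a third argument returns the default past the end
              (conj rows [(nth col-a i "") (nth col-b i "")]))
            []
            (range n))))

(row-maker-reduce ["A1"] ["B1" "B2"])
;=> [["A1" "B1"] ["" "B2"]]
(row-maker-reduce ["B1" "B2"] ["A1"])
;=> [["B1" "A1"] ["B2" ""]]
```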
The following does the trick with the given example:
(defn rowMaker [v1 v2]
(mapv vector (concat v1 (repeat "")) v2))
(rowMaker ["A1"] ["B1" "B2"])
;[["A1" "B1"] ["" "B2"]]
However, it doesn't work the other way round:
(rowMaker ["B1" "B2"] ["A1"])
;[["B1" "A1"]]
To make it work both ways, we are going to have to write a version of mapv that fills in for sterile sequences so long as any sequence is fertile. Here is a corresponding lazy version for map, which will work for infinite sequences too:
(defn map-filler [filler f & colls]
(let [filler (vec filler)
colls (vec colls)
live-coll-map (->> colls
(map-indexed vector)
(filter (comp seq second))
(into {}))
split (fn [lcm] (reduce
(fn [[x xm] [i coll]]
(let [[c & cs] coll]
[(assoc x i c) (if cs (assoc xm i cs) xm)]))
[filler {}]
lcm))]
((fn expostulate [lcm]
(lazy-seq
(when (seq lcm)
(let [[this thoses] (split lcm)]
(cons (apply f this) (expostulate thoses))))))
live-coll-map)))
The idea is that you supply a filler sequence with one entry for each of the collections that follow. So we can now define your required rowmaker function thus:
(defn rowmaker [& colls]
(apply map-filler (repeat (count colls) "") vector colls))
This will take any number of collections, and will fill in blank strings for exhausted collections.
(rowmaker ["A1"] ["B1" "B2"])
;(["A1" "B1"] ["" "B2"])
(rowmaker ["B1" "B2"] ["A1"])
;(["B1" "A1"] ["B2" ""])
It works!
(defn make-row
[cola colb r]
(let [pad ""]
(cond
(and (not (empty? cola))
(not (empty? colb))) (recur (rest cola)
(rest colb)
(conj r [(first cola) (first colb)]))
(and (not (empty? cola))
(empty? colb)) (recur (rest cola)
(rest colb)
(conj r [(first cola) pad]))
(and (empty? cola)
(not (empty? colb))) (recur (rest cola)
(rest colb)
(conj r [pad (first colb)]))
:else r)))

How to properly indent clojure/lisp?

I want to indent the following piece of code.
How would a lisper indent this?
I am especially confused about where to put newlines.
(defn primes [n]
(letfn [(sieve [table removal]
(assoc table removal false))
(primebools [i table]
(cond
(= i n) table
(table i) (recur (inc i)
(reduce sieve
table
(range (* i i) n i)))
:else (recur (inc i)
table)))]
(let [prime? (primebools 2 (apply vector (repeat n true)))]
(filter prime? (range 2 n)))))
(defn primes [n]
  (letfn [(sieve [table removal]
            (assoc table removal false))
          (primebools [i table]
            (cond
              (= i n) table
              (table i) (recur (inc i)
                               (reduce sieve table
                                       (range (* i i) n i)))
              :else (recur (inc i) table)))]
    (let [prime? (primebools 2 (apply vector (repeat n true)))]
      (filter prime? (range 2 n)))))
Is how I would do it.
In addition to @dnolen's answer, I usually:
- put a new line when there's a new function (like your first two lines)
- put a new line to indent a long or important argument to a function (like the cond block)
- keep each line to less than 80 characters and break up long ideas into smaller chunks
- most importantly, stay consistent!
Then just align and indent lines so that the indentation is the same for code at the same depth.

What is causing this NullPointerException?

I'm using Project Euler questions to help me learn clojure, and I've run into an exception I can't figure out. nillify and change-all are defined at the bottom for reference.
(loop [the-vector (vec (range 100))
queue (list 2 3 5 7)]
(if queue
(recur (nillify the-vector (first queue)) (next queue))
the-vector))
This throws a NullPointerException, and I can't figure out why. The only part of the code I can see that could throw such an exception is the call to nillify, but it doesn't seem like queue ever gets down to just one element before the exception is thrown---and even if queue were to become empty, that's what the if statement is for.
Any ideas?
;; Given a vector, a value, and a list of indices, return a vector with every element at those indices set to the value.
(defn change-all [the-vector indices val]
  (apply assoc the-vector (interleave indices (repeat (count indices) val))))
;; Given a vector and a val, return a vector in which all entries whose indices are multiples of val (from 2 * val up) are set to nil; the original is left untouched.
(defn nillify [coll val]
  (change-all coll (range (* 2 val) (inc (last coll)) val) nil))
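The problem sexpr is
(inc (last coll))
nillify changes the contents of the vector, so you can't use its last element to determine the length anymore: once the last element has been set to nil (99 is a multiple of 3), (inc (last coll)) calls inc on nil and throws the NullPointerException. Use the length instead: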
(count coll)
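A small REPL sketch of the failure, using a made-up ten-element vector whose last slot has already been set to nil:

```clojure
(def v (assoc (vec (range 10)) 9 nil)) ; last element already nilled

(last v)   ; nil, not a number any more
(count v)  ; 10, unaffected by the contents

(def outcome
  (try
    (inc (last v))                       ; inc on nil
    (catch NullPointerException _ :npe)))

outcome ;=> :npe
```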
As a matter of style, use let bindings:
(defn change-all [the-vector indices val]
(let [c (count indices)
s (interleave indices (repeat c val))]
(apply assoc the-vector s)))
(defn nillify [coll val]
(let [c (count coll)
r (range (* 2 val) c val)]
(change-all coll r nil)))
(loop [the-vector (vec (range 100))
       [f & r] '(2 3 5 7)]
  (if f ; test the head, not the tail, so the last element (7) is processed too
    (recur (nillify the-vector f) r)
    the-vector))
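Since the loop only threads the vector through nillify for each number in turn, it could arguably be written as a reduce. A sketch, not benchmarked; it restates change-all with reduce and assoc, which is my variation and also copes with an empty index list:

```clojure
(defn change-all [the-vector indices x]
  ;; set every listed index to x, one assoc at a time
  (reduce (fn [v i] (assoc v i x)) the-vector indices))

(defn nillify [coll n]
  ;; nil out the proper multiples of n, using the length rather than the contents
  (change-all coll (range (* 2 n) (count coll) n) nil))

(def sieved
  (reduce nillify (vec (range 100)) [2 3 5 7]))

(remove nil? sieved) ; 0, 1 and the primes below 100 remain
```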