Efficient side-effect-only analogue of Clojure's map function

Efficient side-effect-only analogue of Clojure's map function - clojure

What if map and doseq had a baby? I'm trying to write a function or macro like Common Lisp's mapc, but in Clojure. This does essentially what map does, but only for side-effects, so it doesn't need to generate a sequence of results, and wouldn't be lazy. I know that one can iterate over a single sequence using doseq, but map can iterate over multiple sequences, applying a function to each element in turn of all of the sequences. I also know that one can wrap map in dorun. (Note: This question has been extensively edited after many comments and a very thorough answer. The original question focused on macros, but those macro issues turned out to be peripheral.)
This is fast (according to criterium):
(defn domap2
[f coll]
(dotimes [i (count coll)]
(f (nth coll i))))
but it only accepts one collection. This accepts arbitrary collections:
(defn domap3
[f & colls]
(dotimes [i (apply min (map count colls))]
(apply f (map #(nth % i) colls))))
but it's very slow by comparison. I could also write a version like the first, but with different parameter cases [f c1 c2], [f c1 c2 c3], etc., but in the end, I'll need a case that handles arbitrary numbers of collections, like the last example, which is simpler anyway. I've tried many other solutions as well.
Since the second example is very much like the first except for the use of apply and the map inside the loop, I suspect that getting rid of them would speed things up a lot. I have tried to do this by writing domap2 as a macro, but the way that the catch-all variable after & is handled keeps tripping me up, as illustrated above.
Other examples (out of 15 or 20 different versions), benchmark code, and times on a Macbook Pro that's a few years old (full source here):
(defn domap1
[f coll]
(doseq [e coll]
(f e)))
(defn domap7
[f coll]
(dorun (map f coll)))
(defn domap18
[f & colls]
(dorun (apply map f colls)))
(defn domap15
[f coll]
(when (seq coll)
(f (first coll))
(recur f (rest coll))))
(defn domap17
[f & colls]
(let [argvecs (apply (partial map vector) colls)] ; seq of ntuples of interleaved vals
(doseq [args argvecs]
(apply f args))))
I'm working on an application that uses core.matrix matrices and vectors, but feel free to substitute your own side-effecting functions below.
(ns tst
(:use criterium.core
[clojure.core.matrix :as mx]))
(def howmany 1000)
(def a-coll (vec (range howmany)))
(def maskvec (zero-vector :vectorz howmany))
(defn unmaskit!
[idx]
(mx/mset! maskvec idx 1.0)) ; sets element idx of maskvec to 1.0
(defn runbench
[domapfn label]
(print (str "\n" label ":\n"))
(bench (def _ (domapfn unmaskit! a-coll))))
Mean execution times according to Criterium, in microseconds:
domap1: 12.317551 [doseq]
domap2: 19.065317 [dotimes]
domap3: 265.983779 [dotimes with apply, map]
domap7: 53.263230 [map with dorun]
domap18: 54.456801 [map with dorun, multiple collections]
domap15: 32.034993 [recur]
domap17: 95.259984 [doseq, multiple collections interleaved using map]
EDIT: It may be that dorun+map is the best way to implement domap for multiple large lazy sequence arguments, but doseq is still king when it comes to single lazy sequences. Performing the same operation as unmask! above, but running the index through (mod idx 1000), and iterating over (range 100000000), doseq is about twice as fast as dorun+map in my tests (i.e. (def domap25 (comp dorun map))).

You don't need a macro, and I don't see why a macro would be helpful here.
user> (defn do-map [f & lists] (apply mapv f lists) nil)
#'user/do-map
user> (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
note do-map here is eager (thanks to mapv) and only executes for side effects
Macros can use varargs lists, as the (useless!) macro version of do-map demonstrates:
user> (defmacro do-map-macro [f & lists] `(do (mapv ~f ~#lists) nil))
#'user/do-map-macro
user> (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
user> (macroexpand-1 '(do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40)))
(do (clojure.core/mapv (comp println +) (range 2 6) (range 8 11) (range 22 40)) nil)
Addendum:
addressing the efficiency / garbage-creation concerns:
note that below I truncate the output of the criterium bench function, for conciseness reasons:
(defn do-map-loop
[f & lists]
(loop [heads lists]
(when (every? seq heads)
(apply f (map first heads))
(recur (map rest heads)))))
user> (crit/bench (with-out-str (do-map-loop (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 11.367804 µs
...
This looks promising because it doesn't create a data structure that we aren't using anyway (unlike mapv above). But it turns out it is slower than the previous (maybe because of the two map calls?).
user> (crit/bench (with-out-str (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 7.427182 µs
...
user> (crit/bench (with-out-str (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 8.355587 µs
...
Since the loop still wasn't faster, let's try a version which specializes on arity, so that we don't need to call map twice on every iteration:
(defn do-map-loop-3
[f a b c]
(loop [[a & as] a
[b & bs] b
[c & cs] c]
(when (and a b c)
(f a b c)
(recur as bs cs))))
Remarkably, though this is faster, it is still slower than the version that just used mapv:
user> (crit/bench (with-out-str (do-map-loop-3 (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 9.450108 µs
...
Next I wondered if the size of the input was a factor. With larger inputs...
user> (def test-input (repeatedly 3 #(range (rand-int 100) (rand-int 1000))))
#'user/test-input
user> (map count test-input)
(475 531 511)
user> (crit/bench (with-out-str (apply do-map-loop-3 (comp println +) test-input)))
...
Execution time mean : 1.005073 ms
...
user> (crit/bench (with-out-str (apply do-map (comp println +) test-input)))
...
Execution time mean : 756.955238 µs
...
Finally, for completeness, the timing of do-map-loop (which as expected is slightly slower than do-map-loop-3)
user> (crit/bench (with-out-str (apply do-map-loop (comp println +) test-input)))
...
Execution time mean : 1.553932 ms
As we see, even with larger input sizes, mapv is faster.
(I should note for completeness here that map is slightly faster than mapv, but not by a large degree).

Related

Return an else value when using recur

I am new to Clojure, and doing my best to forget all my previous experience with more procedural languages (java, ruby, swift) and embrace Clojure for what it is. I am actually really enjoying the way it makes me think differently -- however, I have come up against a pattern that I just can't seem to figure out. The easiest way to illustrate, is with some code:
(defn char-to-int [c] (Integer/valueOf (str c)))
(defn digits-dont-decrease? [str]
(let [digits (map char-to-int (seq str)) i 0]
(when (< i 5)
(if (> (nth digits i) (nth digits (+ i 1)))
false
(recur (inc i))))))
(def result (digits-dont-decrease? "112233"))
(if (= true result)
(println "fit rules")
(println "doesn't fit rules"))
The input is a 6 digit number as a string, and I am simply attempting to make sure that each digit from left to right is >= the previous digit. I want to return false if it doesn't, and true if it does. The false situation works great -- however, given that recur needs to be the last thing in the function (as far as I can tell), how do I return true. As it is, when the condition is satisfied, I get an illegal argument exception:
Execution error (IllegalArgumentException) at clojure.exercise.two/digits-dont-decrease? (four:20).
Don't know how to create ISeq from: java.lang.Long
How should I be thinking about this? I assume my past training is getting in my mental way.

This is not answering your question, but also shows an alternative. While the (apply < ...) approach over the whole string is very elegant for small strings (it is eager), you can use every? for an short-circuiting approach. E.g.:
user=> (defn nr-seq [s] (map #(Integer/parseInt (str %)) s))
#'user/nr-seq
user=> (every? (partial apply <=) (partition 2 1 (nr-seq "123")))
true

You need nothing but
(apply <= "112233")
Reason: string is a sequence of character and comparison operator works on character.
(->> "0123456789" (mapcat #(repeat 1000 %)) (apply str) (def loooong))
(count loooong)
10000
(time (apply <= loooong))
"Elapsed time: 21.006625 msecs"
true
(->> "9123456789" (mapcat #(repeat 1000 %)) (apply str) (def bad-loooong))
(count bad-loooong)
10000
(time (apply <= bad-loooong))
"Elapsed time: 2.581750 msecs"
false
(above runs on my iPhone)

In this case, you don't really need loop/recur. Just use the built-in nature of <= like so:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(def true-samples
["123"
"112233"
"13"])
(def false-samples
["10"
"12324"])
(defn char->int
[char-or-str]
(let [str-val (str char-or-str)] ; coerce any chars to len-1 strings
(assert (= 1 (count str-val)))
(Integer/parseInt str-val)))
(dotest
(is= 5 (char->int "5"))
(is= 5 (char->int \5))
(is= [1 2 3] (mapv char->int "123"))
; this shows what we are going for
(is (<= 1 1 2 2 3 3))
(isnt (<= 1 1 2 1 3 3))
and now test the char sequences:
;-----------------------------------------------------------------------------
; using built-in `<=` function
(doseq [true-samp true-samples]
(let [digit-vals (mapv char->int true-samp)]
(is (apply <= digit-vals))))
(doseq [false-samp false-samples]
(let [digit-vals (mapv char->int false-samp)]
(isnt (apply <= digit-vals))))
if you want to write your own, you can like so:
(defn increasing-equal-seq?
"Returns true iff sequence is non-decreasing"
[coll]
(when (< (count coll) 2)
(throw (ex-info "coll must have at least 2 vals" {:coll coll})))
(loop [prev (first coll)
remaining (rest coll)]
(if (empty? remaining)
true
(let [curr (first remaining)
prev-next curr
remaining-next (rest remaining)]
(if (<= prev curr)
(recur prev-next remaining-next)
false)))))
;-----------------------------------------------------------------------------
; using home-grown loop/recur
(doseq [true-samp true-samples]
(let [digit-vals (mapv char->int true-samp)]
(is (increasing-equal-seq? digit-vals))))
(doseq [false-samp false-samples]
(let [digit-vals (mapv char->int false-samp)]
(isnt (increasing-equal-seq? digit-vals))))
)
with result
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
Testing tst.demo.core
Ran 2 tests containing 15 assertions.
0 failures, 0 errors.
Passed all tests
Finished at 23:36:17.096 (run time: 0.028s)

You an use loop with recur.
Assuming you require following input v/s output -
"543221" => false
"54321" => false
"12345" => true
"123345" => true
Following function can help
;; Assuming char-to-int is defined by you before as per the question
(defn digits-dont-decrease?
[strng]
(let [digits (map char-to-int (seq strng))]
(loop [;;the bindings in loop act as initial state
decreases true
i (- (count digits) 2)]
(let [decreases (and decreases (>= (nth digits (+ i 1)) (nth digits i)))]
(if (or (< i 1) (not decreases))
decreases
(recur decreases (dec i)))))))
This should work for numeric string of any length.
Hope this helps. Please let me know if you were looking for something else :).

(defn non-decreasing? [str]
(every?
identity
(map
(fn [a b]
(<= (int a) (int b)))
(seq str)
(rest str))))
(defn non-decreasing-loop? [str]
(loop [a (seq str) b (rest str)]
(if-not (seq b)
true
(if (<= (int (first a)) (int (first b)))
(recur (rest a) (rest b))
false))))
(non-decreasing? "112334589")
(non-decreasing? "112324589")
(non-decreasing-loop? "112334589")
(non-decreasing-loop? "112324589")

How to replace the last element in a vector in Clojure

As a newbie to Clojure I often have difficulties to express the simplest things. For example, for replacing the last element in a vector, which would be
v[-1]=new_value
in python, I end up with the following variants in Clojure:
(assoc v (dec (count v)) new_value)
which is pretty long and inexpressive to say the least, or
(conj (vec (butlast v)) new_value)
which even worse, as it has O(n) running time.
That leaves me feeling silly, like a caveman trying to repair a Swiss watch with a club.
What is the right Clojure way to replace the last element in a vector?
To support my O(n)-claim for butlast-version (Clojure 1.8):
(def v (vec (range 1e6)))
#'user/v
user=> (time (first (conj (vec (butlast v)) 55)))
"Elapsed time: 232.686159 msecs"
0
(def v (vec (range 1e7)))
#'user/v
user=> (time (first (conj (vec (butlast v)) 55)))
"Elapsed time: 2423.828127 msecs"
0
So basically for 10 time the number of elements it is 10 times slower.

I'd use
(defn set-top [coll x]
(conj (pop coll) x))
For example,
(set-top [1 2 3] :a)
=> [1 2 :a]
But it also works on the front of lists:
(set-top '(1 2 3) :a)
=> (:a 2 3)
The Clojure stack functions - peek, pop, and conj - work on the natural open end of a sequential collection.
But there is no one right way.
How do the various solutions react to an empty vector?
Your Python v[-1]=new_value throws an exception, as does your (assoc v (dec (count v)) new_value) and my (defn set-top [coll x] (conj (pop coll) x)).
Your (conj (vec (butlast v)) new_value) returns [new_value]. The butlast has no effect.

If you insist on being "pure", your 2nd or 3rd solutions will work. I prefer to be simpler & more explicit using the helper functions from the Tupelo library:
(s/defn replace-at :- ts/List
"Replaces an element in a collection at the specified index."
[coll :- ts/List
index :- s/Int
elem :- s/Any]
...)
(is (= [9 1 2] (replace-at (range 3) 0 9)))
(is (= [0 9 2] (replace-at (range 3) 1 9)))
(is (= [0 1 9] (replace-at (range 3) 2 9)))
As with drop-at, replace-at will throw an exception for invalid values of index.
Similar helper functions exist for
insert-at
drop-at
prepend
append
Note that all of the above work equally well for either a Clojure list (eager or lazy) or a Clojure vector. The conj solution will fail unless you are careful to always coerce the input to a vector first as in your example.

Speeding up Clojure to avoid timeouts

I've doing a few of the hackerrank challenges and noticing that I seem to not be able to code efficient code, as quite often I get timeouts, even though the answers that do pass the tests are correct. For example for this challenge this is my code:
(let [divisors (fn [n] (into #{n} (into #{1} (filter (comp zero? (partial rem n)) (range 1 n)))))
str->ints (fn [string]
(map #(Integer/parseInt %)
(clojure.string/split string #" ")))
;lines (line-seq (java.io.BufferedReader. *in*))
lines ["3"
"10 4"
"1 100"
"288 240"
]
pairs (map str->ints (rest lines))
first-divs (map divisors (map first pairs))
second-divs (map divisors (map second pairs))
intersections (map clojure.set/intersection first-divs second-divs)
counts (map count intersections)
]
(doseq [v counts]
(println (str v))))
Note that clojure/set doesn't exist at hackerrank. I just put in here for the sake of brevity.

in this exact case there is an obvious misuse of map function:
although the clojure collections are lazy, operations on them still don't come for free. So when you chain lots of maps, you still have all the intermediate collections (there are 7 here). To avoid this, one would usually use transducers, but in your case you are just mapping every input line to one output line, so it is really enough to do it in one pass over the input collection:
(let [divisors (fn [n] (into #{n} (into #{1} (filter (comp zero? (partial rem n)) (range 1 n)))))
str->ints (fn [string]
(map #(Integer/parseInt %)
(clojure.string/split string #" ")))
;lines (line-seq (java.io.BufferedReader. *in*))
get-counts (fn [pair] (let [d1 (divisors (first pair))
d2 (divisors (second pair))]
(count (clojure.set/intersection d1 d2))))
lines ["3"
"10 4"
"1 100"
"288 240"
]
counts (map (comp get-counts str->ints) (rest lines))]
(doseq [v counts]
(println (str v))))
Not talking about the correctness of the whole algorithm here. Maybe it could also be optimized. But as of clojure's mechanics, this change should speed up your code quite notably.
update
as for the algorithm, you would probably want to start with limiting the range from 1..n to 1..(sqrt n), adding both x and n/x into resulting set when x is a divisor of n, that should give you quite a big profit for large numbers:
(defn divisors [n]
(into #{} (mapcat #(when (zero? (rem n %)) [% (/ n %)])
(range 1 (inc (Math/floor (Math/sqrt n)))))))
also i would consider finding all the divisors of the least of two numbers, and then keeping the ones the other number is divisible by. This will eliminate the search of the greater number's divisors.
(defn common-divisors [pair]
(let [[a b] (sort pair)
divs (divisors a)]
(filter #(zero? (rem b %)) divs)))
if that still doesn't manage to pass the test, you should probably look for some nice algorithm for common divisors.
update 2
submitted the updated algorithm to hackerrank and it passes well now

Is is possible to destructure a clojure vector into last two items and the rest?

I know I can destructure a vector "from the front" like this:
(fn [[a b & rest]] (+ a b))
Is there any (short) way to access the last two elements instead?
(fn [[rest & a b]] (+ a b)) ;;Not legal
My current alternative is to
(fn [my-vector] (let [[a b] (take-last 2 my-vector)] (+ a b)))
and it was trying to figure out if there is way to do that in a more convenient way directly in the function arguments.

You can peel off the last two elements and add them thus:
((fn [v] (let [[b a] (rseq v)] (+ a b))) [1 2 3 4])
; 7
rseq supplies a reverse sequence for a vector in quick time.
We just destructure its first two elements.
We needn't mention the rest of it, which we don't do anything with.

user=> (def v (vec (range 0 10000000)))
#'user/v
user=> (time ((fn [my-vector] (let [[a b] (take-last 2 my-vector)] (+ a b))) v))
"Elapsed time: 482.965121 msecs"
19999997
user=> (time ((fn [my-vector] (let [a (peek my-vector) b (peek (pop my-vector))] (+ a b))) v))
"Elapsed time: 0.175539 msecs"
19999997
My advice would be to throw convenience to the wind and use peek and pop to work with the end of a vector. When your input vector is very large, you'll see tremendous performance gains.
(Also, to answer the question in the title: no.)

Clojure: map function with updatable state

What is the best way of implementing map function together with an updatable state between applications of function to each element of sequence? To illustrate the issue let's suppose that we have a following problem:
I have a vector of the numbers. I want a new sequence where each element is multiplied by 2 and then added number of 10's in the sequence up to and including the current element. For example from:
[20 30 40 10 20 10 30]
I want to generate:
[40 60 80 21 41 22 62]
Without adding the count of 10 the solution can be formulated using a high level of abstraction:
(map #(* 2 %) [20 30 40 10 20 10 30])
Having count to update forced me to "go to basic" and the solution I came up with is:
(defn my-update-state [x state]
(if (= x 10) (+ state 1) state)
)
(defn my-combine-with-state [x state]
(+ x state))
(defn map-and-update-state [vec fun state update-state combine-with-state]
(when-not (empty? vec)
(let [[f & other] vec
g (fun f)
new-state (update-state f state)]
(cons (combine-with-state g new-state) (map-and-update-state other fun new-state update-state combine-with-state))
)))
(map-and-update-state [20 30 40 50 10 20 10 30 ] #(* 2 %) 0 my-update-state my-combine-with-state )
My question: is it the appropriate/canonical way to solve the problem or I overlooked some important concepts/functions.
PS:
The original problem is walking AST (abstract syntax tree) and generating new AST together with updating symbol table, so when proposing the solution to the problem above please keep it in mind.
I do not worry about blowing up stack, so replacement with loop+recur is not
my concern here.
Is using global Vars or Refs instead of passing state as an argument a definite no-no?

You can use reduce to accumulate a pair of the number of 10s seen so far and the current vector of results.:
(defn map-update [v]
(letfn [(update [[ntens v] i]
(let [ntens (+ ntens (if (= 10 i) 1 0))]
[ntens (conj v (+ ntens (* 2 i)))]))]
(second (reduce update [0 []] v))))

To count # of 10 you can do
(defn count-10[col]
(reductions + (map #(if (= % 10) 1 0) col)))
Example:
user=> (count-10 [1 2 10 20 30 10 1])
(0 0 1 1 1 2 2)
And then a simple map for the final result
(map + col col (count-10 col)))

Reduce and reductions are good ways to traverse a sequence keeping a state. If you feel your code is not clear you can always use recursion with loop/recur or lazy-seq like this
(defn twice-plus-ntens
([coll] (twice-plus-ntens coll 0))
([coll ntens]
(lazy-seq
(when-let [s (seq coll)]
(let [item (first s)
new-ntens (if (= 10 item) (inc ntens) ntens)]
(cons (+ (* 2 item) new-ntens)
(twice-plus-ntens (rest s) new-ntens)))))))
have a look at map source code evaluating this at your repl
(source map)
I've skipped chunked optimization and multiple collection support.
You can make it a higher-order function this way
(defn map-update
([mf uf coll] (map-update mf uf (uf) coll))
([mf uf val coll]
(lazy-seq
(when-let [s (seq coll)]
(let [item (first s)
new-status (uf item val)]
(cons (mf item new-status)
(map-update mf uf new-status (rest s))))))))
(defn update-f
([] 0)
([item status]
(if (= item 10) (inc status) status)))
(defn map-f [item status]
(+ (* 2 item) status))
(map-update map-f update-f in)

The most appropriate way is to use function with state
(into
[]
(map
(let [mem (atom 0)]
(fn [val]
(when (== val 10) (swap! mem inc))
(+ #mem (* val 2)))))
[20 30 40 10 20 10 30])
also see
memoize
standard function

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Efficient side-effect-only analogue of Clojure's map function - clojure

Related

Return an else value when using recur

How to replace the last element in a vector in Clojure

Speeding up Clojure to avoid timeouts

Is is possible to destructure a clojure vector into last two items and the rest?

Clojure: map function with updatable state

Categories

Resources