This setup is straight out of the docs here:
https://clojuredocs.org/clojure.core/commute
I'll just copy the code as is, with my comments:
(def counter (ref 0))

(defn alter-inc! [counter]
  (dosync (Thread/sleep 100) (alter counter inc)))

(defn commute-inc! [counter]
  (dosync (Thread/sleep 100) (commute counter inc)))

(defn bombard-counter! [n f counter]
  (apply pcalls (repeat n #(f counter))))

(dosync (ref-set counter 0))
Running with alter produces a randomly ordered list and takes about 2000 ms, as in the example:
> (time (doall (bombard-counter! 20 alter-inc! counter)))
"Elapsed time: 2078.859995 msecs"
(7 6 1 5 4 2 3 9 12 10 8 14 11 13 15 18 16 17 20 19)
But running with commute does something very different from what the official doc claims - I get duplicates:
> (time (doall (bombard-counter! 20 commute-inc! counter)))
"Elapsed time: 309.615195 msecs"
(5 1 1 6 5 4 1 8 8 10 10 12 14 13 15 16 17 18 19 20)
And that's definitely not the result promised in the docs! The difference in running time is as advertised, but what about the duplicates? I'm prone to typos, so I've redone it from scratch: same problem.
"commute returns the new value of the ref. However, the last in-transaction value you see from a commute will not always match the end-of-transaction value of a ref, because of reordering. If another transaction sneaks in and alters a ref that you are trying to commute, the STM will not restart your transaction. Instead, it will simply run your commute function again, out of order. Your transaction will never even see the ref value that your commute function finally ran against. Since Clojure's STM can reorder commutes behind your back, you can use them only when you do not care about ordering."
Excerpt from: Stuart Halloway, "Programming Clojure."
This is why you see out-of-order (and duplicated) in-transaction values in your output.
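The reordering doesn't corrupt the ref itself: at commit time each commute is re-run against the latest committed value, so every increment is counted exactly once. A quick sketch to confirm this (it just repeats the setup above, with a shorter sleep to keep it fast):

```clojure
(def counter (ref 0))

(defn commute-inc! [counter]
  (dosync (Thread/sleep 10) (commute counter inc)))

(defn bombard-counter! [n f counter]
  (apply pcalls (repeat n #(f counter))))

;; In-transaction return values may repeat, but the committed
;; value still counts every increment exactly once:
(doall (bombard-counter! 20 commute-inc! counter))
@counter ;; => 20
```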
Is there an errata list for the book 'Quick Clojure' by Mark McDonnell?
I went to the publisher's website and could not find one there: https://www.apress.com/gp/book/9781484229514
Specifically, I think there is an error in the following on page 50:
(defn add-n [n, coll]
  (lazy-seq (cons
             (+ n (first coll))
             (add-n n (rest coll)))))
(type (add-n (range)))
;; clojure.lang.LazySeq
(take 10 (add-n (range))) ;; <--- Error here: `add-n` requires 2 arguments ?
;; (5 6 7 8 9 10 11 12 13 14)
Let's see if we can't figure out what was meant. We know from the comment that
(take 10 (add-n (range)))
is meant to return (5 6 7 8 9 10 11 12 13 14). It also appears that the missing argument is the n, which at first guess should be a number, and so the invocation of add-n should look something like
(add-n _ (range))
So what value could we use to replace the _ to make it return the expected value? The obvious answer is 5. And so we test it by evaluating
(take 10 (add-n 5 (range)))
which returns
(5 6 7 8 9 10 11 12 13 14)
So there you have it. Now you can go to the Apress errata page and submit this as a correction. (I can't because I don't own the book, don't know what page it's on, etc).
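For reference, here are the corrected calls against the book's definition, in one runnable piece:

```clojure
(defn add-n [n coll]
  (lazy-seq (cons (+ n (first coll))
                  (add-n n (rest coll)))))

(type (add-n 5 (range)))
;; => clojure.lang.LazySeq

(take 10 (add-n 5 (range)))
;; => (5 6 7 8 9 10 11 12 13 14)
```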
I'm trying to use concurrency for my maps using pmap in Clojure, and I need to do some analysis based on the efficiency of the program under different thread counts.
Is the number of threads defined in Clojure within the pmap function, or somewhere in the project file? Looking at the pmap documentation there are no additional parameters compared to the map function.
For example, I need to run the program under 2, 32, 64 etc... threads.
Your question seems closely related to this one:
How many threads does Clojure's pmap function spawn for URL-fetching operations?
From Alex Miller's answer, you can deduce that the number of threads used by pmap is <your number of cores> + 2. I don't know why there is a + 2, but even in the current release of Clojure, 1.10.0, the source code of the pmap function is still the same.
As I have 4 cores on my machine, pmap should use 6 threads.
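To see the number on your own machine, you can ask the JVM directly; the arithmetic below mirrors what pmap's source uses:

```clojure
;; pmap keeps (+ 2 (.availableProcessors ...)) computations in flight
(def pmap-parallelism
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

pmap-parallelism ;; e.g. 6 on a 4-core machine
```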
-- EDIT
To really answer your question: you can define a custom pmap function, custom-pmap, which allows you to specify the number of threads you would like to use:
(defn custom-pmap
  ([f coll nb-thread]
   (let [n nb-thread
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets)))))
(custom-pmap inc (range 1000) 8)
;; => (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ....999 1000)
You can use claypoole's pmap that takes a certain sized threadpool as a first argument.
;; project.clj
[com.climate/claypoole "1.1.4"]

;; or deps.edn
com.climate/claypoole {:mvn/version "1.1.4"}
Now let's specify some pool sizes and map an operation that takes one second over a collection of size 64.
(ns demo
  (:refer-clojure :exclude [pmap])
  (:require [com.climate.claypoole :refer [threadpool pmap]]))
(def pool-sizes
  [2 32 64])

(doseq [pool-size pool-sizes]
  (time (doall (pmap (threadpool pool-size) (fn [n] (Thread/sleep 1000)) (range 64)))))
"Elapsed time: 32113.704013 msecs"
"Elapsed time: 2013.242638 msecs"
"Elapsed time: 1011.616369 msecs"
So some overhead, and 32 seconds for a threadpool of size 2, 2 seconds for size 32, and 1 second for size 64.
I am trying to generate a new key that doesn't exist in my map (atom), then immediately add it to my map and return the key. However, the check for the key and the update are not done atomically. I am wondering how to do this atomically so that it is safe for concurrency.
Ideally this key is short enough to type but hard to guess (so a user can create a session, and his/her friends can join with the key). So 0, 1, 2, 3... is not ideal, since a user could simply try session n-1. Something like a UUID, where I don't have to worry about collisions, is also not ideal. I was planning on generating a short random string (e.g. "udibwi"), but I've used (rand-int 25) in the code snippet below to simplify the problem.
I've written a function which randomly generates a key. It checks if the map contains it. If it already does then try a new key. If it doesn't, associate it to my map and then return the key.
This works but I don't think it is safe for multiple threads. Is there a way to do this using atoms or is there a better way?
(defonce sessions (atom {}))
(defn getNewSessionId []
  (let [id (rand-int 25)]
    (if (contains? @sessions id)
      (getNewSessionId)
      (do
        (swap! sessions assoc id "")
        id))))
You're trying to do too much at once. Having that one function generate an ID and update the atom is complicating things. I'd break this down into three functions:
A function that generates an ID based on an existing map
A function that updates a plain, immutable map using the above function
A function that updates an atom (although this will be so simple after implementing the previous two functions that it may not be necessary at all).
Something like:
; Notice how this doesn't deal with atoms at all
(defn generate-new-id [old-map]
  (let [new-id (rand-int 25)]
    (if (old-map new-id) ; or use "contains?"
      (recur old-map) ; Using "recur" so we don't get a StackOverflow
      new-id)))

; Also doesn't know anything about the atom
(defn assoc-new-id [old-map]
  (let [new-id (generate-new-id old-map)]
    (assoc old-map new-id "")))

(defonce data (atom {}))

(defn swap-new-id! []
  (swap! data assoc-new-id))
The main changes:
Everything that could be removed from the atom swapping logic was moved to its own function. This allows you to just pass the function handling all the logic to swap! and it will be handled atomically.
Clojure uses dash-case, not camelCase.
I used recur instead of actual recursion so you won't get a StackOverflow while the ID is being brute-forced.
Of course though, this suffers from problems if the available number of IDs left is small. It may take a long time for it to "find" an available ID via brute-force. You might be better off using a "generator" backed by an atom to produce IDs atomically starting from 0:
(defn new-id-producer []
  (atom -1))

(defn generate-id [producer]
  (swap! producer inc)) ; "swap!" returns the new value that was swapped in

(let [producer (new-id-producer)]
  ; Could be run on multiple threads at once
  (doseq [id (repeatedly 5 #(generate-id producer))]
    (println id)))
0
1
2
3
4
=> nil
I tried to write an example of this operating on multiple threads at once:
(let [producer (new-id-producer)
; Emulate the "consumption" of IDs
consume (fn []
(doseq [id (repeatedly 20 #(generate-id producer))]
(println (.getId (Thread/currentThread)) id)))]
(doto (Thread. consume)
(.start))
(doto (Thread. consume)
(.start)))
37 0
3738 1
38 3
38 4
38 5
38 6
38 7
38 8
38 9
38 10
38 11
38 12
38 13
38 14
38 15
38 16
38 17
38 18
38 19
38 20
38 21
2
37 22
37 23
37 24
37 25
37 26
37 27
37 28
37 29
37 30
37 31
37 32
37 33
37 34
37 35
37 36
37 37
37 38
37 39
But the un-synchronized nature of the printing to the outstream made this output a mess. If you squint a bit though, you can see that the threads (with Thread IDs of 37 and 38) are taking turns.
If you need the new ID returned, the only clean way I know of that doesn't involve locking is to use a second atom to get the returned ID out of the swapping function. This requires getting rid of assoc-new-id:
(defn generate-new-id [old-map]
  (let [new-id (rand-int 25)]
    (if (old-map new-id)
      (recur old-map)
      new-id)))

(defn swap-new-id! []
  (let [result-atom (atom nil)]
    (swap! data (fn [m]
                  (let [id (generate-new-id m)]
                    (reset! result-atom id) ; Put the ID in the result atom
                    (assoc m id ""))))
    @result-atom)) ; Then retrieve it here
Or, if a very inefficient solution is fine and you're using Clojure 1.9.0 or later, you can just diff the old and new key sets to find what key was added, using clojure.set/difference:
(defn find-new-id [old-map new-map]
  (clojure.set/difference (set (keys new-map))
                          (set (keys old-map))))

(defn swap-new-id! []
  (let [[old-map new-map] (swap-vals! data assoc-new-id)] ; New in 1.9.0
    (find-new-id old-map new-map)))
But again, this is very inefficient. It requires two iterations of each map.
Can you please update your question with the reason you are trying to do this? There are almost certainly better solutions than the one you propose.
If you really want to generate unique keys for a map, there are a few easy answers.
(1) For coordinated keys, you could use an atom to hold the integer value of the last key generated:
(def last-map-key (atom 0))

(defn new-map-key []
  (swap! last-map-key inc))
which is guaranteed to generate unique new map keys.
(2) For uncoordinated keys, use a UUID as with clj-uuid/v1
(3) If you really insist on your original algorithm, you could use a Clojure ref, but that is an abuse of its intended purpose.
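For completeness, here is a hedged sketch of what option (3) could look like with a ref; it works, but an atom with swap! (as in the other answers) is the idiomatic tool for a single piece of state:

```clojure
(defonce sessions (ref {}))

(defn new-session-id! []
  (dosync
   ;; try random ids until one is free, then claim it
   ;; within the same transaction
   (let [id (first (remove #(contains? @sessions %)
                           (repeatedly #(rand-int 25))))]
     (alter sessions assoc id "")
     id)))
```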
You can store the information about which id was the last one in the atom as well.
(defonce data
  (atom {:sessions {}
         :latest-id nil}))

(defn generate-session-id [sessions]
  (let [id (rand-int 25)]
    (if (contains? sessions id)
      (recur sessions)
      id)))

(defn add-new-session [{:keys [sessions] :as data}]
  (let [id (generate-session-id sessions)]
    (-> data
        (assoc-in [:sessions id] {})
        (assoc :latest-id id))))

(defn create-new-session! []
  (:latest-id (swap! data add-new-session)))
As Carcigenicate shows, with swap-vals! the id is derivable from the before and after states, but it's simpler to just keep it around.
I have a lazy-seq where each item takes some time to calculate:
(defn gen-lazy-seq [size]
  (for [i (range size)]
    (do
      (Thread/sleep 1000)
      (rand-int 10))))
Is it possible to evaluate this sequence step by step and print the results? When I try to process it with for or doseq, Clojure always realizes the whole lazy-seq before printing anything out:
(doseq [item (gen-lazy-seq 10)]
  (println item))

(for [item (gen-lazy-seq 10)]
  (println item))
Both expressions will wait for 10 seconds before printing anything out. I have looked at doall and dorun as a solution, but they require that the lazy-seq producing function contain the println. I would like to define a lazy-seq producing function and lazy-seq printing function separately and make them work together item by item.
Motivation for trying to do this:
I have messages coming in over a network, and I want to start processing them before all have been received. At the same time it would be nice to save all messages corresponding to a query in a lazy-seq.
Edit 1:
JohnJ's answer shows how to create a lazy-seq that will be evaluated step by step. I would like to know how to evaluate any lazy-seq step by step.
I'm confused because running (chunked-seq? (gen-lazy-seq 10)) on gen-lazy-seq as defined above OR as defined in JohnJ's answer both return false. So then the problem can't be that one creates a chunked sequence and the other doesn't.
In this answer, a function seq1 is shown which turns a chunked lazy-seq into a non-chunked one. Trying that function still gives the same problem with delayed output. I thought that maybe the delay has to do with some sort of buffering in the REPL, so I tried to also print the time when each item in the seq is realized:
(defn seq1 [s]
  (lazy-seq
   (when-let [[x] (seq s)]
     (cons x (seq1 (rest s))))))

(let [start-time (java.lang.System/currentTimeMillis)]
  (doseq [item (seq1 (gen-lazy-seq 10))]
    (let [elapsed-time (- (java.lang.System/currentTimeMillis) start-time)]
      (println "time: " elapsed-time "item: " item))))
; output:
time: 10002 item: 1
time: 10002 item: 8
time: 10003 item: 9
time: 10003 item: 1
time: 10003 item: 7
time: 10003 item: 2
time: 10004 item: 0
time: 10004 item: 3
time: 10004 item: 5
time: 10004 item: 0
Doing the same thing with JohnJ's version of gen-lazy-seq works as expected
; output:
time: 1002 item: 4
time: 2002 item: 1
time: 3002 item: 6
time: 4002 item: 8
time: 5002 item: 8
time: 6002 item: 4
time: 7002 item: 5
time: 8002 item: 6
time: 9003 item: 1
time: 10003 item: 4
Edit 2:
It's not only sequences generated with for which have this problem. This sequence generated with map cannot be processed step by step regardless of seq1 wrapping:
(defn gen-lazy-seq [size]
  (map (fn [_]
         (Thread/sleep 1000)
         (rand-int 10))
       (range 0 size)))
But this sequence, also created with map works:
(defn gen-lazy-seq [size]
  (map (fn [_]
         (Thread/sleep 1000)
         (rand-int 10))
       (repeat size :ignored)))
Clojure's lazy sequences are often chunked. You can see the chunking at work in your example if you take large sizes (it will be helpful to reduce the thread sleep time in this case). See also these related SO posts.
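You can watch a chunk being realized by counting side effects. range yields a chunked seq, so asking for just the first mapped element still computes the whole 32-element chunk:

```clojure
(def realized (atom 0))

;; Ask for a single element...
(first (map (fn [x] (swap! realized inc) x) (range 100)))

;; ...and a full chunk gets realized:
@realized ;; => 32
```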
Though the for version appears to be chunked (it iterates over a chunked range), the following is not, and works as desired:
(defn gen-lazy-seq [size]
  (take size (repeatedly #(do (Thread/sleep 1000)
                              (rand-int 10)))))

(doseq [item (gen-lazy-seq 10)]
  (println item))
"I have messages coming in over a network, and I want to start processing them before all have been received." Chunked or no, this should actually be the case if you process them lazily.
I'm looking for an elegant way to generate a sequence of the rolling average of a sequence of numbers. Hopefully something more elegant than using lazy-seq
Without any consideration of efficiency:
(defn average [lst] (/ (reduce + lst) (count lst)))
(defn moving-average [window lst] (map average (partition window 1 lst)))
user> (moving-average 5 '(1 2 3 4 5 6 7 8))
(3 4 5 6)
If you need it to be fast, there are some fairly obvious improvements to be made!
But it will get less elegant.
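To see why eight inputs give only four averages, look at the windows partition produces; with a step of 1 it stops as soon as it can no longer fill a complete window of 5:

```clojure
(partition 5 1 '(1 2 3 4 5 6 7 8))
;; => ((1 2 3 4 5) (2 3 4 5 6) (3 4 5 6 7) (4 5 6 7 8))
```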
There's a very similar question on SO: Calculating the Moving Average of a List. It's more general -- a number of FP-friendly languages are represented, with the accepted answer using Scala -- but there are a few nice Clojure solutions.
I've posted my own solution over there. Note that it does use lazy-seq, but that's because I wanted it to perform well for large periods (which means adjusting the average at each step rather than calculating a separate average for each window of size = period into the input list). Look around that Q for nice solutions which made the other tradeoff, resulting in shorter code with a somewhat more declarative feel, which actually performs better for very short periods (although suffers significant slowdowns for longer periods, as is to be expected).
This version is a bit faster, especially for long windows, since it keeps a rolling sum and avoids repeatedly adding the same numbers. Because of the lazy-seq, it's also perfectly general and won't blow the stack:
(defn partialsums [start lst]
  (lazy-seq
   (if-let [lst (seq lst)]
     (cons start (partialsums (+ start (first lst)) (rest lst)))
     (list start))))

(defn sliding-window-moving-average [window lst]
  (map #(/ % window)
       (let [start (apply + (take window lst))
             diffseq (map - (drop window lst) lst)]
         (partialsums start diffseq))))
;; To help see what it's doing:
(sliding-window-moving-average 5 '(1 2 3 4 5 6 7 8 9 10 11))

start   = (+ 1 2 3 4 5) = 15
diffseq = (map - '(6 7 8 9 10 11) '(1 2 3 4 5 6 7 8 9 10 11))
        = (5 5 5 5 5 5)
(partialsums 15 '(5 5 5 5 5 5)) = (15 20 25 30 35 40 45)
(map #(/ % 5) '(15 20 25 30 35 40 45)) = (3 4 5 6 7 8 9)
;; Example
(take 20 (sliding-window-moving-average 5 (iterate inc 0)))
Instead of the partialsums fn (which is helpful to see what's going on), you can use reductions in clojure.core:
(defn sliding-window-moving-average [window lst]
  (map #(/ % window)
       (let [start (apply + (take window lst))
             diffseq (map - (drop window lst) lst)]
         (reductions + start diffseq))))
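It behaves identically to the partialsums version; for instance, on the input from the first answer:

```clojure
(defn sliding-window-moving-average [window lst]
  (map #(/ % window)
       (let [start (apply + (take window lst))
             diffseq (map - (drop window lst) lst)]
         (reductions + start diffseq))))

(sliding-window-moving-average 5 '(1 2 3 4 5 6 7 8))
;; => (3 4 5 6)
```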