I read at clojure.org/refs that
All reads of Refs will see a consistent snapshot of the 'Ref world' as of the starting point of the transaction (its 'read point'). The transaction will see any changes it has made. This is called the in-transaction-value.
There's also a link to Snapshot Isolation on wikipedia that implies that reads of any number of refs will be consistent with each other once a transaction has started.
I made a test case...
(def r1 (ref 0))
(def r2 (ref 0))
(defn delay-then-inc-ref [id ref delay]
  (.start
   (Thread.
    (fn []
      (println id " start")
      (Thread/sleep delay)
      (dosync
       (alter ref inc))
      (println id " end")))))
(defn deref-delay-deref [ref1 ref2 delay]
  (.start
   (Thread.
    (fn []
      (println "S start")
      (dosync
       (let [a @ref2]
         (Thread/sleep delay)
         (println "S r1=" @ref1)))  ; @ref1 consistent with @ref2 ?
      (println "S end")))))
*clojure-version*
;=> {:major 1, :minor 3, :incremental 0, :qualifier nil}
(deref-delay-deref r1 r2 2000)
(delay-then-inc-ref "1" r1 500)
(delay-then-inc-ref "2" r1 1000)
(delay-then-inc-ref "3" r1 1500)
The output is:
S start
1 start
2 start
3 start
1 end
2 end
3 end
S r1= 3
S end
nil
Seeing r1 = 3 rather than r1 = 0 suggests that, in deref-delay-deref, the deref of ref1 after the sleep picks up the value of r1 from after the three delay-then-inc-ref transactions have committed.
Note that I know about ensure to prevent updates to refs by other transactions during a particular transaction, but I don't believe that applies here. I don't care if ref1 changes as long as I see a value consistent with the start of my transaction.
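For reference, ensure usage in a dosync looks roughly like this (a sketch of mine, reading and write-protecting r2 in the same transaction):

(dosync
  (let [a (ensure r2)]  ; like @r2, but also prevents other transactions from
    (alter r1 + a)))    ; committing a change to r2 before this transaction commits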
How does this behaviour fit with the above referenced documentation?
It turns out that if the ref has some history it behaves as I expect, so changing the ref declarations to add :min-history, and then re-setting both refs as shown, seems to make it work:
(def r1 (ref 0 :min-history 5))
(def r2 (ref 0 :min-history 5))
(dosync
(ref-set r1 0)
(ref-set r2 0))
Then the output is:
S start
1 start
1 end
2 start
2 end
3 start
3 end
S r1= 0
S end
nil
Reading here, it's clear what's going on. The read transaction is restarting because there is no entry in the ref's history from before the transaction started. To confirm, I added some more logging:
(defn deref-delay-deref [ref1 ref2 delay]
  (.start
   (Thread.
    (fn []
      (println "S start")
      (dosync
       (println "transaction starting")
       (let [a @ref2]
         (Thread/sleep delay)
         (println "S r1=" @ref1)))  ; should be consistent with @ref2
      (println "S end")))))
Output without history mods:
S start
transaction starting
1 start
2 start
3 start
1 end
2 end
3 end
transaction starting
S r1= 3
S end
and with history mods:
S start
transaction starting
1 start
2 start
3 start
1 end
2 end
3 end
S r1= 0
S end
nil
UPDATE: It turns out my answer above is something of a distraction, because of the artificial nature of the test case. In real-world usage it doesn't matter whether the transaction restarts or not, since transactions MUST be written so that they are restartable. The runtime provides no guarantee that a read-only transaction will complete without a retry in the presence or absence of history. Rather, it can do whatever is necessary to get the whole set of transactions to complete, and the transaction code MUST be written with this in mind. More detailed discussion here
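One practical consequence, illustrated with a small sketch of my own (notify! is a hypothetical helper, not code from above): keep side effects out of dosync, since the body may run more than once; clojure.core/io! throws if its body runs inside a transaction, which helps catch such mistakes:

(defn notify! [msg]
  (io! (println msg)))  ; io! marks code that must never run inside a transaction

(dosync (notify! "hello"))
;; => IllegalStateException: I/O in transaction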
I'm leaving the above for reference.
Related
I'm keeping a registry of processes in an atom.
I want to start one and only one process (specifically a core.async go-loop) per id.
However, you're not supposed to perform side-effects in a swap!, so this code is no good:
(swap! processes-atom
(fn [processes]
(if (get processes id)
processes ;; already exists, do nothing
(assoc processes id (create-process! id)))))
How would I go about doing this correctly?
I have looked at locking, which takes an object as a monitor for the lock. I would prefer that each id (the ids are dynamic) have its own lock.
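Something like the sketch below is roughly what I mean by per-id locks (lock-for and create-and-save-process! are hypothetical code I'm imagining, not something I have working):

(def locks (atom {}))

(defn lock-for
  "Return (creating it if needed) the monitor object dedicated to this id."
  [id]
  (get (swap! locks update id #(or % (Object.))) id))

(defn create-and-save-process! [id]
  (locking (lock-for id)                 ; only calls for this particular id are serialized
    (when-not (get @processes-atom id)
      (let [p (create-process! id)]      ; side effect happens outside swap!
        (swap! processes-atom assoc id p)))))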
It seems that you need to protect processes-atom from concurrent modification, so that only a single thread can access it at a time. locking will work in this case. Since, by using locking, we manage thread safety ourselves, we can use a volatile instead of an atom (volatile is faster, but doesn't provide any thread-safety or atomicity guarantees).
Summing up the above, something like below should work fine:
(def processes-volatile (volatile! {}))
(defn create-and-save-process! [id]
(locking processes-volatile
(vswap! processes-volatile
(fn [processes]
(if (get processes id)
processes
(assoc processes id (create-process! id)))))))
You can do this by hand with locking, as OlegTheCat shows, and often that is a fine approach. However, in the comments you remark that it would be nice to avoid having the whole atom locked for as long as it takes to spawn a process, and that too is possible in a surprisingly simple way: instead of having a map from pid to process, have a map from pid to delay of process. That way, you can add a new delay very cheaply, and only actually create the process by dereferencing the delay, outside of the call to swap!. Dereferencing the delay will block waiting for that particular delay, so multiple threads who need the same process will not step on each other's toes, but the atom itself will be unlocked, allowing threads who want a different process to get it.
Here is a sample implementation of that approach, along with example definitions of the other vars your question implies, to make the code runnable as-is:
(def process-results (atom []))
(defn create-process! [id]
;; pretend creating the process takes a long time
(Thread/sleep (* 1000 (rand-int 3)))
(future
;; running it takes longer, but happens on a new thread
(Thread/sleep (* 1000 (rand-int 10)))
(swap! process-results conj id)))
(def processes-atom (atom {}))
(defn cached-process [id]
(-> processes-atom
(swap! (fn [processes]
(update processes id #(or % (delay (create-process! id))))))
(get id)
(deref)))
Of course only cached-process is needed if you already have the other things defined. And a sample run, to show that processes are successfully reused:
(defn stress-test [num-processes]
(reset! process-results [])
(reset! processes-atom {})
(let [running-processes (doall (for [i (range num-processes)]
(cached-process (rand-int 10))))]
(run! deref running-processes)
(deref process-results)))
user> (time (stress-test 40))
"Elapsed time: 18004.617869 msecs"
[1 5 2 0 9 7 8 4 3 6]
I prefer using a channel
(require '[clojure.core.async :refer [chan go <! >! >!! <!!]])

(defn create-process! [id] {:id id})
(def ^:private processes-channel (chan))
(go (loop [processes {}]
(let [id (<! processes-channel)
process (if (contains? processes id)
(get processes id)
(create-process! id))]
(>! processes-channel process)
(recur (assoc processes id process)))))
(defn get-process-by-id
"Public API"
[id]
(>!! processes-channel id)
(<!! processes-channel))
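Usage would then look like this (relying on the stub create-process! above):

(get-process-by-id 42)  ;=> {:id 42}, created by the go loop
(get-process-by-id 42)  ;=> {:id 42} again, reused from the loop's map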
Another answer is to use an agent to start each process. This decouples each process from each other, and avoids the problem of possible multiple calls to the "create-process" function:
(defn start-proc-agent
[state]
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" (:id state)))
(Thread/sleep delay)
(println (format "finished %d" (:id state)))
(merge state {:delay delay :state :running} )))
(require '[clojure.pprint :refer [pprint]])

(def procs-agent (atom {}))
(dotimes [i 3]
(let [curr-agent (agent {:id i :state :unstarted})]
(swap! procs-agent assoc i curr-agent)
(send curr-agent start-proc-agent )))
(println "all dispatched...")
(pprint @procs-agent)
(Thread/sleep 3000)
(pprint @procs-agent)
When run we see:
starting 2
starting 1
starting 0
all dispatched...
{0 #<Agent#39d8240b: {:id 0, :state :unstarted}>,
1 #<Agent#3a6732bc: {:id 1, :state :unstarted}>,
2 #<Agent#7414167a: {:id 2, :state :unstarted}>}
finished 0
finished 1
finished 2
{0 #<Agent#39d8240b: {:id 0, :state :running, :delay 317}>,
1 #<Agent#3a6732bc: {:id 1, :state :running, :delay 1635}>,
2 #<Agent#7414167a: {:id 2, :state :running, :delay 1687}>}
So the global map procs-agent associates each process ID with the agent for that process. A side benefit of this approach is that you can send subsequent commands (in the form of functions) to the agent for a process and be assured they are independent of every other agent (and run in parallel, asynchronously).
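For example, a hypothetical follow-up command sent to just one process's agent, leaving the others untouched:

;; mark process 0 as stopped; the other agents are unaffected
(send (get @procs-agent 0) assoc :state :stopped)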
Alternate solution
Similar to your original question, we could use a single agent (instead of an agent per process) to simply serialize the creation of each process. Since agent actions are queued and never retried (unlike functions passed to swap!), side-effecting functions aren't a problem. You could write it like so:
(defn start-proc-once-only
[state i]
(let [curr-proc (get state i) ]
(if (= :running (:state curr-proc))
(do
(println "skipping restart of" i)
state)
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" i))
(Thread/sleep delay)
(println (format "finished %d" i))
(assoc state i {:delay delay :state :running})))))
(def procs (agent {}))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(println "all dispatched...")
(println :procs) (pprint @procs)
(Thread/sleep 5000)
(println :procs) (pprint @procs)
with result
:starting 0
:starting 1
:starting 2
starting 0
:starting 0
:starting 1
:starting 2
all dispatched...
:procs
{}
finished 0
starting 1
finished 1
starting 2
finished 2
skipping restart of 0
skipping restart of 1
skipping restart of 2
:procs
{0 {:delay 1970, :state :running},
1 {:delay 189, :state :running},
2 {:delay 1337, :state :running}}
I think you should use add-watch. It gets called once per change to the atom. In the watch-fn, check whether a new id has been added to the atom; if so, create the process and add it to the atom. That will trigger another call to the watch-fn, but that second call won't identify any new id needing a process.
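A rough sketch of that idea (untested; ::pending is a placeholder convention I'm assuming, and create-process! is the function from the question):

(def processes-atom (atom {}))

(add-watch processes-atom ::spawner
  (fn [_key _ref old-state new-state]
    (doseq [id (keys new-state)
            ;; a freshly registered id: present now, absent before, still a placeholder
            :when (and (= ::pending (get new-state id))
                       (not (contains? old-state id)))]
      ;; create the process outside any swap! retry loop, then record it
      (swap! processes-atom assoc id (create-process! id)))))

;; callers only register the id; the watch does the rest
(swap! processes-atom assoc 42 ::pending)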
I accidentally committed a transaction to Datomic and I want to "undo" the whole transaction. I know exactly which transaction it is and I can see its datoms, but I don't know how to get from there to a rolled-back transaction.
The basic procedure:
Retrieve the datoms created in the transaction you want to undo. Use the transaction log to find them.
Remove datoms related to the transaction entity itself: we don't want to retract transaction metadata.
Invert the "added" state of all remaining datoms, i.e., if a datom was added, retract it, and if it was retracted, add it.
Reverse the order of the inverted datoms so the bad-new value is retracted before the old-good value is re-asserted.
Commit a new transaction.
In Clojure, your code would look like this:
(require '[datomic.api :as d])

(defn rollback
"Reassert retracted datoms and retract asserted datoms in a transaction,
effectively \"undoing\" the transaction.
WARNING: *very* naive function!"
[conn tx]
(let [tx-log (-> conn d/log (d/tx-range tx nil) first) ; find the transaction
txid (-> tx-log :t d/t->tx) ; get the transaction entity id
newdata (->> (:data tx-log) ; get the datoms from the transaction
(remove #(= (:e %) txid)) ; remove transaction-metadata datoms
; invert the datoms add/retract state.
(map #(do [(if (:added %) :db/retract :db/add) (:e %) (:a %) (:v %)]))
reverse)] ; reverse order of inverted datoms.
    @(d/transact conn newdata)))  ; commit new datoms.
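A hypothetical usage sketch, where bad-t is the t (or transaction entity id) of the transaction you want to undo, e.g. noted from its tx-report when it was committed:

(rollback conn bad-t)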
This is not meant as an answer to the original question, but for those coming here from Google looking for inspiration on how to roll back a DataScript transaction. I didn't find documentation about it, so I wrote my own:
(defn rollback
"Takes a transaction result and reasserts retracted
datoms and retracts asserted datoms, effectively
\"undoing\" the transaction."
[{:keys [tx-data]}]
; The passed transaction result looks something like
; this:
;
; {:db-before
; {1 :post/body,
; 1 :post/created-at,
; 1 :post/foo,
; 1 :post/id,
; 1 :post/title},
; :db-after {},
; :tx-data
; [#datascript/Datom [1 :post/body "sdffdsdsf" 536870914 false]
; #datascript/Datom [1 :post/created-at 1576538572631 536870914 false]
; #datascript/Datom [1 :post/foo "foo" 536870914 false]
; #datascript/Datom [1 :post/id #uuid "a21ad816-c509-42fe-a1b7-32ad9d3931ef" 536870914 false]
; #datascript/Datom [1 :post/title "123" 536870914 false]],
; :tempids {:db/current-tx 536870914},
; :tx-meta nil}))))
;
; We want to transform each datom into a new piece of
; a transaction. The last field in each datom indicates
; whether it was added (true) or retracted (false). To
; roll back the datom, this boolean needs to be inverted.
(let [t
(map
(fn [[entity-id attribute value _ added?]]
(if added?
[:db/retract entity-id attribute value]
[:db/add entity-id attribute value]))
tx-data)]
(transact t)))
You use it by first capturing a transaction's return value, then passing that return value to the rollback fn:
(let [tx (transact [...])]
(rollback tx))
Be careful though, I'm new to the datascript/Datomic world, so there might be something I am missing.
In his talk Are We There Yet?, at 57:25, Rich Hickey talks about multiversion concurrency control. One of the advantages listed is the ability for readers to have their own timeline. I'm curious what this means in practice. Is this done by simply letting the reader save a history of observed values? Or is it somehow done with the help of Clojure's STM? It would be nice to see an example of how this is used in Clojure.
I think Rich meant that readers outside a transaction see the world in its current state every time they read a value, and this world view is individual to each of them.
When you've got two uncorrelated functions (not bound within the same transaction) trying to get the current value of a reference type (atom, ref, agent, etc.), they are not guaranteed to see the same value.
Example:
(let [; 1.
      counter (ref 0)
      ; 2.
      _ (.start (Thread. (fn [] (while (< @counter 1000000)
                                  (dosync (alter counter inc))))))
      _ (Thread/sleep 10)
      ; 3.
      _ (let [r1 @counter
              _  (Thread/sleep 1)
              r2 @counter]
          (println "free reader 1: " r1 "free reader 2:" r2))
      ; 4.
      _ (dosync (let [r1 @counter
                      _  (Thread/sleep 1)
                      r2 @counter]
                  (println "frozen reader 1: " r1 "frozen reader 2:" r2)))
      _ (println "---------------------------------")])
sample output:
free reader 1: 30573 free reader 2: 31295
frozen reader 1: 105498 frozen reader 2: 105498
---------------------------------
free reader 1: 37567 free reader 2: 38369
frozen reader 1: 181392 frozen reader 2: 181392
---------------------------------
free reader 1: 37317 free reader 2: 88570
frozen reader 1: 467471 frozen reader 2: 467471
---------------------------------
How it works:
Declare a counter variable as a ref (a transactional reference type) and set its initial value to 0.
Create a Java thread with an anonymous function that increments the counter in a loop, and .start the thread.
Read counter twice, with a 1 ms delay in between, and print each value. As you can see, the two values differ, as expected. This illustrates the separate timelines: two free readers observing the same reference can see different data.
The same as above, but inside a transaction, i.e. against a snapshot of the world. The sleep does not affect the output values: both reads are equal.
I have a lazy-seq where each item takes some time to calculate:
(defn gen-lazy-seq [size]
(for [i (range size)]
(do
(Thread/sleep 1000)
(rand-int 10))))
Is it possible to evaluate this sequence step by step and print the results? When I try to process it with for or doseq, Clojure always realizes the whole lazy-seq before printing anything out:
(doseq [item (gen-lazy-seq 10)]
(println item))
(for [item (gen-lazy-seq 10)]
(println item))
Both expressions will wait for 10 seconds before printing anything out. I have looked at doall and dorun as a solution, but they require that the lazy-seq producing function contain the println. I would like to define a lazy-seq producing function and lazy-seq printing function separately and make them work together item by item.
Motivation for trying to do this:
I have messages coming in over a network, and I want to start processing them before all have been received. At the same time it would be nice to save all messages corresponding to a query in a lazy-seq.
Edit 1:
JohnJ's answer shows how to create a lazy-seq that will be evaluated step by step. I would like to know how to evaluate any lazy-seq step by step.
I'm confused because running (chunked-seq? (gen-lazy-seq 10)) on gen-lazy-seq as defined above OR as defined in JohnJ's answer both return false. So then the problem can't be that one creates a chunked sequence and the other doesn't.
In this answer, a function seq1 is shown which turns a chunked lazy-seq into a non-chunked one. Trying that function still gives the same problem with delayed output. I thought that maybe the delay had to do with some sort of buffering in the REPL, so I tried to also print the time at which each item in the seq is realized:
(defn seq1 [s]
(lazy-seq
(when-let [[x] (seq s)]
(cons x (seq1 (rest s))))))
(let [start-time (java.lang.System/currentTimeMillis)]
(doseq [item (seq1 (gen-lazy-seq 10))]
(let [elapsed-time (- (java.lang.System/currentTimeMillis) start-time)]
(println "time: " elapsed-time "item: " item))))
; output:
time: 10002 item: 1
time: 10002 item: 8
time: 10003 item: 9
time: 10003 item: 1
time: 10003 item: 7
time: 10003 item: 2
time: 10004 item: 0
time: 10004 item: 3
time: 10004 item: 5
time: 10004 item: 0
Doing the same thing with JohnJ's version of gen-lazy-seq works as expected
; output:
time: 1002 item: 4
time: 2002 item: 1
time: 3002 item: 6
time: 4002 item: 8
time: 5002 item: 8
time: 6002 item: 4
time: 7002 item: 5
time: 8002 item: 6
time: 9003 item: 1
time: 10003 item: 4
Edit 2:
It's not only sequences generated with for which have this problem. This sequence generated with map cannot be processed step by step regardless of seq1 wrapping:
(defn gen-lazy-seq [size]
(map (fn [_]
(Thread/sleep 1000)
(rand-int 10))
(range 0 size)))
But this sequence, also created with map works:
(defn gen-lazy-seq [size]
(map (fn [_]
(Thread/sleep 1000)
(rand-int 10))
(repeat size :ignored)))
Clojure's lazy sequences are often chunked. You can see the chunking at work in your example if you take large sizes (it will be helpful to reduce the thread sleep time in this case). See also these related SO posts.
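For example, here's a quick way to see the chunking (my own illustration, relying on range's usual chunk size of 32):

;; realizing just the first element of a map over a chunked source
;; forces the whole first chunk, so 32 dots are printed rather than 1
(first (map #(do (print ".") %) (range 100)))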
Though for over a chunked source like range produces a chunked result, the following is not chunked and works as desired:
(defn gen-lazy-seq [size]
(take size (repeatedly #(do (Thread/sleep 1000)
(rand-int 10)))))
(doseq [item (gen-lazy-seq 10)]
(println item))
"I have messages coming in over a network, and I want to start processing them before all have been received." Chunked or no, this should actually be the case if you process them lazily.
I am reading Fogus's book The Joy of Clojure, and in the parallel programming chapter I saw a function definition which surely is meant to illustrate something important, but I can't figure out what. Moreover, I can't see what this function is for; when I execute it, it doesn't do anything:
(import '(java.util.concurrent Executors))
(def *pool* (Executors/newFixedThreadPool
(+ 2 (.availableProcessors (Runtime/getRuntime)))))
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [_ exec-count] (f)))))
I tried to run in this way:
(defn wait [] (Thread/sleep 1000))
(dothreads! wait :thread-count 10 :exec-count 10)
(dothreads! wait)
(dothreads! #(println "running"))
...but it returns nil. Why?
So, here's the same code, tweaked slightly so that the function passed to dothreads! gets passed the count of the inner dotimes.
(import 'java.util.concurrent.Executors)
(def ^:dynamic *pool* (Executors/newFixedThreadPool (+ 2 (.availableProcessors (Runtime/getRuntime)))))
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [c exec-count] (f c)))))
(defn hello [name]
(println "Hello " name))
Try running it like this:
(dothreads! hello :threads 2 :times 4)
For me, it prints something to the effect of:
Hello 0
Hello 1
Hello 2
Hello 3
nil
user=> Hello 0
Hello 1
Hello 2
Hello 3
So, note one mistake you made when calling the function: you passed :thread-count and :exec-count as the keys, whereas those are actually the binding names in the destructuring that happens inside dothreads!. The actual keys are :threads and :times.
As to what this code actually does:
It creates a new fixed-size thread pool that will use at most the number of cores in your machine + 2 threads. This pool is called *pool* and is created using the Java Executor Framework. See [1] for more details.
The dothreads! function gets a function that will be called exec-count times on each of the thread-count threads. So, in the example above, you can clearly see it being called 4 times per thread (:threads being 2 and :times being 4).
The reason why this function returns nil is that dothreads! doesn't return anything useful: its last expression is the outer dotimes, which always returns nil. If you were to add some other expression at the end of the function, making it:
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [c exec-count] (f c))))
(* thread-count exec-count))
It will return 8 for the example above (2 * 4). Only the last expression in a function is returned, so if you were to write (fn [x y] (+ x y) (* x y)), this would always return the product. The sum would be evaluated, but for nothing. So, don't do this! If you want to add more than one expression to a function, make sure that all but the last one have side effects, otherwise they're useless.
You might also notice that the printed output is interleaved asynchronously. So, on my machine, it says hello 4 times, then returns the result of the function, and then says hello 4 more times. The order in which the functions are executed across threads is undetermined; however, the hellos are sequential within each thread (there can never be a Hello 3 before a Hello 2). The reason for the sequentiality is that the function actually submitted to the thread pool is #(dotimes [c exec-count] (f c)), and dotimes runs its body sequentially on whichever pool thread executes that submitted function.
[1] http://download.oracle.com/javase/tutorial/essential/concurrency/executors.html
It's used afterwards in the book to run test functions multiple times in multiple threads. It doesn't illustrate anything by itself, but it's used to demonstrate locking, promises, and other parallel and concurrent stuff.
dotimes, dothreads!, and println are not pure functions: they're used to introduce side-effects. For example,
user=> (println 3)
3
nil
That code snippet prints 3 to the screen, but returns nil. Similarly, dothreads! is useful for its side-effects and not its return value.