Use of agents to complete side-effects in STM transactions


I'm aware that it is generally bad practice to put functions with side-effects within STM transactions, as they can potentially be retried and called multiple times.
It occurs to me however that you could use agents to ensure that the side effects get executed only after the transaction successfully completes.
e.g.
(dosync
  ; transactional stuff
  (send some-agent #(function-with-side-effects params))
  ; more transactional stuff
  )
Is this good practice?
What are the pros/cons/pitfalls?

Original:
Seems like that should work to me. Depending on what your side effects are, you might want to use send-off (for IO-bound ops) instead of send (for cpu-bound ops). The send/send-off will enqueue the task into one of the internal agent executor pools (there is a fixed size pool for cpu and unbounded size pool for io ops). Once the task is enqueued, the work is off the dosync's thread so you're disconnected at that point.
You'll need to capture any values you need from within the transaction into the sent function of course. And you need to deal with that send possibly occurring multiple times due to retries.
Update (see comments):
Agent sends within the ref's transaction are held until the ref transaction successfully completes, and are executed once. So in my answer above, the send will NOT occur multiple times; however, it also won't occur during the ref transaction, which may not be what you want (if you expect to log or do other side-effecting work).
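A minimal sketch of that hold-until-commit behavior, using only clojure.core (the names `counter`, `attempts`, `dispatched`, `started` and `resume` are illustrative, not from the question). It forces exactly one retry: the transaction pauses on a promise, the main thread commits a newer value of `counter`, and the transaction's `alter` then conflicts and restarts. The `swap!` on the atom runs once per attempt, while the agent only receives the send from the attempt that commits.

```clojure
(def counter (ref 0))
(def attempts (atom 0))        ; bumped on every transaction attempt
(def dispatched (agent 0))     ; bumped once per send that actually runs
(def started (promise))
(def resume (promise))

(def tx
  (future
    (dosync
      (swap! attempts inc)     ; unsafe side effect: repeats on retry
      (send dispatched inc)    ; held until the transaction commits
      (deliver started true)   ; later delivers are no-ops
      @resume                  ; wait while the main thread writes counter
      (alter counter inc))))   ; conflicts on the first attempt, so it retries

@started
(dosync (ref-set counter 10))  ; commit a newer version of counter
(deliver resume true)
@tx                            ; wait for the transaction to commit
(await dispatched)             ; wait for the held send to run

(println [@attempts @dispatched @counter]) ; prints [2 1 11]
```

The atom that counts two attempts is exactly the kind of duplicated side effect the question worries about; the agent send does not suffer from it.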

This works and is common practice. However, as Alex rightly pointed out, you should consider send-off over send.
There are other ways to capture committed values and hand them out of the transaction. For example, you can return them in a vector (or a map or whatever):
(let [[x y z] (dosync
                ; do stuff
                [@x @y @z])] ; values of interest to side effects
  (side-effect x y z))
Or you can call reset! on a local atom (defined outside the lexical scope of the dosync block, of course).
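The local-atom variant might look like this; `balance`, `withdrawals` and `withdraw!` are hypothetical names, not from the question. The `reset!` may execute on several attempts, but the last attempt is the one that commits, so the atom ends up holding the committed value, and the `println` runs once, after the `dosync` returns.

```clojure
(def balance (ref 100))      ; hypothetical account balance
(def withdrawals (ref []))   ; hypothetical audit trail

(defn withdraw! [amount]
  ;; The atom lives outside the dosync's lexical scope. reset! may run on
  ;; every attempt; the final (committing) attempt overwrites earlier values.
  (let [committed (atom nil)]
    (dosync
      (alter withdrawals conj amount)
      (reset! committed (alter balance - amount)))
    ;; This side effect runs exactly once, after the commit.
    (println (str "Withdrew " amount ", new balance: " @committed))
    @committed))

(withdraw! 30) ; prints "Withdrew 30, new balance: 70"
```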

There's nothing wrong with using agents, but simply returning from the transaction values needed for the side-effecting computation is often sufficient.
Refs are probably the cleanest way to do this, but you can even manage it with just atoms!
(def work-queue-size (atom [0]))
(defn add-job [thunk]
  (let [[running accepted?]
        (swap! work-queue-size
               (fn [[active]]
                 (if (< active 3)
                   [(inc active) true]
                   [active false])))]
    (println
     (str "Your job has been "
          (if accepted?
            "queued, and there are "
            "rejected - there are already ")
          running
          " total running jobs"))))
The swap! can retry as many times as needed, but the work queue will never get larger than three, and you will always print exactly one message, correctly tied to whether your work item was accepted. The "original design" called for just a single int in the atom, but you can turn it into a pair in order to pass interesting data back out of the computation.
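If you are on Clojure 1.9 or later (an assumption; the answer does not say), `swap-vals!` returns both the old and the new value of the atom. That lets you keep the single-int "original design" and still learn whether your job was accepted, as a variation on the pair-in-the-atom technique rather than a replacement for it. `add-job-int` and `running-jobs` are hypothetical names.

```clojure
(def running-jobs (atom 0))   ; plain int, as in the "original design"

(defn add-job-int [thunk]
  ;; swap-vals! (Clojure 1.9+) returns [old new]. The update fn may be
  ;; retried, but the returned pair always describes the successful swap.
  (let [[before after] (swap-vals! running-jobs
                                   (fn [active]
                                     (if (< active 3) (inc active) active)))
        accepted? (not= before after)]
    (println (str "Your job has been "
                  (if accepted?
                    "queued, and there are "
                    "rejected - there are already ")
                  after
                  " total running jobs"))
    accepted?))
```

Like the original `add-job`, the `thunk` argument is accepted but not run here.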

Related

Understanding STM properties in Clojure

I'm going through the book Seven Concurrency Models in Seven Weeks. In it, philosophers are represented as a number of refs:
(def philosophers (into [] (repeatedly 5 #(ref :thinking))))
The state of each philosopher is flipped between :thinking and :eating using dosync transactions to ensure consistency.
Now I want to have a thread that outputs current status, so that I can be sure that the state is valid at all times:
(defn status-thread []
  (Thread.
   #(while true
      (dosync
       (println (map (fn [p] @p) philosophers))
       (Thread/sleep 100)))))
We use multiple @ derefs to read the value of each philosopher. Some refs may change while we map over philosophers. Could that cause us to print an inconsistent state even though the underlying state is consistent?
I'm aware that Clojure uses MVCC to implement STM, but I'm not sure I'm applying it correctly.
My transaction contains side effects, and generally they should not appear inside a transaction. But in this case, the transaction will always succeed and the side effect should take place only once. Is it acceptable?

Your transaction doesn't really need a side effect, and if you scale the problem up enough I believe the transaction could fail for lack of history and retry the side effect if there's a lot of writing going on. I think the more appropriate way here would be to pull the dosync closer in. The transaction should be a pure, side-effect free fact finding mission. Once that has resulted in a value, you are then free to perform side effects with it without affecting the STM.
(defn status-thread []
  (-> #(while true
         (println (dosync (mapv deref philosophers)))
         (Thread/sleep 100))
      Thread.
      .start)) ;;Threw in starting of the thread for my own testing
A few things I want to mention here:
@ is a reader macro for the deref fn, so (fn [p] @p) is equivalent to just deref.
You should avoid laziness within transactions as some of the lazy values may be evaluated outside the context of the dosync or not at all. For mappings that means you can use e.g. doall, or like here just the eagerly evaluated mapv variant that makes a vector rather than a sequence.
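The laziness pitfall is easy to reproduce. In this sketch (the ref `seat` is illustrative), `map` makes `dosync` return an unrealized sequence, so the deref runs later, outside the transaction, and sees a value committed afterwards; `mapv` does the deref inside the transaction.

```clojure
(def seat (ref :thinking))

;; map is lazy: dosync returns before any deref happens
(def lazy-snapshot (dosync (map deref [seat])))
;; mapv is eager: the deref happens inside the transaction
(def eager-snapshot (dosync (mapv deref [seat])))

(dosync (ref-set seat :eating))

(println (first lazy-snapshot))   ; prints :eating (read after the change)
(println (first eager-snapshot))  ; prints :thinking (the in-transaction value)
```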

This contingency was included in the STM design.
This problem is explicitly solved by combining agents with refs. Refs guarantee that all messages sent to agents in a transaction are sent exactly once, and only when the transaction commits. If the transaction is retried, they are dropped and not sent. When the transaction does eventually get through, they are sent at the moment it commits.
(def watcher (agent nil))
(defn status-thread []
  (future
    (while true
      (dosync
        ;; take the snapshot inside the transaction; only the printing is sent
        (let [snapshot (mapv deref philosophers)]
          (send watcher (fn [_] (println snapshot)))))
      (Thread/sleep 100))))
The STM guarantees that all the refs you deref during the transaction are read from the same consistent snapshot, taken when the transaction attempt started, even if other transactions change them while it is running. You don't need to explicitly worry about derefing multiple refs in a transaction (that's what the STM was made for).

What are the semantics of a clojure ref-set that doesn't "read" the ref?

I've read this SO question and http://clojure.org/refs, but I am still confused about how exactly ref-set works. (To some extent the two documents kind of lead me to believe two different things...)
Suppose that I have a transaction in Clojure that looks like this:
(def flag (ref false))
(dosync
(long-computation-that-does-not-read-or-write-flag)
(ref-set flag true))
Suppose that in the middle of the long computation, somebody else modifies flag. Will that cause my transaction to retry when I try to ref-set flag?
I could imagine the answer might be yes, since clojure.org says transactions guarantee that "No changes will have been made by any other transactions to any Refs that have been ref-set/altered/ensured by this transaction".
But I could also imagine the answer to be no, since I never read flag, and the clojure.org page suggests that "All *reads* of Refs will see a consistent snapshot of the 'Ref world' as of the starting point of the transaction". This is also what the linked SO answer would lead me to believe.
And a followup: supposing that instead of (ref-set flag true), I had done one of these:
(alter flag (fn [_] true))
(let [ignored @flag] (ref-set flag true))
I assume that both of those would constitute a read of flag, and so the transaction would have to retry?

Calling ref-set means that you have included flag in the tracked references for this transaction. Thus, a concurrent write to flag in some other transaction will cause a conflict and a retry.
Both of the followups modify flag (via alter and ref-set) and thus have the same result. The important thing here is not the read of flag, it's the write. If a transaction contains a read of a ref without a write, the transaction can succeed even if the read ref changes in a concurrent transaction. However, ensure can be used to include a read in the tracked references for a transaction (thus causing concurrent changes to fail).
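Both claims can be checked with a small experiment. In this sketch (the helper `run-interrupted` and the ref `seen` are illustrative), a transaction runs on a future and pauses while the main thread commits a change to `flag`. When the transaction then writes `flag` with `ref-set`, without ever having read it, it retries; when it only reads `flag` and writes some other ref, it commits on the first attempt with the now-stale value.

```clojure
(def flag (ref false))
(def seen (ref nil))

(defn run-interrupted
  "Runs one transaction on a future. Inside it, (before) is called, then the
  main thread commits (alter flag not), then (after x) is called with the
  result of (before). Returns [number-of-attempts transaction-result]."
  [before after]
  (let [attempts (atom 0)
        paused   (promise)
        resume   (promise)
        tx (future
             (dosync
               (swap! attempts inc)
               (let [x (before)]
                 (deliver paused true)
                 @resume
                 (after x))))]
    @paused
    (dosync (alter flag not))   ; concurrent write to flag
    (deliver resume true)
    (let [result @tx]
      [@attempts result])))

;; Write without read: ref-set conflicts with the concurrent write, so it retries.
(def write-only (run-interrupted (fn [] nil) (fn [_] (ref-set flag true))))
;; Read without write: commits on the first attempt despite the stale read.
(def read-only (run-interrupted (fn [] @flag) (fn [x] (ref-set seen x))))

(println write-only read-only) ; prints [2 true] [1 true]
```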

Is there a bug in this clojure solution to sleeping barber?

This is presented as a solution to the sleeping barber problem. (Attributed to CGrand, but I found the reference here)
I'm curious about the dosync block in enter-the-shop. My understanding is that this is a transaction, and so empty-seats will remain consistent because of STM. However, isn't there the possibility of send-off being called multiple times if the transaction gets retried? If not, why, and if so, how would one resolve it?
UPDATE
While the accepted answer is still correct, one thing I just noticed is that there's an optimization that could be made: there's no reason to call send-off inside the transaction. It can be sent afterwards, once you have the return value of the transaction, as follows:
(if (dosync
      (when (pos? @empty-seats)
        (alter empty-seats dec)))
  (send-off barber cut-hair n)
  (debug "(s) turning away customer" n))
Interestingly I figured this out while working on the Haskell equivalent, which forces you to use different types for "agents" inside STM and outside STM. The original solution above wouldn't compile, as they had to be either both in a transaction or both outside any transaction. (My first reaction was to put them both inside the transaction, until I realized there was no need for this and they could both be extracted).
I think the modified transaction should be superior in that it closes the transaction faster, removes a variable from the transaction, and is easier to read (there's no need to even wonder about the possibility of it being sent twice, which actually makes this whole question moot). Still, I'll leave the question up for anyone else who needs to know how STM and agents interact.
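A runnable sketch of the updated version, under stated assumptions: the original enter-the-shop and cut-hair are not shown here, so `cut-hair` below is a hypothetical stand-in that frees the seat and counts haircuts, `debug` is replaced with `println`, and the waiting room is assumed to have three seats. The transaction only claims a seat; `send-off` happens afterwards, outside the `dosync`, so retries cannot duplicate it.

```clojure
(def empty-seats (ref 3))   ; assumed waiting-room size
(def barber (agent 0))      ; agent state: number of haircuts given

(defn cut-hair
  "Hypothetical stand-in for the original cut-hair: frees the
  customer's seat and counts the haircut."
  [haircuts n]
  (dosync (alter empty-seats inc))
  (inc haircuts))

(defn enter-the-shop
  "Returns true if customer n got a seat."
  [n]
  (if (dosync
        (when (pos? @empty-seats)
          (alter empty-seats dec)))      ; truthy only when a seat was taken
    (do (send-off barber cut-hair n)     ; runs once, after the commit
        true)
    (do (println "(s) turning away customer" n)
        false)))

(def accepted (count (filter true? (mapv enter-the-shop (range 10)))))
(await barber)
(println "seated:" accepted "haircuts:" @barber "free seats:" @empty-seats)
```

How many customers get seated depends on how fast the barber frees seats, so it is the invariants that are stable: every seated customer gets exactly one haircut, and all seats are free again once the barber is done.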

Quoting the clojure.org page on agents:
Agents are integrated with the STM - any dispatches made in a transaction are held until it commits, and are discarded if it is retried or aborted.
So the send-off will only get run once, when(/if) the STM transaction is successfully committed.

what are some good programming principles for using nested dosync blocks?

I really like STMs, but am hoping to get some advice about how to use transactions properly, especially when one block of transactions depends on another.
For example, I have some code:
(defn unschedule-task [tt task-id]
  (dosync
    (doseq [entry .....]
      (tk/kill-all! (:task entry)))
    (v/delete! tt [[:task :id] task-id])))
(defn schedule-task [tt task schedule & [enabled? optt]]
  (dosync
    (unschedule-task tt (:id task))
    (v/insert! tt {.....})))
Basically, unschedule-task has a dosync block, and schedule-task calls unschedule-task in its own dosync block as it needs both the deletion and the insertion to go through in one transaction.
How far can one push this and what are the pitfalls to avoid? (I'm thinking there may be issues with circular dependencies but can't think of an example off the top of my head....)

Transactions are flattened: starting a new transaction during a transaction doesn't do anything. In other words, either all ref modifications succeed during the outer transaction or the whole outer transaction is restarted. This means there should be no dependency issues.
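A small sketch of the flattening, using a hypothetical `tasks` ref in place of the question's `tt` table: `unschedule!` has its own `dosync`, but when called from `reschedule!` it simply joins the outer transaction, so an exception later in the outer transaction also discards the inner change.

```clojure
(def tasks (ref #{:a}))   ; hypothetical task table

(defn unschedule! [id]
  (dosync (alter tasks disj id)))

(defn reschedule! [old-id new-id]
  (dosync
    (unschedule! old-id)              ; inner dosync joins this transaction
    (when (nil? new-id)
      (throw (ex-info "no replacement task" {:old old-id})))
    (alter tasks conj new-id)))

;; The throw aborts the outer transaction, which also discards the
;; disj made by the inner dosync.
(try (reschedule! :a nil)
     (catch Exception _ nil))
(println @tasks)   ; prints #{:a}

(reschedule! :a :b)
(println @tasks)   ; prints #{:b}
```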

Delayed evaluation in Clojure

I'm having some trouble understanding how the delay macro works in Clojure. It doesn't seem to do what I expect it to do (that is, delaying evaluation), as you can see in this code sample:
; returns the current time
(defn get-timestamp [] (System/currentTimeMillis))
; var should contain the current timestamp after calling "force"
(def current-time (delay (get-timestamp)))
However, calling current-time in the REPL appears to immediately evaluate the expression, even without having called force:
user=> current-time
#<Delay@19b5217: 1276376485859>
user=> (force current-time)
1276376485859
Why was the evaluation of get-timestamp not delayed until the first force call?

The printed representation of various objects which appears at the REPL is the product of a multimethod called print-method. It resides in the file core_print.clj in Clojure's sources, which constitutes part of what goes in the clojure.core namespace.
The problem here is that for objects implementing clojure.lang.IDeref (the Java interface for things deref / @ can operate on), print-method includes the value behind the object in the printed representation. To this end, it needs to deref the object, and although special provisions are made for printing failed Agents and pending Futures, Delays are always forced.
Actually I'm inclined to consider this a bug, or at best a situation in need of an improvement. As a workaround for now, take extra care not to print unforced delays.
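One way to follow that workaround, sketched with a hypothetical `describe-delay` helper: `realized?` (in clojure.core since 1.3) reports whether a delay has already been forced, without forcing it, so you can decide whether it is safe to show its value.

```clojure
(defn get-timestamp [] (System/currentTimeMillis))

(def current-time (delay (get-timestamp)))

(defn describe-delay
  "Describes a delay without forcing it."
  [d]
  (if (realized? d)
    (str "forced: " @d)
    "pending"))

(println (describe-delay current-time))  ; prints pending
(force current-time)
(println (describe-delay current-time))  ; prints forced: followed by the timestamp
```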
