Is it sensible to memoize promises? - clojure

I'm using HTTP Kit to make requests, and I want them to be asynchronous, but I also want to cache the responses. The reason I want the requests to be asynchronous is that I am making several concurrently and I want them to operate in parallel.
Here's my function that makes several requests in parallel.
(defn parallel-requests-1 [urls]
  (let [; Dispatch all requests to run concurrently.
        responses (doall (map #(http/get %) urls))
        ; Realise all promises.
        realised (doall (map deref responses))
        ; Extract response body.
        bodies (map :body realised)]
    bodies))
(parallel-requests-1 ["http://example.com", "http://example.net"])
This is purely for illustrative purposes to demonstrate that I don't want to just deref the promise and memoize that.
Now I want to add caching using memoize. I tried this:
(def memoized-get (memoize http/get))
(defn parallel-requests-2 [urls]
  (let [; Dispatch all requests to run concurrently.
        responses (doall (map #(memoized-get %) urls))
        ; Realise all promises.
        realised (doall (map deref responses))
        ; Extract response body.
        bodies (map :body realised)]
    bodies))
All the signs show that this works well.
Is this a sensible solution? My concern is that caching a promise might constitute some kind of resource leak.

Yes, it is sensible to memoize a promise. In fact, it's probably the most sensible solution in this situation. After all, that's what http/get returns, and that is the function you want to memoize.
As far as resource leakage goes, that shouldn't be a problem any more than usual, whether you memoize the promise or its value. A promise is basically just a small piece of state that holds a reference to the value once it has been delivered. So you carry a slight overhead for that wrapper, but it's minuscule compared to the whole response.
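If you want to convince yourself, a minimal check (a sketch, assuming http-kit is required as http and memoized-get is defined as above) is that repeated calls hand back the very same cached promise, so only one request is ever dispatched:
(let [p1 (memoized-get "http://example.com")
      p2 (memoized-get "http://example.com")]
  ;; memoize returns the same promise object for equal arguments,
  ;; so the second call does not issue a new request.
  (println (identical? p1 p2))
  ;; Both derefs read the single cached response.
  (println (= (:status @p1) (:status @p2))))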

Related

Understanding STM properties in Clojure

I'm going through the book 7 Concurrency Models in 7 Weeks. In it, philosophers are represented as a number of refs:
(def philosophers (into [] (repeatedly 5 #(ref :thinking))))
The state of each philosopher is flipped between :thinking and :eating using dosync transactions to ensure consistency.
Now I want to have a thread that outputs current status, so that I can be sure that the state is valid at all times:
(defn status-thread []
  (Thread.
   #(while true
      (dosync
       (println (map (fn [p] @p) philosophers))
       (Thread/sleep 100)))))
We use multiple @s to read the value of each philosopher. Some refs may be changed while we map over philosophers. Could that cause us to print an inconsistent state even though one never actually existed?
I'm aware that Clojure uses MVCC to implement its STM, but I'm not sure I'm applying it correctly.
My transaction contains side effects, and generally side effects should not appear inside a transaction. But in this case the transaction will always succeed, so the side effect should take place only once. Is that acceptable?
Your transaction doesn't really need a side effect, and if you scale the problem up enough I believe the transaction could fail for lack of ref history and retry, repeating the side effect, if there's a lot of writing going on. I think the more appropriate way here is to pull the dosync in closer. The transaction should be a pure, side-effect-free fact-finding mission. Once it has produced a value, you are then free to perform side effects with that value without affecting the STM.
(defn status-thread []
  (-> #(while true
         (println (dosync (mapv deref philosophers)))
         (Thread/sleep 100))
      Thread.
      .start)) ;; Threw in starting of the thread for my own testing
A few things I want to mention here:
@ is a reader macro for the deref fn, so (fn [p] @p) is equivalent to just deref.
You should avoid laziness within transactions, because some of the lazy values may be evaluated outside the context of the dosync, or not at all. For mappings that means forcing evaluation with e.g. doall, or, as here, simply using the eagerly evaluated mapv variant, which builds a vector rather than a lazy sequence (see the short sketch after this list).
This contingency was included in the STM design.
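A quick illustration of the laziness point above, as a sketch (reusing the philosophers vector, nothing else assumed):
(dosync (mapv deref philosophers))        ; eager: every deref happens inside the transaction
(dosync (doall (map deref philosophers))) ; also fine: doall forces the lazy seq before dosync exits
;; (dosync (map deref philosophers))      ; risky: the derefs may run after the transaction has ended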
This problem is explicitly solved by combining agents with refs. The STM guarantees that all messages sent to agents within a transaction are sent exactly once, and only when the transaction commits. If the transaction is retried, the pending sends are dropped rather than sent; when the transaction does eventually get through, they are sent at the moment it commits.
(def watcher (agent nil))

(defn status-thread []
  (future
    (while true
      (dosync
       (send watcher (fn [_] (println (map (fn [p] @p) philosophers))))
       (Thread/sleep 100)))))
The STM guarantees that your transaction will not be committed if the refs you deref during the transaction were changed in an incompatible way while it was running. You don't need to worry explicitly about derefing multiple refs in a transaction; that's what the STM was made for.
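If you want to see that guarantee in action, here is a small sketch (made-up refs r1 and r2, not the philosophers code): a writer keeps two refs equal, and a read-only transaction never observes them out of sync.
(def r1 (ref 0))
(def r2 (ref 0))

;; Writer: increments both refs in a single transaction, so they stay equal.
(future (dotimes [_ 10000] (dosync (alter r1 inc) (alter r2 inc))))

;; Reader: each dosync sees one consistent snapshot, so the pair is never torn.
(dotimes [_ 10000]
  (assert (apply = (dosync [@r1 @r2]))))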

Call a side effecting function only when atom value changes

What is the simplest way to trigger a side-effecting function to be called only when an atom's value changes?
If I were using a ref, I think I could just do this:
(defn transform-item [x] ...)
(defn do-side-effect-on-change [] nil)
(def my-ref (ref ...))
(when (dosync (let [old-value @my-ref
                    _ (alter my-ref transform-item)
                    new-value @my-ref]
                (not= old-value new-value)))
  (do-side-effect-on-change))
But this seems a bit roundabout, since I'm using a ref even though I am not trying to coordinate changes across multiple refs. Essentially I am using it just to conveniently access the old and new value within a successful transaction.
I feel like I should be able to use an atom instead. Is there a solution simpler than this?
(def my-atom (atom ...))
(let [watch-key ::side-effect-watch
      watch-fn (fn [_ _ old-value new-value]
                 (when (not= old-value new-value)
                   (do-side-effect-on-change)))]
  (add-watch my-atom watch-key watch-fn)
  (swap! my-atom transform-item)
  (remove-watch my-atom watch-key))
This also seems roundabout, because I am adding and removing the watch around every call to swap!. But I need this, because I don't want a watch hanging around that causes the side-effecting function to be triggered when other code modifies the atom.
It is important that the side-effecting function be called exactly once per mutation to the atom, and only when the transform function transform-item actually returns a new value. Sometimes it will return the old value, yielding no change.
(when (not= @a (swap! a transform))
  (do-side-effect))
But you should be very clear about what concurrency semantics you need. For example another thread may modify the atom between reading it and swapping it:
a = 1
Thread 1 reads a as 1
Thread 2 modifies a to 2
Thread 1 swaps a from 2 to 2
Thread 1 determines 1 != 2 and calls do-side-effect
It is not clear to me from the question whether this is desirable or not desirable. If you do not want this behavior, then an atom just will not do the job unless you introduce concurrency control with a lock.
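If you are on Clojure 1.9 or later, one way to avoid the separate read entirely is swap-vals!, which returns the old and the new value from the same atomic update; a sketch:
(let [[old new] (swap-vals! a transform)]
  ;; old and new come from the same compare-and-swap, so no other thread
  ;; can sneak in between the read and the update.
  (when (not= old new)
    (do-side-effect)))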
Seeing as you started with a ref and asked about an atom, I think you have probably given some thought to concurrency already. It seems like from your description the ref approach is better:
(when (dosync (not= @r (alter r transform)))
  (do-side-effect))
Is there a reason you don't like your ref solution?
If the answer is "because I don't have concurrency", then I would encourage you to use a ref anyway. There isn't really a downside, and it makes your semantics explicit. IMO programs tend to grow to the point where concurrency exists, and Clojure is really good at being explicit about what should happen when it does (for example: oh, I'm just calculating stuff; oh, I'm just exposing this stuff as a web service now; oh, now I'm concurrent).
In any case, bear in mind that functions like alter and swap! return the new value, so you can use that for concise expressions.
I ran into the same situation and came up with two solutions.
State field :changed?
Keep a meaningless :changed? flag in the atom to track what the swap function did, and inspect the return value of swap! to see whether anything changed. For example:
(def data (atom {:value 0 :changed? false}))

(let [{changed? :changed?} (swap! data (fn [d]
                                         (if (change?)
                                           {:value 1 :changed? true}
                                           {:value 0 :changed? false})))]
  (when changed? (do-your-task)))
Exception based
You can throw an exception in the swap function and catch it outside:
(try
  (swap! data (fn [d]
                (if (changed?)
                  d2
                  (throw (ex-info "unchanged" {})))))
  (do-your-task)
  (catch Exception _))

Can't use a for loop in a go block of core.async?

I'm new to the Clojure core.async library, and I'm trying to understand it through experimentation.
But when I tried:
(let [i (async/chan)] (async/go (doall (for [r [1 2 3]] (async/>! i r)))))
it gives me a very strange exception:
CompilerException java.lang.IllegalArgumentException: No method in multimethod '-item-to-ssa' for dispatch value: :fn
Then I tried another piece of code:
(let [i (async/chan)] (async/go (doseq [r [1 2 3]] (async/>! i r))))
and there is no compiler exception at all.
I'm totally confused. What happened?
So the Clojure go-block stops translation at function boundaries, for many reasons, but the biggest is simplicity. This is most commonly seen when constructing a lazy seq:
(go (lazy-seq (<! c)))
Gets compiled into something like this:
(go (clojure.lang.LazySeq. (fn [] (<! c))))
Now let's think about this real quick: what should this return? Presumably what you wanted was a lazy seq containing the value taken from c, but <! needs to translate the rest of the function body into a callback, while LazySeq expects its function to be synchronous. There really isn't a way around this limitation.
So, back to your question: if you macroexpand for, you'll see that it doesn't actually loop. Instead it expands into a bunch of code that eventually calls lazy-seq, so parking ops don't work inside its body. doseq (and dotimes), however, are backed by loop/recur, so those will work perfectly fine.
There are a few other places where this might trip you up, with-bindings being one example. Basically, if a macro sticks your core.async parking operations into a nested function, you'll get this error.
My suggestion then is to keep the body of your go blocks as simple as possible. Write pure functions, and then treat the body of go blocks as the places to do IO.
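One concrete way to follow that advice, as a sketch with made-up values (assuming core.async is required as async): keep the for/lazy computation outside the go block and do only the parking puts inside it, via doseq.
(let [i  (async/chan 3)
      vs (for [r [1 2 3]] (* r r))] ; pure, lazy computation, no >! inside it
  (async/go
    (doseq [v vs]                   ; doseq is loop/recur based, so >! parks fine here
      (async/>! i v))))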
------------ EDIT -------------
By "stops translation at function boundaries" I mean this: the go block takes its body and translates it into a state machine. Each call to <!, >!, or alts! (and a few others) is considered a state-machine transition where the execution of the block can pause. At each of those points the machine is turned into a callback and attached to the channel. When the macro reaches a fn form it stops translating. So you can only make calls to <! from directly inside a go block, not inside a function nested inside a go block.
This is part of the magic of core.async. Without the go macro, core.async code would look a lot like callback hell in other languages.

Clojure confusion - behavior of map, doseq in a multiprocess environment

In trying to replicate some websockets examples I've run into some behavior I don't understand and can't seem to find documentation for. Simplified, here's an example I'm running in lein that's supposed to run a function for every element in a shared map once per second:
(def clients (atom {"a" "b" "c" "d"}))
(def ticker-agent (agent nil))

(defn execute [a]
  (println "execute")
  (let [keys (keys @clients)]
    (println "keys= " keys)
    (doseq [x keys] (println x)))
    ;(map (fn [k] (println k)) keys)) ;; replace doseq with this?
  (Thread/sleep 1000)
  (send *agent* execute))

(defn -main [& args]
  (send ticker-agent execute))
If I run this with map I get
execute
keys= (a c)
execute
keys= (a c)
...
First confusing issue: I understand that I'm likely using map incorrectly because there's no return value, but does that mean the inner println is optimized away? Especially given that if I run this in a repl:
(map #(println %) '(1 2 3))
it works fine?
Second question: if I run this with doseq instead of map, I can run into conditions where the executing agent stops (which I'd append here, but I'm having difficulty isolating/recreating it). Clearly there's something I'm missing, possibly relating to locking on the map's keyset? I could reproduce this even after moving the shared map out of the atom. Is there default synchronization on the Clojure map?
map is lazy. This means it does not calculate any result until that result is accessed from the data structure it returns, so it will not run anything at all if its result is never used.
When you use map from the repl the print stage of the repl accesses the data, which causes any side effects in your mapped function to be invoked. Inside a function, if the return value is not investigated, any side effects in the mapping function will not occur.
You can use doall to force full evaluation of a lazy sequence. You can use dorun if you don't need the result value but want to ensure all side effects are invoked. Also you can use mapv which is not lazy (because vectors are never lazy), and gives you an associative data structure, which is often useful (better random access performance, optimized for appending rather than prepending).
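For example, a minimal sketch (hypothetical print-keys helpers) of forcing the side effects when you really want map-like code inside a function:
(defn print-keys [m]
  ;; dorun walks the lazy seq purely for its side effects and returns nil.
  (dorun (map println (keys m))))

;; Or skip map altogether when you only care about the side effects:
(defn print-keys-2 [m]
  (doseq [k (keys m)]
    (println k)))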
Edit: Regarding the second part of your question (moving this here from a comment).
No, there is nothing about doseq that would hang your execution. Try checking the agent-error status of your agent to see if there is some exception, because agents stop executing and stop accepting new tasks by default if they hit an error condition. You can also use set-error-mode! and set-error-handler! to customize the agent's error-handling behavior.
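A short sketch of what that inspection and customization can look like, applied to the ticker-agent from the question:
(agent-error ticker-agent)               ; nil, or the Throwable that stopped the agent

(set-error-handler! ticker-agent
                    (fn [a ex] (println "agent failed:" (.getMessage ex))))
(set-error-mode! ticker-agent :continue)  ; keep accepting sends even after an error

;; If the agent has already failed, restart it with a fresh state.
(restart-agent ticker-agent nil)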

Best Practice for globals in clojure, (refs vs alter-var-root)?

I've found myself using the following idiom lately in clojure code.
(def *some-global-var* (ref {}))

(defn get-global-var []
  @*some-global-var*)

(defn update-global-var [val]
  (dosync (ref-set *some-global-var* val)))
Most of the time this isn't even multi-threaded code that might need the transactional semantics refs give you. It just feels like refs end up being used for more than threaded code: basically for any global that requires immutability. Is there a better practice for this? I could try to refactor the code to just use binding or let, but that can get particularly tricky for some applications.
I always use an atom rather than a ref when I see this kind of pattern - if you don't need transactions, just a shared mutable storage location, then atoms seem to be the way to go.
e.g. for a mutable map of key/value pairs I would use:
(def state (atom {}))

(defn get-state [key]
  (@state key))

(defn update-state [key val]
  (swap! state assoc key val))
Your functions have side effects. Calling them twice with the same inputs may give different return values depending on the current value of *some-global-var*. This makes things difficult to test and reason about, especially once you have more than one of these global vars floating around.
People calling your functions may not even know that they depend on the value of the global var without inspecting the source. What if they forget to initialize the global var? It's easy to forget. What if two sets of code both try to use a library that relies on these global vars? They will probably step all over each other unless you use binding. You also add overhead every time you access data through a ref.
If you write your code side-effect free, these problems go away. A function stands on its own. It's easy to test: pass it some inputs, inspect the outputs, they'll always be the same. It's easy to see what inputs a function depends on: they're all in the argument list. And now your code is thread-safe. And probably runs faster.
It's tricky to think about code this way if you're used to the "mutate a bunch of objects/memory" style of programming, but once you get the hang of it, it becomes relatively straightforward to organize your programs this way. Your code generally ends up as simple as or simpler than the global-mutation version of the same code.
Here's a highly contrived example:
(def *address-book* (ref {}))

(defn add [name addr]
  (dosync (alter *address-book* assoc name addr)))

(defn report []
  (doseq [[name addr] @*address-book*]
    (println name ":" addr)))

(defn do-some-stuff []
  (add "Brian" "123 Bovine University Blvd.")
  (add "Roger" "456 Main St.")
  (report))
Looking at do-some-stuff in isolation, what the heck is it doing? There are a lot of things happening implicitly. Down this path lies spaghetti. An arguably better version:
(defn make-address-book [] {})

(defn add [addr-book name addr]
  (assoc addr-book name addr))

(defn report [addr-book]
  (doseq [[name addr] addr-book]
    (println name ":" addr)))

(defn do-some-stuff []
  (let [addr-book (make-address-book)]
    (-> addr-book
        (add "Brian" "123 Bovine University Blvd.")
        (add "Roger" "456 Main St.")
        (report))))
Now it's clear what do-some-stuff is doing, even in isolation. You can have as many address books floating around as you want. Multiple threads could have their own. You can use this code from multiple namespaces safely. You can't forget to initialize the address book, because you pass it as an argument. You can test report easily: just pass the desired "mock" address book in and see what it prints. You don't have to care about any global state or anything but the function you're testing at the moment.
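For instance, a small sketch of such a test (hypothetical test name, using clojure.test and with-out-str to capture the printed report):
(require '[clojure.test :refer [deftest is]])

(deftest report-prints-entries
  ;; The "mock" address book is just a plain map passed as an argument.
  (let [output (with-out-str (report {"Ada" "1 Lovelace Ln."}))]
    (is (.contains output "Ada"))))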
If you don't need to coordinate updates to a data structure from multiple threads, there's usually no need to use refs or global vars.