Clojure future vs delay

I have been reading through Programming in Clojure and found this text:
(defn get-document [id]
  ; ... do some work to retrieve the identified document's metadata ...
  {:url "http://www.mozilla.org/about/manifesto.en.html"
   :title "The Mozilla Manifesto"
   :mime "text/html"
   :content (delay (slurp "http://www.mozilla.org/about/manifesto.en.html"))})
If callers are likely to always require that data, replacing delay with future can prove to be a significant improvement in throughput.
I didn't fully understand this part. Can someone please explain it a bit?

Simple answer: future executes its body in the background, while delay executes its body on demand. Example: if you have a list of 100 delay-ed expressions and loop through it, the code will block while evaluating each item (e.g. doing an HTTP request), so the first pass through the list will be slow. With future-d code, all items start evaluating in background threads as soon as they are created, so the results may already be available by the time your loop asks for them.
Rule of thumb: if there is a good chance that some or most of the content will not be needed at all, use delay; otherwise use future.
https://clojuredocs.org/clojure.core/delay
https://clojuredocs.org/clojure.core/future

future creates a Future and schedules it for execution immediately, therefore calling
(get-document "id")
will cause a future to be created which fetches the document immediately and then caches the result in the future.
In contrast, delay creates a lazy operation which will not be executed until dereferenced. In this case, calling
(get-document "id")
will not cause the document to be fetched. The fetch happens only when the delay is dereferenced, e.g.
(let [{:keys [content]} (get-document "id")]
  (println @content))
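To make the difference concrete, here is a minimal REPL sketch. It is only an illustration: fetch and the calls atom are hypothetical stand-ins for the slurp call in get-document, used so that you can see when each body actually runs.

```clojure
;; Records which bodies have run, in order (hypothetical helper state).
(def calls (atom []))

;; Hypothetical stand-in for an expensive operation such as slurp.
(defn fetch [label]
  (swap! calls conj label)
  label)

(def lazy-content  (delay  (fetch :delay)))   ; body not run yet
(def eager-content (future (fetch :future)))  ; body starts on another thread

@eager-content                      ; blocks until the background work is done
(println (realized? lazy-content))  ; false: nobody has dereferenced the delay

@lazy-content                       ; first deref runs the body
@lazy-content                       ; second deref returns the cached value
(println @calls)                    ; [:future :delay]
```

The future's body runs whether or not anyone asks for the result, while the delay's body runs at most once and only on first deref.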

Related

Atom update hangs inside of Clojure watch call

I've got a situation where I watch a specific directory for filesystem changes. If a certain file in that directory is changed, I re-read it, attach some existing cached information, and store it in an atom.
The relevant code looks like
(def posts (atom []))
(defn load-posts! []
  (swap!
   posts
   (fn [old]
     (vec
      (map #(let [raw (json/parse-string % (fn [k] (keyword (.toLowerCase k))))]
              (<snip some processing of raw, including getting some pieces from old>))
           (line-seq (io/reader "watched.json")))))))
;; elsewhere, inside of -main
(watch/start-watch
 [{:path "resources/"
   :event-types [:modify]
   :callback (fn [event filename]
               (when (and (= :modify event) (= "watched.json" filename))
                 (println "Reloading posts.json ...")
                 (posts/load-posts!)))}
  ...])
This ends up working fine locally, but when I deploy it to my server, the swap! call hangs about half-way through.
I've tried debugging it via println, which told me:
- The filesystem trigger is being fired.
- swap! is not running the function more than once.
- The watched file is being opened and parsed.
- Some entries from the file are being processed, but that processing stops at entry 111 (which doesn't seem to be significantly different from any preceding entries).
- The update does not complete, and the old value of that atom is therefore preserved.
- No filesystem events are fired after this one hangs.
I suspect that this is either a memory issue somewhere, or possibly a bug in Clojure-Watch (or the underlying FS-watching library).
Any ideas how I might go about fixing it or diagnosing it further?
The hang is caused by an error being thrown inside the function passed as :callback to watch/start-watch.
The root cause in this case is that the modified file is copied to the server by scp, which is not atomic. The first event therefore triggers before the copy is complete, and parsing the partially written file throws a JSON parse error.
This is exacerbated by the fact that watch/start-watch fails silently if its :callback throws any kind of error.
The solutions here are:
- Use rsync to copy files. It does copy atomically, but it will not generate any :modify events on the target file, only on related temp files. Because of the way its atomic copy works, it will only signal :create events.
- Wrap the :callback in a try/catch, and have the catch clause return the old value of the atom. This will cause load-posts! to run multiple times, but the last time will be on file copy completion, which should finally do the right thing.
(I've done both, but either would have realistically solved the problem).
A third option would be using an FS-watching library that reports errors, such as Hawk or dirwatch (or possibly hara.io.watch? I haven't used any of these, so I can't comment).
Diagnosing this involved wrapping the :callback body with
(try
  <body>
  (catch Exception e
    (println "ERROR IN SWAP!" e)
    old))
to see what was actually being thrown. Once that printed a JSON parsing error, it was pretty easy to form a theory of what was going wrong.
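A minimal sketch of that try/catch pattern, pulled out into a helper. safe-swap! is a hypothetical name (it is not part of clojure.core or Clojure-Watch), and the throwing function below only simulates a parse failure on a half-copied file.

```clojure
(def posts (atom []))

;; Hypothetical helper: applies f to the atom's current value, but if f
;; throws, prints the error and keeps the old value instead of propagating.
(defn safe-swap! [a f]
  (swap! a
         (fn [old]
           (try
             (f old)
             (catch Exception e
               (println "ERROR IN SWAP!" (.getMessage e))
               old)))))

;; Simulated failure: the atom keeps its previous value.
(safe-swap! posts (fn [_] (throw (ex-info "simulated JSON parse error" {}))))

;; A later, successful update goes through as usual.
(safe-swap! posts (fn [old] (conj old {:title "first post"})))
```

Because the error is printed rather than swallowed, a silently failing watcher callback can no longer hide the underlying exception.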

Managing and finding variables in Clojure REPL

I was looking at https://github.com/juxt/dirwatch library. The example from the front page is:
(require '[juxt.dirwatch :refer (watch-dir)])
(watch-dir println (clojure.java.io/file "/tmp"))
That works fine. Let's say the above is executed in REPL:
user=> (watch-dir println (clojure.java.io/file "/tmp"))
#<Agent@16824c93: #<LinuxWatchService sun.nio.fs.LinuxWatchService@17ece9ac>>
Now, I have an agent that will print events when I modify files in /tmp:
{:file #<File /tmp/1>, :count 1, :action :modify}
so all is fine.
I know I can reference the agent by using previous expression references (*1, *2 and *3). However, I don't know how to, without restarting the REPL itself:
- Unbind an implicit var created like this, i.e. how to remove the binding completely, so that the agent gets GCed and stops working.
- Access it in case I lost it because I did not bind it, such as in the example above. If I'm not mistaken, in the REPL only the last three results are available (*3 is, but *4 and further are not), at least per http://clojure.org/repl_and_main
Any suggestions?
Did you take a look at the code? The documentation for watch-dir has this: "The watcher returned by this function is a resource which should be closed with close-watcher."
Looking at the code, watch-dir uses send-off, whose docstring reads: "Dispatch a potentially blocking action to an agent. Returns the agent immediately." In other words, to address your first question, there is no implicit var created. If you want to get rid of the agent, you should bind the returned agent to some var and call close-watcher on it afterwards.
To address the second question, take a look at the canonical documentation for agents. Specifically, you can call shutdown-agents, which will shut down the thread pool (potentially killing other agents as well).

Future in closure not firing (Clojure)

I have a closure in which a future takes a do block. Each function inside the do block is provided by the arguments of the closure:
(defn accept-order
  [persist record track notify log]
  (fn [sponsor order]
    (let [datetime (to-timestamp (local-now))
          order (merge order {:network_reviewed_at datetime
                              :workflow_state "unconfirmed"
                              :sponsor_id (:id sponsor)})]
      (future
        (do
          (persist order
                   (select-keys order [:network_reviewed_at
                                       :workflow_state
                                       :sponsor_id]))
          (record sponsor order true)
          (track)
          (notify sponsor order)
          (log sponsor order)))
      order)))
No function in the do block is fired. If I deref the future, it works. If I remove the future it works. If I run from a REPL, it works. But if I run lein test, it won't work.
Any ideas? Thank you!
Adding a (Thread/sleep 2000) to a test invoking your function causes the future to run, so I'd venture a guess that Leiningen is killing the VM before your future gets to run (or at least before it manages to cause its side effects). Leiningen does kill the VM immediately after running tests.
As a side note, you don't need the do. future takes a body, not a single expression.
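One way to make this testable, sketched under the assumption that you are free to change what the function returns: hand the future back to the caller so that a test can deref it before asserting, instead of sleeping. accept-order-sketch and the side-effects atom are hypothetical names used only for this illustration.

```clojure
;; Stands in for the persist/record/notify side effects (hypothetical).
(def side-effects (atom []))

;; Hypothetical variant of accept-order that returns both the updated order
;; and the future, so callers can wait for the background work.
(defn accept-order-sketch [order]
  (let [order (assoc order :workflow_state "unconfirmed")
        done  (future
                ;; future takes a body, so no wrapping do is needed
                (swap! side-effects conj [:persisted (:id order)])
                (swap! side-effects conj [:notified (:id order)]))]
    {:order order :done done}))

(let [{:keys [order done]} (accept-order-sketch {:id 1})]
  @done                         ; block until the side effects have happened
  (println order @side-effects))
```

In a test, dereferencing :done before the assertions means the JVM cannot exit before the future has finished.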

Clojure / Jetty: Force URL to only be Hit Once at a Time

I am working on a Clojure / Jetty web service. I have a special URL that I want to service only one request at a time. If the URL is requested again before an earlier request to it has returned, I want the second request to return immediately. So in my core.clj, where I define my routes, I have something like this:
(def work-in-progress (ref false))
Then sometime later
(compojure.core/GET "/myapp/internal/do-work" []
  (if @work-in-progress
    "Work in Progress please try again later"
    (do
      (dosync
       (ref-set work-in-progress true))
      (do-the-work)
      (dosync
       (ref-set work-in-progress false))
      "Job completed Successfully")))
I have tried this on local Jetty server but I seem to be able to hit the url twice and double the work. What is a good pattern / way to implement this in Clojure in a threaded web server environment?
Imagine the following race condition for the solution proposed in the question.
Thread A starts to execute the handler's body. @work-in-progress is false, so it enters the do expression. However, before it manages to set the value of work-in-progress to true...
Thread B starts to execute the handler's body. @work-in-progress is still false, so it also enters the do expression.
Now two threads are executing (do-the-work) concurrently. That's not what we want.
To prevent this problem, check and set the value of the ref in a single dosync transaction.
(compojure.core/GET "/myapp/internal/do-work" []
  (if (dosync
       (when-not @work-in-progress
         (ref-set work-in-progress true)))
    (try
      (do-the-work)
      "Job completed Successfully"
      (finally
        (dosync
         (ref-set work-in-progress false))))
    "Work in Progress please try again later"))
Another abstraction which you might find useful in this scenario is an atom and compare-and-set!.
(def work-in-progress (atom false))
(compojure.core/GET "/myapp/internal/do-work" []
  (if (compare-and-set! work-in-progress false true)
    (try
      (do-the-work)
      "Job completed Successfully"
      (finally
        (reset! work-in-progress false)))
    "Work in Progress please try again later"))
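The same compare-and-set! guard can be tried at the REPL without Compojure. This is only a sketch: run-exclusively is a hypothetical helper name, and the nested call below merely simulates a second request arriving while the first is still running.

```clojure
(def work-in-progress (atom false))

;; Hypothetical helper: runs (work) only if the flag could be flipped from
;; false to true atomically; otherwise returns :busy without running it.
(defn run-exclusively [work]
  (if (compare-and-set! work-in-progress false true)
    (try
      (work)
      (finally
        ;; always release the flag, even if work throws
        (reset! work-in-progress false)))
    :busy))

(println (run-exclusively (fn [] :done)))
;; A "second request" made while the first is still in progress is rejected:
(println (run-exclusively (fn [] (run-exclusively (fn [] :inner)))))
```

Unlike the separate check and ref-set in the question, the check and the update here happen as one atomic step, so two threads cannot both see false.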
Actually this is the natural use case for a lock; in particular, a java.util.concurrent.locks.ReentrantLock.
The same pattern came up in my answer to an earlier SO question, Canonical Way to Ensure Only One Instance of a Service Is Running / Starting / Stopping in Clojure?; I'll repeat the relevant piece of code here:
(import java.util.concurrent.locks.ReentrantLock)
(def lock (ReentrantLock.))
(defn start []
  (if (.tryLock lock)
    (try
      (do-stuff)
      (finally (.unlock lock)))
    (do-other-stuff)))
The tryLock method attempts to acquire the lock, returning true if it succeeds in doing so and false otherwise, not blocking in either case.
Consider queueing the access to the resource as well - in addition to getting an equivalent functionality to that of locks/flags, queues let you observe the resource contention, among other advantages.

How should carmine's wcar macro be used?

I'm confused by how calls with carmine should be done. I found the wcar macro described in carmine's docs:
(defmacro wcar [& body] `(car/with-conn pool spec-server1 ~@body))
Do I really have to call wcar every time I want to talk to redis in addition to the redis command? Or can I just call it once at the beginning? If so how?
This is what some code with tavisrudd's redis library looked like (from my toy url shortener project's testsuite):
(deftest test_shorten_doesnt_exist_create_new_next
  (redis/with-server test-server
    (redis/set "url_counter" 51)
    (shorten test-url)
    (is (= "1g" (redis/get (str "urls|" test-url))))
    (is (= test-url (redis/get "shorts|1g")))))
And now I can only get it working with carmine by writing it like this:
(deftest test_shorten_doesnt_exist_create_new_next
  (wcar (car/set "url_counter" 51))
  (shorten test-url)
  (is (= "1g" (wcar (car/get (str "urls|" test-url)))))
  (is (= test-url (wcar (car/get "shorts|1g")))))
So what's the right way of using it and what underlying concept am I not getting?
Dan's explanation is correct.
Carmine uses response pipelining by default, whereas redis-clojure requires you to ask for pipelining when you want it (using the pipeline macro).
The main reason you'd want pipelining is for performance. Redis is so fast that the bottleneck in using it is often the time it takes for the request+response to travel over the network.
Clojure destructuring provides a convenient way of dealing with the pipelined response, but it does require writing your code differently to redis-clojure. The way I'd write your example is something like this (I'm assuming your shorten fn has side effects and needs to be called before the GETs):
(deftest test_shorten_doesnt_exist_create_new_next
  (wcar (car/set "url_counter" 51))
  (shorten test-url)
  (let [[response1 response2] (wcar (car/get (str "urls|" test-url))
                                    (car/get "shorts|1g"))]
    (is (= "1g" response1))
    (is (= test-url response2))))
So we're sending the first (SET) request to Redis and waiting for the reply (I'm not certain if that's actually necessary here). We then send the next two (GET) requests at once, allow Redis to queue the responses, then receive them all back at once as a vector that we'll destructure.
At first this may seem like unnecessary extra effort because it requires you to be explicit about when to receive queued responses, but it brings a lot of benefits including performance, clarity, and composable commands.
I'd check out Touchstone on GitHub if you're looking for an example of what I'd consider idiomatic Carmine use (just search for the wcar calls). (Sorry, SO is preventing me from including another link).
Otherwise just pop me an email (or file a GitHub issue) if you have any other questions.
Don't worry, you're using it the correct way already.
The Redis request functions (such as the get and set that you're using above) are all routed through another function send-request! that relies on a dynamically bound *context* to provide the connection. Attempting to call any of these Redis commands without that context will fail with a "no context" error. The with-conn macro (used in wcar) sets that context and provides the connection.
The wcar macro is then just a thin wrapper around with-conn making the assumption that you will be using the same connection details for all Redis requests.
So far this is all very similar to how Tavis Rudd's redis-clojure works.
So, the question now is why does Carmine need multiple wcar's when Redis-Clojure only required a single with-server?
And the answer is, it doesn't. Apart from sometimes, when it does. Carmine's with-conn uses Redis's "Pipelining" to send multiple requests with the same connection and then package the responses together in a vector. The example from the README shows this in action.
(wcar (car/ping)
      (car/set "foo" "bar")
      (car/get "foo"))
=> ["PONG" "OK" "bar"]
Here you will see that ping, set and get are only concerned with sending the request, leaving the receiving of response up to wcar. This precludes asserts (or any result access) from inside of wcar and leads to the separation of requests and multiple wcar calls that you have.
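To illustrate the idea (and only the idea), here is a toy model of a dynamically bound context plus queued requests. This is not Carmine's implementation: with-toy-conn and the :queue key are hypothetical, the *context* and send-request! defined here are simplified stand-ins for the names mentioned above, and nothing here talks to Redis.

```clojure
;; Toy stand-in for the dynamically bound connection context.
(def ^:dynamic *context* nil)

;; Toy request function: only queues the command, it does not return a reply.
(defn send-request! [cmd]
  (if *context*
    (do (swap! (:queue *context*) conj cmd)
        nil)
    (throw (ex-info "no context" {:cmd cmd}))))

;; Hypothetical macro playing the role of with-conn/wcar: binds the context,
;; runs the body, then hands back everything that was queued as one vector.
(defmacro with-toy-conn [& body]
  `(binding [*context* {:queue (atom [])}]
     ~@body
     @(:queue *context*)))

(println (with-toy-conn
           (send-request! :ping)
           (send-request! :get)))
```

In this model, as in the description above, each request function returns nothing useful on its own; results only exist once the enclosing macro returns, which is why assertions have to sit outside it.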