Atom update hangs inside of Clojure watch call

I've got a situation where I watch a specific directory for filesystem changes. If a certain file in that directory is changed, I re-read it, attach some existing cached information, and store it in an atom.
The relevant code looks like this:
(def posts (atom []))

(defn load-posts! []
  (swap!
   posts
   (fn [old]
     (vec
      (map #(let [raw (json/parse-string % (fn [k] (keyword (.toLowerCase k))))]
              (<snip some processing of raw, including getting some pieces from old>))
           (line-seq (io/reader "watched.json")))))))
;; elsewhere, inside of -main
(watch/start-watch
 [{:path "resources/"
   :event-types [:modify]
   :callback (fn [event filename]
               (when (and (= :modify event) (= "watched.json" filename))
                 (println "Reloading posts.json ...")
                 (posts/load-posts!)))}
  ...])
This ends up working fine locally, but when I deploy it to my server, the swap! call hangs about half-way through.
I've tried debugging it via println, which told me that:
- The filesystem trigger is being fired.
- swap! is not running the function more than once.
- The watched file is being opened and parsed.
- Some entries from the file are being processed, but that processing stops at entry 111 (which doesn't seem significantly different from any preceding entries).
- The update does not complete, so the old value of the atom is preserved.
- No filesystem events are fired after this one hangs.
I suspect that this is either a memory issue somewhere, or possibly a bug in Clojure-Watch (or the underlying FS-watching library).
Any ideas how I might go about fixing it or diagnosing it further?

The hang is caused by an error being thrown inside the function passed as :callback to watch/start-watch.
The root cause in this case is that the modified file is being copied to the server by scp, which is not atomic: the first :modify event therefore fires before the copy is complete, and parsing the half-copied file is what throws the JSON parse error.
This is exacerbated by the fact that watch/start-watch fails silently if its :callback throws any kind of error.
The solutions here are:
- Use rsync to copy files. It does copy atomically, but because of the way its atomic copy works (writing to a temp file and renaming it into place), it will not generate any :modify events on the target file, only on related temp files; it signals only :create events.
- Wrap the body of the function passed to swap! in a try/catch, and have the catch clause return the old value of the atom (see the sketch below). This will still cause load-posts! to run multiple times, but the last run will happen once the file copy completes, which should finally do the right thing.
(I've done both, but either alone would realistically have solved the problem.)
- A third option would be using an FS-watching library that reports errors, such as Hawk or dirwatch (or possibly hara.io.watch? I haven't used any of these, so I can't comment).
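A minimal sketch of the try/catch option, reusing the posts atom and JSON parsing from above (the per-entry processing is elided, and the println message is illustrative):

(defn load-posts! []
  (swap!
   posts
   (fn [old]
     (try
       (vec
        (map #(json/parse-string % (fn [k] (keyword (.toLowerCase k))))
             (line-seq (io/reader "watched.json"))))
       (catch Exception e
         ;; a half-copied file fails to parse; keep the old value
         (println "Failed to reload posts:" (.getMessage e))
         old)))))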
Diagnosing this involved wrapping the body of the function passed to swap! with
(try
  <body>
  (catch Exception e
    (println "ERROR IN SWAP!" e)
    old))
to see what was actually being thrown. Once that printed a JSON parsing error, it was pretty easy to form a theory of what was going wrong.

Related

clojure future vs delay

I have been reading through Programming in Clojure and found this text:
(defn get-document [id]
  ; ... do some work to retrieve the identified document's metadata ...
  {:url "http://www.mozilla.org/about/manifesto.en.html"
   :title "The Mozilla Manifesto"
   :mime "text/html"
   :content (delay (slurp "http://www.mozilla.org/about/manifesto.en.html"))})
if callers are likely to always require that data, the change of replacing delay with future can prove to be a significant improvement in throughput.
I didn't completely get this part; can someone please explain it a bit?
Simple answer: future is background execution of the body, while delay is on-demand execution of the body. Example: if you have a list of 100 delay-ed computations and try to loop through it, the code will block while evaluating each list item (e.g. doing an HTTP request), so the first traversal will be slow. With future-d code, everything is evaluated in background thread(s), so the results are available (almost) instantly in your loop.
Rule of thumb: if there is a good chance that some or most of the content will not be needed at all, use delay; otherwise use future.
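A quick REPL sketch of the difference (a 2-second sleep stands in for a slow HTTP request; timings are approximate):

(def delayed (mapv (fn [i] (delay (Thread/sleep 2000) i)) (range 3)))
(def futured (mapv (fn [i] (future (Thread/sleep 2000) i)) (range 3)))

;; each deref runs the body on demand: roughly 6 s in total
(time (doseq [d delayed] @d))

;; the futures started running in parallel the moment they were created,
;; so they are likely all done by now and the loop returns almost instantly
(time (doseq [f futured] @f))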
https://clojuredocs.org/clojure.core/delay
https://clojuredocs.org/clojure.core/future
future creates a Future and schedules it for execution immediately, therefore calling
(get-document "id")
will cause a future to be created which fetches the document immediately and then caches the result in the future.
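A sketch of that hypothetical future-based variant of get-document (the only change from the version quoted above is future in place of delay):

(defn get-document [id]
  ; ... metadata retrieval elided, as above ...
  {:url "http://www.mozilla.org/about/manifesto.en.html"
   :title "The Mozilla Manifesto"
   :mime "text/html"
   :content (future (slurp "http://www.mozilla.org/about/manifesto.en.html"))})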
In contrast, delay creates a lazy operation which will not be executed until dereferenced. In this case, calling
(get-document "id")
will not cause the document to be fetched. That will only happen when dereferencing, e.g.
(let [{:keys [content]} (get-document "id")]
  (println @content))

Update clojure code running in a thread

I need some kind of code reloading in a mixed Clojure/Java system.
I know the tools.namespace lib can be used to refresh clj code, but when I try to reload code that is used in a thread, it is not updated.
This is the example Clojure code I need to update (it just switches between upper case and lower case):
(ns com.test.test-util)

(defn get-word [sentence]
  (let [a-word (rand-nth (clojure.string/split sentence #"\s"))
        ;;a-word (clojure.string/upper-case a-word)
        ]
    (println a-word)
    a-word))
I used a future to start a thread and call the code above.
In the main method, in my first test, I just create a future and refresh the code after a while, but the code is not updated. So I also tested creating another thread after the refresh, but the code is still not updated.
(defn start-job []
  (future
    (println "start worker job.")
    (while @job-cond
      (get-word (rand-nth sentences))
      (Thread/sleep 2000))
    (println "return!")))
(defn -main []
  (let [work-job (start-job)]
    (println "modify clj file...")
    (Thread/sleep 10000)
    (swap! job-cond not)
    (future-cancel work-job)
    (swap! job-cond not)
    (refresh)
    (Thread/sleep 3000)
    (let [new-work-job (start-job)]
      (Thread/sleep 10000)
      (swap! job-cond not)
      (future-cancel new-work-job))))
Thanks.
In general you don't want code reloading to affect the runtime. Rather, you should look for clean ways to boot and shut down the running application code, using e.g. Stuart Sierra's library Component or juxt/jig by Malcolm Sparks. Then reload namespaces only when your application is not up and running (or at least not the parts of the running code that depend on the namespaces being reloaded).
That being said, here is why your example does not work:
start-job cannot automatically refer to a reloaded version of get-word during its lifetime, because the version of get-word it uses is resolved when start-job is compiled. So it will keep using the old version.
Explicitly resolve the var of get-word using
(resolve `get-word)
and invoke the returned var as a function instead of using get-word directly in the future.
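A minimal sketch of start-job rewritten this way, assuming the same job-cond atom and sentences as in the question:

(defn start-job []
  (future
    (println "start worker job.")
    (while @job-cond
      ;; resolve the var at call time, so after a refresh the newly
      ;; interned var (and thus the new get-word) is picked up
      ((resolve 'com.test.test-util/get-word) (rand-nth sentences))
      (Thread/sleep 2000))
    (println "return!")))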
What will happen then?
After you have made your changes and refresh has been invoked, -main will still refer to the old version of start-job (so if you change start-job, the change will not show at runtime). However, within it, get-word will be resolved as described above, and the new version of get-word will be used.
Short explanation: you can't expect a running compiled function to replace itself with reloaded code on the fly (no matter whether it runs on a separate thread or not). Yet that would be necessary (unless you use resolve inside it) for it to invoke reloaded functions.

A possible solution to avoid the limitation in the go macro: "Must be called inside a (go ...) block"

Perhaps a possible solution to using (<! c) outside the go macro could be done with a macro and its macro-expansion time:
This is my example:
(ns fourclojure.asynco
  (:require [clojure.core.async :as async :refer :all]))

(defmacro runtime--fn [the-fn the-value]
  `(~the-fn ~the-value))

(defmacro call-fn [the-fn]
  `(runtime--fn ~the-fn (<! my-chan)))

(def my-chan (chan))

(defn read-channel [the-fn]
  (go
    (loop []
      (call-fn the-fn)
      (recur))))

(defn paint []
  (put! my-chan "paint!"))
And to test it:
(read-channel print)
(repeatedly 50 paint)
I've tried this solution in a nested go and it also works, but I'm not sure whether it is a correct path.
The reason for this question is related to another question, Isn't core.async contrary to Clojure principles?, @aeuhuea's comment that "It seems to me that this prevents simplicity and composability. Why is it not a problem?", and @cgrand's response "The limitation of the go macro (its locality) is also a feature: it enforces source code locality of stateful operations."
But isn't forcing you to localize your code the same as "complecting" it?
Regarding the title of your question:
>! must be called in a go block because it's designed that way. If you are interested in the go-block state-machine mechanics, I can highly recommend Timothy Baldridge's YouTube videos on that: http://www.youtube.com/channel/UCLxWPHbkxjR-G-y6CVoEHOw
Remember that there are always the blocking put and take, >!! and <!!. I don't know which part of your code is supposed to provide a "solution" for not being able to use <! and >! outside of a go block; however, looping around events dispatched from a single channel is common practice. Here is a modified version of read-channel:
(defn do-channel [f ch]
  (go-loop []
    (when-let [v (<! ch)]
      (f v)
      (recur))))
put! puts asynchronously, an effect that you usually don't intend. In your example, to put the string "paint!" into the channel 50 times, I'd recommend a one-liner like this one:
(do-channel println (to-chan (repeat 50 "paint!")))
Here is a comment as an answer to your edit:
Channels are not designed to be used as mutable data structures, period. They have a buffer, and that buffer can be thought of as a mutable queue, but we don't use channels to store a value just to take it out again a few lines later.
We use channels as a helping construct to bring the execution of two or more pieces of source code in two or more different places in line. E.g. a go block here does not continue to execute until it has received a value produced by another go block. >! and >!! help us distinguish whether we are in a thread-blocking context or in a go block (parking a spawned process).
Also, please refer to this answer: Clojure - Why does execution hang when doing blocking insert into channel? (core.async)
You should not use >!! or <!! inside of a go block, whether directly or nested in a function call. Rich Hickey himself has commented on that in a recent bug report (http://dev.clojure.org/jira/browse/ASYNC-29?focusedCommentId=32414&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-32414).
Looking at the source code of >! you will see that it only throws an exception. As a matter of fact, go replaces >! with different source code: go spawns a state-machine-controlled process. Depending on the context, you may want to make this explicitly known, or nest the go block inside a macro or function (like in the code examples you have provided).
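A quick REPL sketch of that behavior (assuming core.async is referred as in the ns above):

(def c (chan 1))

;; outside a go block, >! only throws:
;; (>! c :x)   ;=> AssertionError: >! used not in (go ...) block

;; inside a go block, the same call is rewritten by the go macro into a
;; parking state-machine step:
(<!! (go (>! c :x) :done))   ;=> :done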
Regarding David Nolen's (swannodette's) helpers: they have since been implemented in the core.async library by Rich Hickey and Nolen himself. Nolen himself said in this presentation that they are superseded (http://www.youtube.com/watch?v=AhxcGGeh5ho). Notice that go-loop was implemented after Nolen's commit.

Clojure / Jetty: Force URL to only be Hit Once at a Time

I am working on a Clojure / Jetty web service. I have a special URL that I want to have serviced only one request at a time. If the URL was requested, and before it returns the URL is requested again, I want to return immediately. So in my core.clj, where I define my routes, I have something like this:
(def work-in-progress (ref false))
Then sometime later:
(compojure.core/GET "/myapp/internal/do-work" []
  (if @work-in-progress
    "Work in Progress please try again later"
    (do
      (dosync
       (ref-set work-in-progress true))
      (do-the-work)
      (dosync
       (ref-set work-in-progress false))
      "Job completed Successfully")))
I have tried this on a local Jetty server, but I seem to be able to hit the URL twice and double the work. What is a good pattern / way to implement this in Clojure in a threaded web-server environment?
Imagine the following race condition in the solution proposed in the question:
Thread A starts to execute the handler's body. @work-in-progress is false, so it enters the do expression. However, before it manages to set the value of work-in-progress to true...
Thread B starts to execute the handler's body. @work-in-progress is still false, so it also enters the do expression.
Now two threads are executing (do-the-work) concurrently. That's not what we want.
To prevent this problem, check and set the value of the ref in a single dosync transaction:
(compojure.core/GET "/myapp/internal/do-work" []
  (if (dosync
       (when-not @work-in-progress
         (ref-set work-in-progress true)))
    (try
      (do-the-work)
      "Job completed Successfully"
      (finally
        (dosync
         (ref-set work-in-progress false))))
    "Work in Progress please try again later"))
Another abstraction which you might find useful in this scenario is an atom and compare-and-set!.
(def work-in-progress (atom false))

(compojure.core/GET "/myapp/internal/do-work" []
  (if (compare-and-set! work-in-progress false true)
    (try
      (do-the-work)
      "Job completed Successfully"
      (finally
        (reset! work-in-progress false)))
    "Work in Progress please try again later"))
Actually this is the natural use case for a lock; in particular, a java.util.concurrent.locks.ReentrantLock.
The same pattern came up in my answer to an earlier SO question, Canonical Way to Ensure Only One Instance of a Service Is Running / Starting / Stopping in Clojure?; I'll repeat the relevant piece of code here:
(import java.util.concurrent.locks.ReentrantLock)

(def lock (ReentrantLock.))

(defn start []
  (if (.tryLock lock)
    (try
      (do-stuff)
      (finally (.unlock lock)))
    (do-other-stuff)))
The tryLock method attempts to acquire the lock, returning true if it succeeds in doing so and false otherwise, not blocking in either case.
Consider queueing access to the resource as well: in addition to giving you functionality equivalent to that of locks/flags, queues let you observe the resource contention, among other advantages.
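One way to sketch the queueing idea with core.async (an illustrative assumption, not from the original answer): offer! accepts a job only while the channel's buffer has room, and a single consumer thread runs jobs one at a time. Note that with a buffer of 1 a second job may be accepted and wait its turn, and the handler now returns before the work actually finishes.

(require '[clojure.core.async :as async :refer [chan <!! offer!]])

(def job-queue (chan 1))

;; a single consumer thread runs queued jobs one at a time
(async/thread
  (loop []
    (when-let [job (<!! job-queue)]
      (job)
      (recur))))

(compojure.core/GET "/myapp/internal/do-work" []
  (if (offer! job-queue do-the-work)
    "Job queued"
    "Work in Progress please try again later"))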

How should carmine's wcar macro be used?

I'm confused by how calls with Carmine should be made. I found the wcar macro described in Carmine's docs:
(defmacro wcar [& body] `(car/with-conn pool spec-server1 ~@body))
Do I really have to call wcar every time I want to talk to Redis, in addition to the Redis command? Or can I just call it once at the beginning? If so, how?
This is what some code with tavisrudd's redis library looked like (from my toy URL-shortener project's test suite):
(deftest test_shorten_doesnt_exist_create_new_next
  (redis/with-server test-server
    (redis/set "url_counter" 51)
    (shorten test-url)
    (is (= "1g" (redis/get (str "urls|" test-url))))
    (is (= test-url (redis/get "shorts|1g")))))
And now I can only get it working with carmine by writing it like this:
(deftest test_shorten_doesnt_exist_create_new_next
  (wcar (car/set "url_counter" 51))
  (shorten test-url)
  (is (= "1g" (wcar (car/get (str "urls|" test-url)))))
  (is (= test-url (wcar (car/get "shorts|1g")))))
So what's the right way of using it and what underlying concept am I not getting?
Dan's explanation is correct.
Carmine uses response pipelining by default, whereas redis-clojure requires you to ask for pipelining when you want it (using the pipeline macro).
The main reason you'd want pipelining is for performance. Redis is so fast that the bottleneck in using it is often the time it takes for the request+response to travel over the network.
Clojure destructuring provides a convenient way of dealing with the pipelined response, but it does require writing your code differently to redis-clojure. The way I'd write your example is something like this (I'm assuming your shorten fn has side effects and needs to be called before the GETs):
(deftest test_shorten_doesnt_exist_create_new_next
  (wcar (car/set "url_counter" 51))
  (shorten test-url)
  (let [[response1 response2] (wcar (car/get (str "urls|" test-url))
                                    (car/get "shorts|1g"))]
    (is (= "1g" response1))
    (is (= test-url response2))))
So we're sending the first (SET) request to Redis and waiting for the reply (I'm not certain if that's actually necessary here). We then send the next two (GET) requests at once, allow Redis to queue the responses, then receive them all back at once as a vector that we'll destructure.
At first this may seem like unnecessary extra effort because it requires you to be explicit about when to receive queued responses, but it brings a lot of benefits including performance, clarity, and composable commands.
I'd check out Touchstone on GitHub if you're looking for an example of what I'd consider idiomatic Carmine use (just search for the wcar calls). (Sorry, SO is preventing me from including another link).
Otherwise just pop me an email (or file a GitHub issue) if you have any other questions.
Don't worry, you're using it the correct way already.
The Redis request functions (such as the get and set that you're using above) are all routed through another function send-request! that relies on a dynamically bound *context* to provide the connection. Attempting to call any of these Redis commands without that context will fail with a "no context" error. The with-conn macro (used in wcar) sets that context and provides the connection.
The wcar macro is then just a thin wrapper around with-conn making the assumption that you will be using the same connection details for all Redis requests.
So far this is all very similar to how Tavis Rudd's redis-clojure works.
So, the question now is: why does Carmine need multiple wcar calls when redis-clojure required only a single with-server?
And the answer is, it doesn't. Apart from sometimes, when it does. Carmine's with-conn uses Redis's "Pipelining" to send multiple requests with the same connection and then package the responses together in a vector. The example from the README shows this in action.
(wcar (car/ping)
      (car/set "foo" "bar")
      (car/get "foo"))
=> ["PONG" "OK" "bar"]
Here you can see that ping, set and get are only concerned with sending the request, leaving the receiving of the response up to wcar. This precludes asserts (or any result access) inside of wcar and leads to the separation of requests and the multiple wcar calls that you have.