Idiomatic file locking in Clojure?

I have a group of futures processing jobs from a queue that involve writing to files. What's the idiomatic way to make sure only one future accesses a particular file at a time?

How about using agents instead of locks to ensure this?
I think using agents to safeguard shared mutable state, regardless of whether it is in memory or on disk, is more idiomatic in Clojure than using locks.
If you create one agent per file and send every access attempt to that agent, you can ensure that only one thread at a time accesses a given file.
For example like this:
;; clojure.contrib.duck-streams (and its append-spit) is gone from current
;; Clojure; core's spit with :append true does the same job.
(defn file-agent [file-name]
  (add-watch (agent nil) :file-writer
             (fn [key agent old new]
               (spit file-name new :append true))))
(defn async-append [file-agent content]
  (send file-agent (constantly content)))
then append your file through the agent:
(async-append (file-agent "temp-file-name") "content written to file")
If you need synchronous access to the file, you can get it with await, like this:
(defn sync-append [file-agent content]
  (await (send file-agent (constantly content))))

I would use the core Clojure function locking which is used as follows:
(locking some-object
  (do-whatever-you-like))
Here some-object could either be the file itself, or alternatively any arbitrary object that you want to synchronise on (which might make sense if you wanted a single lock to protect multiple files).
Under the hood this uses standard JVM object locking, so it's basically equivalent to a synchronized block of code in Java.
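To get one lock per file rather than one global lock, the some-object above can be looked up by path. This is only a minimal sketch, assuming all writers live in the same JVM; file-locks, lock-for, locked-append and demo-line-count are hypothetical names, not part of Clojure:

```clojure
(require '[clojure.string :as str])

;; Hypothetical registry: path -> lock object (names are illustrative only).
(def file-locks (atom {}))

(defn lock-for
  "Returns the lock object for path, creating it on first use."
  [path]
  (get (swap! file-locks update path #(or % (Object.))) path))

(defn locked-append
  "Appends content to the file at path while holding that file's lock."
  [path content]
  (locking (lock-for path)
    (spit path content :append true)))

;; Ten futures append to the same temp file, one at a time.
(defn demo-line-count []
  (let [path (str (java.io.File/createTempFile "locked" ".txt"))]
    (run! deref
          (mapv #(future (locked-append path (str "line " % "\n")))
                (range 10)))
    (count (str/split-lines (slurp path)))))
```

Futures writing to different paths do not block each other here, because each path gets its own lock object.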

I don't think there is a specific built-in function for this in Clojure, but you can use the standard Java IO classes to do it. It would look something like this:
(import '(java.io File RandomAccessFile))
(def f (File. "/tmp/lock.file"))
(def channel (.getChannel (RandomAccessFile. f "rw")))
(def lock (.lock channel))
(.release lock)
(.close channel)
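The lock above should be released even if the work in between throws. Here is a hedged sketch that wraps the same calls in a helper (with-file-lock is a hypothetical name). Note that a FileLock is held on behalf of the whole JVM, so it coordinates separate processes rather than threads or futures inside one process:

```clojure
(import '(java.io RandomAccessFile))

(defn with-file-lock
  "Calls f with no arguments while holding an exclusive lock on lock-path."
  [lock-path f]
  (with-open [raf     (RandomAccessFile. (str lock-path) "rw")
              channel (.getChannel raf)]
    (let [lock (.lock channel)]   ; blocks until the lock is granted
      (try
        (f)
        (finally
          (.release lock))))))
```

It could be used as (with-file-lock "/tmp/lock.file" #(spit "/tmp/data.txt" "hello\n" :append true)), where the data file path is just an example.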

Related

Streaming data to the caller in JVM

I have a function that fetches data periodically and eventually stops. It has to return the data to its caller either:
1. as and when it arrives, or
2. all in one shot.
The 2nd one is easy to implement: you block the caller, fetch all the data, and then send it in one shot.
But I want to implement the 1st one (and I want to avoid callbacks). Are streams the thing to use here? If so, how? If not, how do I return something that the caller can query for data, and that also signals when there is no more data?
Note: I am on the JVM ecosystem, Clojure to be specific. I have looked at the Clojure library core.async, which solves this kind of problem with channels. But I was wondering whether there is another way, which would probably look like this (assuming streams are something that can be used).
Java snippet
//Function which will periodically fetch MyData until there is no data
public Stream<MyData> myFunction() {
...
}
myFunction().filter(myData -> myData.text.equals("foo"))
Maybe you can just use a seq, which is lazy by default (like Stream), so the caller can decide when to pull the data in. When there is no more data, myFunction can simply end the sequence. Doing this also lets you encapsulate some optimisation within myFunction, e.g. fetching data in batches to minimise round trips, or fetching data periodically per your original requirement.
Here is one naive implementation:
(defn my-function []
  (let [batch 100]
    (->> (range)
         (map #(let [from (* batch %)
                     to (+ from batch)]
                 (db-get from to)))
         ;; take while we have data from db-get
         (take-while identity)
         ;; returns as one single seq/Stream
         (apply concat))))
;; use it as a normal seq/Stream
(->> (my-function)
     (filter odd?))
where db-get would be something like:
(defn db-get [from to]
  ;; return first 1000 records only, i.e. returns nil to signal completion
  (when (< from 1000)
    ;; returns a range of records
    (range from to)))
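The same idea can also be written directly with lazy-seq, so that batches are requested on demand rather than up front. This is only a sketch: poll-seq, remaining and fake-fetch! are hypothetical names, and fake-fetch! stands in for whatever call actually fetches the data, returning nil when there is none left:

```clojure
(defn poll-seq
  "Lazy seq of all items from successive (fetch!) calls; ends when fetch! returns nil."
  [fetch!]
  (lazy-seq
    (when-some [batch (fetch!)]
      (concat batch (poll-seq fetch!)))))

;; Stand-in data source for illustration: hands out one batch per call.
(def remaining (atom [[1 2] [3 4] [5]]))

(defn fake-fetch! []
  (let [[old _] (swap-vals! remaining rest)]
    (first old)))

(filter odd? (poll-seq fake-fetch!))
;; => (1 3 5)
```

The caller treats the result like any other seq, and the sequence simply ends once the source reports that it is exhausted.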
You might want to check https://github.com/ReactiveX/RxJava and https://github.com/ReactiveX/RxClojure (seems no longer maintained?)

Clojure: How to Serialize a Function and Reuse it Later

(defn my-func [opts]
(assoc opts :something :else))
What I want to be able to do is serialize a reference to the function (maybe via #'my-func?) to a string in such a way that, upon deserializing it, I can invoke it with args.
How does this work?
Edit-- Why This is Not a Duplicate
The other question asked how to serialize a function body-- the entire function code. I am not asking how to do that. I am asking how to serialize a reference.
Imagine a cluster of servers all running the same jar, attached to an MQ. The MQ publishes a fn-reference and fn-args for functions in the jar, and a server in the cluster runs the function and acks the message. That's what I'm trying to do, not pass function bodies around.
In some ways, this is like building a "serverless" engine in clojure.
Weirdly, a commit for serializing var identity was just added to Clojure yesterday: https://github.com/clojure/clojure/commit/a26dfc1390c53ca10dba750b8d5e6b93e846c067
So as of the latest master snapshot version, you can serialize a Var (like #'clojure.core/conj) and deserialize it on another JVM with access to the same loaded code, and invoke it.
(import [java.io File FileOutputStream FileInputStream ObjectOutputStream ObjectInputStream])
(defn write-obj [o f]
  (let [oos (ObjectOutputStream. (FileOutputStream. (File. f)))]
    (.writeObject oos o)
    (.close oos)))
(defn read-obj [f]
  (let [ois (ObjectInputStream. (FileInputStream. (File. f)))
        o (.readObject ois)]
    (.close ois)
    o))
;; in one JVM
(write-obj #'clojure.core/conj "var.ser")
;; in another JVM
(read-obj "var.ser")
As suggested in the comments, if you can just serialize a keyword label for the function and store/retrieve that, you are finished.
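One way to realise the "label" idea is to send the function's fully qualified name as a string and look the var up again on the receiving side. A minimal sketch, assuming every server already has the namespace loaded; var->ref and invoke-ref are hypothetical helper names:

```clojure
(defn var->ref
  "Fully qualified name of var v as a string, e.g. \"clojure.core/conj\"."
  [v]
  (let [m (meta v)]
    (str (ns-name (:ns m)) "/" (:name m))))

(defn invoke-ref
  "Looks up the var named by ref-str and applies it to args."
  [ref-str & args]
  (if-let [v (resolve (symbol ref-str))]
    (apply v args)
    (throw (ex-info "Unknown function reference" {:ref ref-str}))))

(var->ref #'clojure.core/conj)           ;; => "clojure.core/conj"
(invoke-ref "clojure.core/conj" [1] 2)   ;; => [1 2]
```

In a real queue consumer you would probably want to check the incoming name against a list of allowed functions before invoking it.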
If you need to transmit the function from one place to another, you essentially need to send the function source code as a string and then have it compiled via eval on the other end. This is what Datomic does when a Database Function is stored in the DB and automatically run by Datomic for any new additions/changes to the DB (these can perform automatic data validation, for example). See:
http://docs.datomic.com/database-functions.html
http://docs.datomic.com/clojure/index.html#datomic.api/function
A similar technique is used in the book Clojure in Action (1st Edition) for the distributed compute engine example using RabbitMQ.

A possible solution to avoid limitation in go macro: Must be called inside a (go ...) block

Perhaps using (<! c) outside the go macro could be made possible with a macro, taking advantage of macro-expansion time:
This is my example:
(ns fourclojure.asynco
  (:require [clojure.core.async :as async :refer :all]))
(defmacro runtime--fn [the-fn the-value]
  `(~the-fn ~the-value))
(defmacro call-fn [the-fn]
  `(runtime--fn ~the-fn (<! my-chan)))
(def my-chan (chan))
(defn read-channel [the-fn]
  (go
    (loop []
      (call-fn the-fn)
      (recur))))
(defn paint []
  (put! my-chan "paint!"))
And to test it:
(read-channel print)
(repeatedly 50 paint)
I've tried this solution in a nested go block and it also works, but I'm not sure whether it is a correct approach.
This question is related to another question, Isn't core.async contrary to Clojure principles?, where @aeuhuea comments that "It seems to me that this prevents simplicity and composability. Why is it not a problem?" and @cgrand responds "The limitation of the go macro (its locality) is also a feature: it enforces source code locality of stateful operations."
But isn't being forced to localize your code the same as "complecting" it?
Regarding the title of your question:
>! must be called in a go block because it is designed that way. If you are interested in the go-block state-machine mechanics, I can highly recommend Timothy Baldridge's YouTube videos on the subject: http://www.youtube.com/channel/UCLxWPHbkxjR-G-y6CVoEHOw
Remember that there are always the blocking put and take, >!! and <!!. I don't know which part of your code is supposed to provide a "solution" for not being able to use <! and >! outside of a go block; however, looping over events dispatched from a single channel is common practice. Here is a modified version of read-channel:
(defn do-channel [f ch]
  (go-loop []
    (when-let [v (<! ch)]
      (f v)
      (recur))))
put! puts asynchronously, an effect that you usually don't intend. In your example, to put the string "paint!" into the channel 50 times, I'd recommend a one-liner like this one:
(do-channel println (to-chan (repeat 50 "paint!")))
Here is a comment as an answer to your edit:
Channels are not designed to be used as mutable data-structures, period. They have a buffer and that buffer can be thought of as a mutable queue. However we don't use channels to store a value in there, just to take it out a few lines later again.
We use channels as a helper construct to bring the execution of two or more pieces of source code, in two or more different places, in line. E.g. a go block here does not continue executing until it has received a value produced by another go block. >! and >!! let us distinguish whether we are in a go block (parking a spawned process) or in a thread-blocking context.
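To make the distinction concrete, here is a small sketch (blocking-take-demo is a hypothetical name) in which a go block parks on >! while the ordinary calling thread uses the blocking <!! to receive the value:

```clojure
(require '[clojure.core.async :refer [chan go >! <!! close!]])

(defn blocking-take-demo []
  (let [ch (chan)]
    ;; Inside go, >! parks the lightweight process until a taker arrives.
    (go (>! ch "paint!")
        (close! ch))
    ;; Outside go, <!! blocks the calling thread instead.
    [(<!! ch) (<!! ch)]))

(blocking-take-demo)
;; => ["paint!" nil]   (nil because the channel was closed)
```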
Also, please refer to this answer: Clojure - Why does execution hang when doing blocking insert into channel? (core.async)
You should not use >!! or <!! inside of a go block, neither transparently nor nested in a function call. Rich Hickey himself has commented on that in a recent bug report (http://dev.clojure.org/jira/browse/ASYNC-29?focusedCommentId=32414&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-32414).
Looking at the source-code of >! you will see that it only throws an exception. As a matter of fact, go will replace >! with different source-code. go spawns a state-machine controlled process. Depending on the context you may want to make this explicitly known or nest the go block inside of a macro or function (like in the code examples that you have provided).
Regarding David Nolen's (swannodette's) helpers: they have since been implemented in the core.async library by Rich Hickey and Nolen himself. Nolen says in this presentation that they are superseded (http://www.youtube.com/watch?v=AhxcGGeh5ho). Note that go-loop was added after Nolen's commit.

How should carmine's wcar macro be used?

I'm confused by how calls with carmine should be done. I found the wcar macro described in carmine's docs:
(defmacro wcar [& body] `(car/with-conn pool spec-server1 ~@body))
Do I really have to call wcar every time I want to talk to redis in addition to the redis command? Or can I just call it once at the beginning? If so how?
This is what some code with tavisrudd's redis library looked like (from my toy url shortener project's testsuite):
(deftest test_shorten_doesnt_exist_create_new_next
  (redis/with-server test-server
    (redis/set "url_counter" 51)
    (shorten test-url)
    (is (= "1g" (redis/get (str "urls|" test-url))))
    (is (= test-url (redis/get "shorts|1g")))))
And now I can only get it working with carmine by writing it like this:
(deftest test_shorten_doesnt_exist_create_new_next
  (wcar (car/set "url_counter" 51))
  (shorten test-url)
  (is (= "1g" (wcar (car/get (str "urls|" test-url)))))
  (is (= test-url (wcar (car/get "shorts|1g")))))
So what's the right way of using it and what underlying concept am I not getting?
Dan's explanation is correct.
Carmine uses response pipelining by default, whereas redis-clojure requires you to ask for pipelining when you want it (using the pipeline macro).
The main reason you'd want pipelining is for performance. Redis is so fast that the bottleneck in using it is often the time it takes for the request+response to travel over the network.
Clojure destructuring provides a convenient way of dealing with the pipelined response, but it does require writing your code differently to redis-clojure. The way I'd write your example is something like this (I'm assuming your shorten fn has side effects and needs to be called before the GETs):
(deftest test_shorten_doesnt_exist_create_new_next
  (wcar (car/set "url_counter" 51))
  (shorten test-url)
  (let [[response1 response2] (wcar (car/get (str "urls|" test-url))
                                    (car/get "shorts|1g"))]
    (is (= "1g" response1))
    (is (= test-url response2))))
So we're sending the first (SET) request to Redis and waiting for the reply (I'm not certain if that's actually necessary here). We then send the next two (GET) requests at once, allow Redis to queue the responses, then receive them all back at once as a vector that we'll destructure.
At first this may seem like unnecessary extra effort because it requires you to be explicit about when to receive queued responses, but it brings a lot of benefits including performance, clarity, and composable commands.
I'd check out Touchstone on GitHub if you're looking for an example of what I'd consider idiomatic Carmine use (just search for the wcar calls). (Sorry, SO is preventing me from including another link).
Otherwise just pop me an email (or file a GitHub issue) if you have any other questions.
Don't worry, you're using it the correct way already.
The Redis request functions (such as the get and set that you're using above) are all routed through another function send-request! that relies on a dynamically bound *context* to provide the connection. Attempting to call any of these Redis commands without that context will fail with a "no context" error. The with-conn macro (used in wcar) sets that context and provides the connection.
The wcar macro is then just a thin wrapper around with-conn making the assumption that you will be using the same connection details for all Redis requests.
So far this is all very similar to how Tavis Rudd's redis-clojure works.
So, the question now is why does Carmine need multiple wcar's when Redis-Clojure only required a single with-server?
And the answer is, it doesn't. Apart from sometimes, when it does. Carmine's with-conn uses Redis's "Pipelining" to send multiple requests with the same connection and then package the responses together in a vector. The example from the README shows this in action.
(wcar (car/ping)
      (car/set "foo" "bar")
      (car/get "foo"))
=> ["PONG" "OK" "bar"]
Here you will see that ping, set and get are only concerned with sending the request, leaving the receiving of response up to wcar. This precludes asserts (or any result access) from inside of wcar and leads to the separation of requests and multiple wcar calls that you have.
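The mechanism described above can be imitated with plain Clojure. The following is a toy model, not Carmine's actual implementation: *context*, send-request! and with-conn reuse the names from the explanation, but their bodies are invented for illustration, and the "replies" are just the queued commands echoed back:

```clojure
(def ^:dynamic *context* nil)

(defn send-request!
  "Queues cmd on the current context; does not return a reply."
  [cmd]
  (when-not *context*
    (throw (ex-info "no context" {:cmd cmd})))
  (swap! (:queue *context*) conj cmd)
  nil)

(defmacro with-conn
  "Binds a fresh context, runs body, then returns everything queued as one vector."
  [conn & body]
  `(binding [*context* {:conn ~conn :queue (atom [])}]
     ~@body
     @(:queue *context*)))

(with-conn :fake-connection
  (send-request! [:ping])
  (send-request! [:get "foo"]))
;; => [[:ping] [:get "foo"]]
```

Because each command returns nil and the results only appear when the with-conn form finishes, an assertion placed inside the body has nothing to inspect, which mirrors why the tests above need separate wcar calls.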

How can I create a constantly running background process in Clojure?

How can I create a constantly running background process in Clojure? Is using "future" with a loop that never ends the right way?
You could just start a Thread with a function that runs forever.
(defn forever []
  ;; do stuff in a loop forever
  )
(.start (Thread. forever))
If you don't want the background thread to block process exit, make sure to make it a daemon thread:
(doto (Thread. forever)
  (.setDaemon true)
  (.start))
If you want some more finesse you can use the java.util.concurrent.Executors factory to create an ExecutorService. This makes it easy to create pools of threads, use custom thread factories, custom incoming queues, etc.
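As a rough illustration of the executor route, a single-threaded scheduled executor can run a function at a fixed rate. This is only a sketch; start-periodic, ticks and executor are hypothetical names:

```clojure
(import '(java.util.concurrent Executors TimeUnit))

(defn start-periodic
  "Runs f every period-ms milliseconds on a dedicated thread; returns the executor."
  [f period-ms]
  (doto (Executors/newSingleThreadScheduledExecutor)
    (.scheduleAtFixedRate f 0 (long period-ms) TimeUnit/MILLISECONDS)))

(def ticks (atom 0))
(def executor (start-periodic #(swap! ticks inc) 100))
;; ... later, stop the background work:
(.shutdownNow executor)
```

The default thread factory creates non-daemon threads, so the executor has to be shut down (or given a daemon thread factory) if it should not keep the process alive.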
The claypoole lib wraps some of the work execution stuff up into a more clojure-friendly api if that's what you're angling towards.
My simple higher-order infinite loop function (using futures):
(def counter (atom 1))
(defn infinite-loop [function]
  (function)
  (future (infinite-loop function))
  nil)
;; note the nil above is necessary to avoid overflowing the stack with futures...
(infinite-loop
  #(do
     (Thread/sleep 1000)
     (swap! counter inc)))
;; wait half a minute....
@counter
=> 31
I strongly recommend using an atom or one of Clojure's other reference types to store results (as per the counter in the example above).
With a bit of tweaking you could also use this approach to start/stop/pause the process in a thread-safe manner (e.g. test a flag to see if (function) should be executed in each iteration of the loop).
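The flag idea could look roughly like this. It is a sketch under the assumption that a plain loop inside one future is acceptable; start-background and stop-background are hypothetical names:

```clojure
(defn start-background
  "Calls f repeatedly, sleeping interval-ms between calls, until stopped."
  [f interval-ms]
  (let [running (atom true)]
    {:running running
     :worker  (future
                (while @running
                  (f)
                  (Thread/sleep (long interval-ms))))}))

(defn stop-background
  "Clears the flag and waits for the loop to finish its current iteration."
  [{:keys [running worker]}]
  (reset! running false)
  @worker)
```

A call such as (def process (start-background #(swap! counter inc) 1000)) starts the loop, and (stop-background process) ends it after the current iteration.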
Maybe, or perhaps Lein-daemon? https://github.com/arohner/lein-daemon