Interop from clojure a with non-standard iterative java API - clojure

I am working in clojure with a java class which provides a retrieval API for a domain specific binary file holding a series of records.
The java class is initialized with a file and then provides a .query method which returns an instance of an inner class which has only one method .next, thus not playing nicely with the usual java collections API. Neither the outer nor inner class implements any interface.
The .query method may return null instead of the inner class. The .next method returns a record string or null if no further records are found, it may return null immediately upon the first call.
How do I make this java API work well from within clojure without writing further java classes?
The best I could come up with is:
(defn get-records
[file query-params]
(let [tr (JavaCustomFileReader. file)]
(if-let [inner-iter (.query tr query-params)] ; .query may return null
(loop [it inner-iter
results []]
(if-let [record (.next it)]
(recur it (conj results record))
results))
[])))
This gives me a vector of results to work with the clojure seq abstractions. Are there other ways to expose a seq from the java API, either with lazy-seq or using protocols?

Without dropping to lazy-seq:
(defn record-seq
[q]
(take-while (complement nil?) (repeatedly #(.next q))))
Instead of (complement nil?) you could also just use identity if .next does not return boolean false.
(defn record-seq
[q]
(take-while identity (repeatedly #(.next q))))
I would also restructure a little bit the entry points.
(defn query
[rdr params]
(when-let [q (.query rdr params)]
(record-seq q)))
(defn query-file
[file params]
(with-open [rdr (JavaCustomFileReader. file)]
(doall (query rdr params))))

Seems like a good fit for lazy-seq:
(defn query [file query]
(.query (JavaCustomFileReader. file) query))
(defn record-seq [query]
(when query
(when-let [v (.next query)]
(cons v (lazy-seq (record-seq query))))))
;; usage:
(record-seq (query "filename" "query params"))

Your code is not lazy as it would be if you were using Iterable but you can fill the gap with lazy-seq as follows.
(defn query-seq [q]
(lazy-seq
(when-let [val (.next q)]
(cons val (query-seq q)))))
Maybe you shoul wrap the query method to protect yourself from the first null value as well.

Related

Improving complex data structure replacement

I'm attempting to modify a specific field in a data structure, described below (a filled example can be found here:
[{:fields "There are a few other fields here"
:incidents [{:fields "There are a few other fields here"
:updates [{:fields "There are a few other fields here"
:content "THIS is the field I want to replace"
:translations [{:based_on "Based on the VALUE of this"
:content "Replace with this value"}]}]}]}]
I already have this implemented it in a number of functions, as below:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(vec (map translate-update updates)))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(vec (map translate-incident incidents)))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(vec (map translate-service services)))
Each array could have any number of entries (though the number is likely less than 10).
The basic premise is to replace the :content in each :update with the relevant :translation based on a provided value.
My Clojure knowledge is limited, so I'm curious if there is a more optimal way to achieve this?
EDIT
Solution so far:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(mapv translate-update updates))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(mapv translate-incident incidents))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(mapv translate-service services))
I would start more or less as you do, bottom-up, by defining some functions that look like they will be useful: how to choose a translation from among a list of translations, and how to apply that choice to an update. But I wouldn't make the functions so tiny as yours: the logic is all spread out into a lot of places, and it's not easy to get an overall idea of what is going on. Here are the two functions I'd start with:
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
...)
Of course you can defn them instead if you'd like, and I suspect many people would, but they're only going to be used in one place, and they're intimately involved with the context in which they're used, so I like a letfn. These two functions are really all the interesting logic; the rest is just some boring tree-traversal code to apply this logic in the right places.
Now to build out the body of the letfn is straightforward, and easy to read if you make your code be the same shape as the data it manipulates. We want to walk through a series of nested lists, updating objects on the way, and so we just write a series of nested for comprehensions, calling update to descend into the right keyspace:
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))
I think using for here is miles better than using map, although of course they are equivalent as always. The important difference is that as you read through the code you see the new context first ("okay, now we're doing something to each user"), and then what is happening inside that context; with map you see them in the other order and it is difficult to keep tack of what is happening where.
Combining these, and putting them into a defn, we get a function that you can call with your example input and which produces your desired output (assuming a suitable definition of get-locale):
(defn translate [users]
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))))
we can try to find some patterns in this task (based on the contents of the snippet from github gist, you've posted):
simply speaking, you need to
1) update every item (A) in vector of data
2) updating every item (B) in vector of A's :incidents
3) updating every item (C) in vector of B's :updates
4) translating C
The translate function could look like this:
(defn translate [{translations :translations :as item} locale]
(assoc item :content
(or (some #(when (= (:locale %) locale) (:content %)) translations)
:no-translation-found)))
it's usage (some fields are omitted for brevity):
user> (translate {:id 1
:content "abc"
:severity "101"
:translations [{:locale "fr_FR"
:content "abc"}
{:locale "ru_RU"
:content "абв"}]}
"ru_RU")
;;=> {:id 1,
;; :content "абв",
;; :severity "101",
;; :translations [{:locale "fr_FR", :content "abc"} {:locale "ru_RU", :content "абв"}]}
then we can see that 1 and 2 are totally similar, so we can generalize that:
(defn update-vec-of-maps [data k f]
(mapv (fn [item] (update item k f)) data))
using it as a building block you can make up the whole data transformation:
(defn transform [data locale]
(update-vec-of-maps
data :incidents
(fn [incidents]
(update-vec-of-maps
incidents :updates
(fn [updates] (mapv #(translate % locale) updates))))))
(transform data "it_IT")
returns what you need.
then you can generalize it further, making the utility function for arbitrary depth transformations:
(defn deep-update-vec-of-maps [data ks terminal-fn]
(if (seq ks)
((reduce (fn [f k] #(update-vec-of-maps % k f))
terminal-fn (reverse ks))
data)
data))
and use it like this:
(deep-update-vec-of-maps data [:incidents :updates]
(fn [updates]
(mapv #(translate % "it_IT") updates)))
I recommend you look at https://github.com/nathanmarz/specter
It makes it really easy to read and update clojure data structures. Same performance as hand-written code, but much shorter.

Core.async: Take all values from collection of promise-chans

Consider a dataset like this:
(def data [{:url "http://www.url1.com" :type :a}
{:url "http://www.url2.com" :type :a}
{:url "http://www.url3.com" :type :a}
{:url "http://www.url4.com" :type :b}])
The contents of those URL's should be requested in parallel. Depending on the item's :type value those contents should be parsed by corresponding functions. The parsing functions return collections, which should be concatenated, once all the responses have arrived.
So let's assume that there are functions parse-a and parse-b, which both return a collection of strings when they are passed a string containing HTML content.
It looks like core.async could be a good tool for this. One could either have separate channels for each item ore one single channel. I'm not sure which way would be preferable here. With several channels one could use transducers for the postprocessing/parsing. There is also a special promise-chan which might be proper here.
Here is a code-sketch, I'm using a callback based HTTP kit function. Unfortunately, I could not find a generic solution inside the go block.
(defn f [data]
(let [chans (map (fn [{:keys [url type]}]
(let [c (promise-chan (map ({:a parse-a :b parse-b} type)))]
(http/get url {} #(put! c %))
c))
data)
result-c (promise-chan)]
(go (put! result-c (concat (<! (nth chans 0))
(<! (nth chans 1))
(<! (nth chans 2))
(<! (nth chans 3)))))
result-c))
The result can be read like so:
(go (prn (<! (f data))))
I'd say that promise-chan does more harm than good here. The problem is that most of core.async API (a/merge, a/reduce etc.) relies on fact that channels will close at some point, promise-chans in turn never close.
So, if sticking with core.async is crucial for you, the better solution will be not to use promise-chan, but ordinary channel instead, which will be closed after first put!:
...
(let [c (chan 1 (map ({:a parse-a :b parse-b} type)))]
(http/get url {} #(do (put! c %) (close! c)))
c)
...
At this point, you're working with closed channels and things become a bit simpler. To collect all values you could do something like this:
;; (go (put! result-c (concat (<! (nth chans 0))
;; (<! (nth chans 1))
;; (<! (nth chans 2))
;; (<! (nth chans 3)))))
;; instead of above, now you can do this:
(->> chans
async/merge
(async/reduce into []))
UPD (below are my personal opinions):
Seems, that using core.async channels as promises (either in form of promise-chan or channel that closes after single put!) is not the best approach. When things grow, it turns out that core.async API overall is (you may have noticed that) not that pleasant as it could be. Also there are several unsupported constructs, that may force you to write less idiomatic code than it could be. In addition, there is no built-in error handling (if error occurs within go-block, go-block will silently return nil) and to address this you'll need to come up with something of your own (reinvent the wheel). Therefore, if you need promises, I'd recommend to use specific library for that, for example manifold or promesa.
I wanted this functionality as well because I really like core.async but I also wanted to use it in certain places like traditional JavaScript promises. I came up with a solution using macros. In the code below, <? is the same thing as <! but it throws if there's an error. It behaves like Promise.all() in that it returns a vector of all the returned values from the channels if they all are successful; otherwise it will return the first error (since <? will cause it to throw that value).
(defmacro <<? [chans]
`(let [res# (atom [])]
(doseq [c# ~chans]
(swap! res# conj (serverless.core.async/<? c#)))
#res#))
If you'd like to see the full context of the function it's located on GitHub. It's heavily inspired from David Nolen's blog post.
Use pipeline-async in async.core to launch asynchronous operations like http/get concurrently while delivering the result in the same order as the input:
(let [result (chan)]
(pipeline-async
20 result
(fn [{:keys [url type]} ch]
(let [parse ({:a parse-a :b parse-b} type)
callback #(put! ch (parse %)(partial close! ch))]
(http/get url {} callback)))
(to-chan data))
result)
if anyone is still looking at this, adding on to the answer by #OlegTheCat:
You can use a separate channel for errors.
(:require [cljs.core.async :as async]
[cljs-http.client :as http])
(:require-macros [cljs.core.async.macros :refer [go]])
(go (as-> [(http/post <url1> <params1>)
(http/post <url2> <params2>)
...]
chans
(async/merge chans (count chans))
(async/reduce conj [] chans)
(async/<! chans)
(<callback> chans)))

Return a unique channel in clojurescript

Goal: Construct a ClojureScript function that takes a string s and returns the unique channel with the name (str s "-chan") (if the channel doesn't exist, then create it). Here is my attempt:
(defn string-channel
[s]
(let [chan-name (symbol (str s "-chan"))]
(defonce chan-name (chan))
chan-name))
This yields an error. How do I accomplish this goal? Note that since I'm in ClojureScript, I am unable to use the eval construct if the solution involves a macro.
I would rather propose to keep these channels in an atom (since defining vars dynamically seems really needless here). In addition, keeping channels in one place seems to me more manageable.
(def channels (atom {}))
(defn string-channel [s]
(when-not (#channels s)
(swap! channels assoc s (chan)))
(#channels s))
It turns out the right approach was to use a defmacro:
(defmacro string-channel
[s]
`(do
(defonce ~(symbol (str s "-chan")) (chan))
~(symbol (str s "-chan"))))
Consider using memoize if the intent is to ensure a unique channel for each label s:
(def unique-channel (memoize (fn [s] (chan))))

Is there a Clojure idiom for dispatching multiple expressions in parallel

I have a number of (unevaluated) expressions held in a vector; [ expr1 expr2 expr3 ... ]
What I wish to do is hand each expression to a separate thread and wait until one returns a value. At that point I'm not interested in the results from the other threads and would like to cancel them to save CPU resource.
( I realise that this could cause non-determinism in that different runs of the program might cause different expressions to be evaluated first. I have this in hand. )
Is there a standard / idiomatic way of achieving the above?
Here's my take on it.
Basically you have to resolve a global promise inside each of your futures, then return a vector containing future list and the resolved value and then cancel all the futures in the list:
(defn run-and-cancel [& expr]
(let [p (promise)
run-futures (fn [& expr] [(doall (map #(future (deliver p (eval %1))) expr)) #p])
[fs res] (apply run-futures expr)]
(map future-cancel fs)
res))
It's not reached an official release yet, but core.async looks like it might be an interesting way of solving your problem - and other asynchronous problems, very neatly.
The leiningen incantation for core.async is (currently) as follows:
[org.clojure/core.async "0.1.0-SNAPSHOT"]
And here's some code to make a function that will take a number of time-consuming functions, and block until one of them returns.
(require '[clojure.core.async :refer [>!! chan alts!! thread]]))
(defn return-first [& ops]
(let [v (map vector ops (repeatedly chan))]
(doseq [[op c] v]
(thread (>!! c (op))))
(let [[value channel] (alts!! (map second v))]
value)))
;; Make sure the function returns what we expect with a simple Thread/sleep
(assert (= (return-first (fn [] (Thread/sleep 3000) 3000)
(fn [] (Thread/sleep 2000) 2000)
(fn [] (Thread/sleep 5000) 5000))
2000))
In the sample above:
chan creates an asynchronous channel
>!! puts a value onto a channel
thread executes the body in another thread
alts!! takes a vector of channels, and returns when a value appears on any of them
There's way more to it than this, and I'm still getting my head round it, but there's a walkthrough here: https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
And David Nolen's blog has some great, if mind-boggling, posts on it (http://swannodette.github.io/)
Edit
Just seen that Michał Marczyk has answered a very similar question, but better, here, and it allows you to cancel/short-circuit.
with Clojure threading long running processes and comparing their returns
What you want is Java's CompletionService. I don't know of any wrapper around this in clojure, but it wouldn't be hard to do with interop. The example below is loosely based around the example on the JavaDoc page for the ExecutorCompletionService.
(defn f [col]
(let [cs (ExecutorCompletionService. (Executors/newCachedThreadPool))
futures (map #(.submit cs %) col)
result (.get (.take cs))]
(map #(.cancel % true) futures)
result))
You could use future-call to get a list of all futures, storing them in an Atom. then, compose each running future with a "shoot the other ones in the head" function so the first one will terminate all the remaining ones. Here is an example:
(defn first-out [& fns]
(let [fs (atom [])
terminate (fn [] (println "cancling..") (doall (map future-cancel #fs)))]
(reset! fs (doall (map (fn [x] (future-call #((x) (terminate)))) fns)))))
(defn wait-for [n s]
(fn [] (print "start...") (flush) (Thread/sleep n) (print s) (flush)))
(first-out (wait-for 1000 "long") (wait-for 500 "short"))
Edit
Just noticed that the previous code does not return the first results, so it is mainly useful for side-effects. here is another version that returns the first result using a promise:
(defn first-out [& fns]
(let [fs (atom [])
ret (promise)
terminate (fn [x] (println "cancling.." )
(doall (map future-cancel #fs))
(deliver ret x))]
(reset! fs (doall (map (fn [x] (future-call #(terminate (x)))) fns)))
#ret))
(defn wait-for [n s]
"this time, return the value"
(fn [] (print "start...") (flush) (Thread/sleep n) (print s) (flush) s))
(first-out (wait-for 1000 "long") (wait-for 500 "short"))
While I don't know if there is an idiomatic way to achieve your goal but Clojure Future looks like a good fit.
Takes a body of expressions and yields a future object that will
invoke the body in another thread, and will cache the result and
return it on all subsequent calls to deref/#. If the computation has
not yet finished, calls to deref/# will block, unless the variant of
deref with timeout is used.

Idiomatic way of finding functions in a namesspace containing specific metadata?

I'm trying to figure out the best way to troll a namespace for functions that contain a specific bit of metadata. I've come up with a solution, but it feels a little awkward and I'm not at all sure I'm going about it the right way. There's a second component to this as well: I don't just want the names of the functions, I want to find them and then execute them. Here's a snippet of what I'm doing presently:
(defn wrap-routes
[req from-ns]
(let [publics (ns-publics from-ns)
routes (->>
(keys publics)
(map #(meta (% publics)))
(filter #(= (:route-handler %) true))
(map #(:name %)))
resp (first
(->>
(map #((% publics) req) routes)
(filter #(:status %))))]
(or resp not-found)))
As you can see, I'm doing all sorts of gymnastics to see if my metadata is attached to any functions in a given namespace and then am doing extra work after that to get the actual function back. I'm sure there must be a better way. So my question is, how would you do this?
(defn wrap-routes [req from-ns]
(or (first (filter :status
(for [[name f] (ns-publics from-ns)
:when (:route-handler (meta f))]
(f req))))
not-found))
You can do something like this:
(defn wrap-routes
[req from-ns]
(->> (ns-publics from-ns)
(filter #(:route-handler (meta (%1 1))))
(map #((%1 1) req))
(filter #(:status %))
first
(#(or % not-found))))