Idiomatic error/exception handling with threading macros - clojure

I'm fetching thousands of entities from an API, one at a time, using HTTP requests. As the next step in the pipeline I want to shovel all of them into a database.
(->> ids
     (pmap fetch-entity)
     (pmap store-entity)
     (doall))
fetch-entity expects a String id and tries to retrieve an entity using an HTTP request; it either returns a Map or throws an exception (e.g. because of a timeout).
store-entity expects a Map and tries to store it in a database. It may throw an exception (e.g. if the Map doesn't match the database schema, or if it didn't receive a Map at all).
Inelegant Error Handling
My first "solution" was to write wrapper functions fetch-entity' and store-entity' to catch exceptions of their respective original functions.
fetch-entity' returns its input on failure, basically passing along a String id if the http request failed. This ensures that the whole pipeline keeps on trucking.
store-entity' checks the type of its argument. If the argument is a Map (fetch-entity' was successful and returned a Map), it attempts to store it in the database.
If the attempt to store to the database throws an exception, or if store-entity' got passed a String (id) instead of a Map, it will conj the id onto an external Vector of error_ids.
This way I can later use error_ids to figure out how often there was a failure and which ids were affected.
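For concreteness, here is roughly what those wrappers look like (a sketch; the exact shapes are assumptions based on the description above, with error_ids modeled as an atom holding a Vector):
(def error-ids (atom [])) ; the external Vector of error_ids

(defn fetch-entity' [id]
  (try
    (fetch-entity id) ; returns a Map on success
    (catch Exception _
      id)))           ; pass the String id along on failure

(defn store-entity' [entity-or-id]
  (if (map? entity-or-id)
    (try
      (store-entity entity-or-id)
      (catch Exception _
        (swap! error-ids conj (:id entity-or-id))))
    ;; previous step failed and passed a String id along
    (swap! error-ids conj entity-or-id)))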
It doesn't feel like the above is a sensible way to achieve what I'm trying to do. For example, the way I wrote store-entity' complects it with the previous pipeline step (fetch-entity'), because it behaves differently depending on whether the previous step was successful.
Also having store-entity' be aware of an external Vector called error_ids does not feel right at all.
Is there an idiomatic way to handle these kinds of situations, where you have multiple pipeline steps, some of which can throw exceptions (e.g. because they are I/O), where I can't easily use predicates to make sure the functions behave predictably, and where I don't want to disturb the pipeline, only checking later which cases went wrong?

It is possible to use a type of Try monad, for example from the cats library:
It represents a computation that may either result in an exception or return a successfully computed value. It is very similar to the Either monad, but is semantically different. It consists of two types: Success and Failure. The Success type is a simple wrapper, like Right of the Either monad. But the Failure type is slightly different from Left, because it always wraps an instance of Throwable (or any value in cljs, since you can throw arbitrary values in the JavaScript host). (...) It is an analogue of the try-catch block: it replaces try-catch's stack-based error handling with heap-based error handling. Instead of having an exception thrown and having to deal with it immediately in the same thread, it disconnects the error handling and recovery.
Heap-based error-handling is what you want.
Below is an example of fetch-entity and store-entity. I made fetch-entity throw an ExceptionInfo on the first id (1) and store-entity throw an ArithmeticException (divide by zero) on the second id (0).
(ns your-project.core
  (:require [cats.core :as cats]
            [cats.monad.exception :as exc]))

(def ids [1 0 2]) ;; `fetch-entity` throws on 1, `store-entity` on 0, 2 works
(defn fetch-entity
  "Throws an exception when the id is 1..."
  [id]
  (if (= id 1)
    (throw (ex-info "id is 1, help!" {:id id}))
    id))
(defn store-entity
  "Unfortunately this function still needs to be aware that it receives a Try.
  It throws an `ArithmeticException` (divide by zero) when the id is 0."
  [id-try]
  (if (exc/success? id-try)                 ; was the previous step a success?
    (exc/try-on (/ 1 (exc/extract id-try))) ; if so: extract, apply fn, and rewrap
    id-try))                                ; else return original for later processing
(def results
  (->> ids
       (pmap #(exc/try-on (fetch-entity %)))
       (pmap store-entity)))
Now you can filter results on successes or failures with success? or failure? respectively, and retrieve the values via cats/extract:
(def successful-results
  (->> results
       (filter exc/success?)
       (mapv cats/extract)))
successful-results ;; => [1/2]
(def error-messages
  (->> results
       (filter exc/failure?)
       (mapv cats/extract) ; gets exceptions without raising them
       (mapv #(.getMessage %))))
error-messages ;; => ["id is 1, help!" "Divide by zero"]
Note that if you want to loop over the errors or successful results only once, you can use a transducer as follows:
(transduce (comp
             (filter exc/success?)
             (map cats/extract))
           conj
           results)
;; => [1/2]

My first thought is to combine fetch-entity and store-entity into a single operation:
(defn fetch-and-store [id]
  (try
    (store-entity (fetch-entity id))
    (catch Exception e
      ;; <log error msg>
      nil)))

(doall (pmap fetch-and-store ids))
Would something like this work?
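And if the catch returns the failing id instead of discarding it, the results themselves record which ids failed, so no external vector is needed (a sketch; the marker-map shape is just one convention):
(defn fetch-and-store [id]
  (try
    (store-entity (fetch-entity id))
    {:ok id}                              ; success marker
    (catch Exception e
      {:error id :msg (.getMessage e)}))) ; failure marker

(def results (doall (pmap fetch-and-store ids)))

(def error-ids (keep :error results)) ; ids that failed anywhere in the pipeline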

Related

Streaming data to the caller in JVM

I have a function which gets data periodically and then stops getting it. This function has to return the data it is fetching to its caller either:
As and when it gets it
In one shot
The 2nd one is the easy implementation, i.e. you block the caller, fetch all the data, and then send it in one shot.
But I want to implement the 1st one (I want to avoid callbacks). Are streams the thing to use here? If so, how? If not, how do I return something that the caller can query for data, and that signals when there is no more data?
Note: I am on the JVM ecosystem, Clojure to be specific. I have had a look at the clojure library core.async, which solves this kind of problem with channels. But I was wondering if there is another way, which might look like this (assuming streams can be used):
Java snippet
// Function which will periodically fetch MyData until there is no data
public Stream<MyData> myFunction() {
    ...
}

myFunction().filter(myData -> myData.text.equals("foo"))
Maybe you can just use a seq, which is lazy by default (like Stream), so the caller can decide when to pull the data in. And when there is no more data, myFunction can simply end the sequence. While doing this, you can also encapsulate some optimisation within myFunction, e.g. getting data in batches to minimise roundtrips, or fetching data periodically per your original requirement.
Here is one naive implementation:
(defn my-function []
  (let [batch 100]
    (->> (range)
         (map #(let [from (* batch %)
                     to   (+ from batch)]
                 (db-get from to)))
         ;; take while we have data from db-get
         (take-while identity)
         ;; returns as one single seq/Stream
         (apply concat))))

;; use it as a normal seq/Stream
(->> (my-function)
     (filter odd?))
where db-get would be something like:
(defn db-get [from to]
  ;; return the first 1000 records only, i.e. return nil to signal completion
  (when (< from 1000)
    ;; returns a range of records
    (range from to)))
You might want to check https://github.com/ReactiveX/RxJava and https://github.com/ReactiveX/RxClojure (seems no longer maintained?)

How does the binding function work with core.async?

Is it ok to use binding with core.async? I'm using ClojureScript so core.async is very different.
(def ^:dynamic token "no-token")

(defn call [path body]
  (http-post (str host path)
             (merge {:headers {"X-token" token}} body))) ; returns a core.async channel

(defn user-settings [req]
  (call "/api/user/settings" req))

; elsewhere, after I've logged in
(let [token (async/<! (api/login {:user "me" :pass "pass"}))]
  (binding [token token]
    (user-settings {:all-settings true})))
In ClojureScript¹, binding is basically with-redefs plus an extra check that the Vars involved are marked :dynamic. On the other hand, go blocks get scheduled for execution² in chunks (that is, they may be "parked" and later resumed, and the interleaving between go blocks is arbitrary). These models don't mesh well at all.
In short, no - please use explicitly-passed arguments instead; see the sketch after the footnotes below.
¹ The details are different in Clojure, but the conclusion remains the same.
² Using the fastest mechanism possible - setTimeout with a time of 0 if nothing better is available.
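For example, a minimal sketch of the explicit-argument version of the code above (same hypothetical http-post, host and api/login as in the question):
(defn call [token path body]
  (http-post (str host path)
             (merge {:headers {"X-token" token}} body))) ; returns a channel

(defn user-settings [token req]
  (call token "/api/user/settings" req))

; elsewhere, after logging in; note that <! must run inside a go block
(async/go
  (let [token (async/<! (api/login {:user "me" :pass "pass"}))]
    (user-settings token {:all-settings true})))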

Getting the most recent response from a core.async channel

I am trying to validate a form using core.async by making a request to a validation function every time the form changes. The validation function is asynchronous itself. It hits an external service and returns either an array of error messages or an empty array.
(go-loop []
  (when-let [value (<! field-chan)]
    (go (let [errors (<! (validate value))]
          (put! field-error-chan errors)))
    (recur)))
The above code is what I have at the moment. It works most of the time, but sometimes the response time from the server varies, so the response to the second request arrives before the response to the first. If the value is invalid the second time but valid the first time, we would pull an array of errors off field-error-chan followed by an empty array.
I could of course take the validation out of the go loop and have everything return in the correct order, but then I would end up taking values from field-chan only after checking for errors. What I would like to do is validate values as they come in, but put the validation responses on the errors channel in the order the values came, not the order the responses arrived.
Is this possible with core.async? If not, what would be my best approach to getting ordered responses?
Thanks
Assuming you can modify the external validation service, the simplest approach would probably be to attach timestamps (or simply counters) to validation requests and to have the validation service include them in their responses. Then you could always tell whether you're dealing with the response to the latest request.
Incidentally, the internal go form serves no purpose and could be merged into the outer go-loop. (Well, go forms return channels, but if the go-loop is actually meant to loop, this probably isn't important.)
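A client-side variation on that idea needs no change to the service: tag each request with a counter and drop any response that is no longer the latest (a sketch, assuming the usual core.async requires and the field-chan/validate/field-error-chan from the question):
(def latest-req (atom 0))

(go-loop []
  (when-let [value (<! field-chan)]
    (let [id (swap! latest-req inc)] ; tag this request
      (go (let [errors (<! (validate value))]
            ;; publish only if no newer request started in the meantime
            (when (= id @latest-req)
              (put! field-error-chan errors)))))
    (recur)))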
You can write a switch function (inspired by RxJs):
(defn switch [in]
  (let [out (chan)]
    (go (loop [subchannel (<! in)]
          (let [[v c] (alts! [subchannel in])]
            (if (= c subchannel)
              (do (>! out v) (recur subchannel))
              (recur v)))))
    out))
Then wrap field-chan with validate and use:
(let [validate-last (switch (async/map validate [field-chan]))]
  ...)
But note that the switch does not handle closing channels.

Reading Ring request body when already read

My question is, how can I idiomatically read the body of a Ring request if it has already been read?
Here's the background. I'm writing an error handler for a Ring app. When an error occurs, I want to log the error, including all relevant information that I might need to reproduce and fix the error. One important piece of information is the body of the request. However, the statefulness of the :body value (because it is a type of java.io.InputStream object) causes problems.
Specifically, what happens is that some middleware (the ring.middleware.json/wrap-json-body middleware in my case) does a slurp on the body InputStream object, which changes the internal state of the object such that future calls to slurp return an empty string. Thus, the [content of the] body is effectively lost from the request map.
The only solution I can think of is to preemptively copy the body InputStream object before the body can be read, just in case I might need it later. I don't like this approach because it seems clumsy to do some work on every request just in case there might be an error later. Is there a better approach?
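To illustrate the problem: slurping the same InputStream twice yields its content only once (a minimal REPL example):
(let [body (java.io.ByteArrayInputStream. (.getBytes "{\"a\": 1}"))]
  [(slurp body) (slurp body)])
;; => ["{\"a\": 1}" ""]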
I have a lib that sucks up the body, replaces it with a stream with identical contents, and stores the original so that it can be deflated later.
groundhog
This is not adequate for indefinitely open streams, and is a bad idea if the body is the upload of some large object. But it helps for testing, and recreating error conditions as a part of the debugging process.
If all you need is a duplicate of the stream, you can use the tee-stream function from groundhog as the basis for your own middleware.
I adopted @noisesmith's basic approach with a few modifications, as shown below. Each of these functions can be used as Ring middleware.
(defn with-request-copy
  "Transparently store a copy of the request in the given atom.
  Blocks until the entire body is read from the request. The request
  stored in the atom (which is also the request passed to the handler)
  will have a body that is a fresh (and resettable) ByteArrayInputStream
  object."
  [handler atom]
  (fn [{orig-body :body :as request}]
    (let [{body :stream} (groundhog/tee-stream orig-body)
          request-copy   (assoc request :body body)]
      (reset! atom request-copy)
      (handler request-copy))))
(defn wrap-error-page
  "In the event of an exception, do something with the exception
  (e.g. report it using an exception handling service) before
  returning a blank 500 response. The `handle-exception` function
  takes two arguments: the exception and the request (which has a
  ready-to-slurp body)."
  [handler handle-exception]
  ;; Note that, as a result of this top-level approach to
  ;; error-handling, the request map sent to Rollbar will lack any
  ;; information added to it by one of the middleware layers.
  (let [request-copy (atom nil)
        handler      (with-request-copy handler request-copy)]
    (fn [request]
      (try
        (handler request)
        (catch Throwable e
          (.reset (:body @request-copy))
          ;; You may also want to wrap this line in a try/catch block.
          (handle-exception e @request-copy)
          {:status 500})))))
I think you're stuck with some sort of "keep a copy around just in case" strategy. Unfortunately, it looks like :body on the request must be an InputStream and nothing else (on the response it can be a String or other things, which is why I mention it).
Sketch: In a very early middleware, wrap the :body InputStream in an InputStream that resets itself on close (example). Not all InputStreams can be reset, so you may need to do some copying here. Once wrapped, the stream can be re-read on close, and you're good. There's memory risk here if you have giant requests.
Update: here's a half-baked attempt, inspired in part by tee-stream in groundhog.
(require '[clojure.java.io :refer [copy]])

(defn wrap-resettable-body
  [handler]
  (fn [request]
    (let [orig-body (:body request)
          baos      (java.io.ByteArrayOutputStream.)
          _         (copy orig-body baos)
          ba        (.toByteArray baos)
          bais      (java.io.ByteArrayInputStream. ba)
          ;; bais doesn't need to be closed, and supports resetting, so wrap it
          ;; in a delegating proxy that calls its reset when closed.
          resettable (proxy [java.io.InputStream] []
                       (available [] (.available bais))
                       (close [] (.reset bais))
                       (mark [read-limit] (.mark bais read-limit))
                       (markSupported [] (.markSupported bais))
                       ;; exercise to reader: proxy with overloaded methods...
                       ;; (read [] (.read bais))
                       (read [b off len] (.read bais b off len))
                       (reset [] (.reset bais))
                       (skip [n] (.skip bais n)))
          updated-req (assoc request :body resettable)]
      (handler updated-req))))
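A usage sketch (the handler var is hypothetical; wrap-json-body is the middleware from the question): middleware threaded last wraps outermost and therefore sees the request first, so wrap-resettable-body must come after any body-consuming middleware in the -> chain.
(def app
  (-> handler
      (wrap-json-body {:keywords? true}) ; consumes the body stream...
      wrap-resettable-body))             ; ...so the resettable wrapper runs first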

Why can't I call seq functions on a sequence generated by js->clj?

Although I can turn a simple JS object into a Clojure object with something like:
(-> "{a: 2, b: 3}" js* js->clj)
I apparently can't do the same with a particular object, goog.events.BrowserEvent, in a handler function like:
(defn handle-click [e]
  ...
  (-> e .-evt js->clj keys) ;; <-------------
  ...)
The function does get applied, but the resulting object doesn't respond to sequence functions like count or first, although I can fetch items using aget. The error message I get, in Chrome's console, is:
Uncaught Error: No protocol
method ISeqable.-seq defined for type object: [object Object]
Why is this happening? Shouldn't js->clj work with all objects?
How can I fix this?
Thanks!
js->clj only converts values that are exactly JavaScript Objects (the check requires the value's type to be identical to js/Object, not merely an instance of it, and with good reason); when you pass a descendant of js/Object, js->clj returns the same object. aget (and aset) work because they compile down to the object[field-name] syntax in JavaScript.
You can extend the ISeqable protocol (or any other protocol) to goog.events.BrowserEvent, and all functions that work on seqs will then work with goog.events.BrowserEvent. There is a talk by Chris Houser where he shows how to extend a bunch of protocols to a goog Map. I recommend watching the whole talk, but the part relevant to your question begins at approximately 14 minutes.
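For instance, a minimal sketch of that approach for the seq case, extending ISeqable (the protocol named in the error above) to goog.events.BrowserEvent (untested; assumes [goog.object :as gobj] in the requires):
(extend-type goog.events.BrowserEvent
  ISeqable
  (-seq [this]
    ;; present the event as a seq of [key value] pairs, like a map
    (seq (map (fn [k] [k (gobj/get this k)])
              (gobj/getKeys this)))))
After this, count, first and friends work on the event via seq.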
First, I found functions in Google Closure to get the keys and values of an object:
(defn goog-hash-map [object]
  (zipmap (goog.object/getKeys object) (goog.object/getValues object)))
Then, by studying the source of cljs.core, I realized all I had to do was extend the IEncodeClojure protocol with it:
(extend-protocol IEncodeClojure
  goog.events.BrowserEvent
  (-js->clj
    ([x {:keys [keywordize-keys] :as options}]
     (let [keyfn (if keywordize-keys keyword str)]
       (zipmap (map keyfn (gobj/getKeys x)) (gobj/getValues x))))
    ([x] (-js->clj x {:keywordize-keys false}))))
The original code doesn't work on this object because its type must be exactly Object. I tried changing the comparison function to instance?, i.e.
(instance? x js/Object) (into {} (for [k (js-keys x)]
                                   [(keyfn k) (thisfn (aget x k))]))
but that didn't work either, yielding the following error, which made me settle for the previous approach:
Uncaught TypeError: Expecting a function in instanceof check,
but got function Object() { [native code] }