Reading Ring request body when already read


My question is, how can I idiomatically read the body of a Ring request if it has already been read?
Here's the background. I'm writing an error handler for a Ring app. When an error occurs, I want to log the error, including all relevant information that I might need to reproduce and fix the error. One important piece of information is the body of the request. However, the statefulness of the :body value (because it is a type of java.io.InputStream object) causes problems.
Specifically, what happens is that some middleware (the ring.middleware.json/wrap-json-body middleware in my case) does a slurp on the body InputStream object, which changes the internal state of the object such that future calls to slurp return an empty string. Thus, the [content of the] body is effectively lost from the request map.
The only solution I can think of is to preemptively copy the body InputStream object before the body can be read, just in case I might need it later. I don't like this approach because it seems clumsy to do some work on every request just in case there might be an error later. Is there a better approach?

I have a lib that sucks up the body, replaces it with a stream with identical contents, and stores the original so that it can be deflated later.
groundhog
This is not adequate for indefinitely open streams, and is a bad idea if the body is the upload of some large object. But it helps for testing, and recreating error conditions as a part of the debugging process.
If all you need is a duplicate of the stream, you can use the tee-stream function from groundhog as the basis for your own middleware.

I adopted @noisesmith's basic approach with a few modifications, as shown below. Each of these functions can be used as Ring middleware.
(defn with-request-copy
  "Transparently store a copy of the request in the given atom.
  Blocks until the entire body is read from the request. The request
  stored in the atom (which is also the request passed to the handler)
  will have a body that is a fresh (and resettable) ByteArrayInputStream
  object."
  [handler atom]
  (fn [{orig-body :body :as request}]
    (let [{body :stream} (groundhog/tee-stream orig-body)
          request-copy (assoc request :body body)]
      (reset! atom request-copy)
      (handler request-copy))))
(defn wrap-error-page
  "In the event of an exception, do something with the exception
  (e.g. report it using an exception handling service) before
  returning a blank 500 response. The `handle-exception` function
  takes two arguments: the exception and the request (which has a
  ready-to-slurp body)."
  [handler handle-exception]
  ;; Note that, as a result of this top-level approach to
  ;; error-handling, the request map sent to Rollbar will lack any
  ;; information added to it by one of the middleware layers.
  (let [request-copy (atom nil)
        handler (with-request-copy handler request-copy)]
    (fn [request]
      (try
        (handler request)
        (catch Throwable e
          (.reset (:body @request-copy))
          ;; You may also want to wrap this line in a try/catch block.
          (handle-exception e @request-copy)
          {:status 500})))))

I think you're stuck with some sort of "keep a copy around just in case" strategy. Unfortunately it looks like :body on the request must be an InputStream and nothing else (on the response it can be a String or other things which is why I mention it)
Sketch: In a very early middleware, wrap the :body InputStream in an InputStream that resets itself on close (example). Not all InputStreams can be reset, so you may need to do some copying here. Once wrapped, the stream can be re-read on close, and you're good. There's memory risk here if you have giant requests.
Update: here's a half-baked attempt, inspired in part by tee-stream in groundhog.
(require '[clojure.java.io :refer [copy]])
(defn wrap-resettable-body
  [handler]
  (fn [request]
    (let [orig-body (:body request)
          baos (java.io.ByteArrayOutputStream.)
          _ (copy orig-body baos)
          ba (.toByteArray baos)
          bais (java.io.ByteArrayInputStream. ba)
          ;; bais doesn't need to be closed, and supports resetting, so wrap it
          ;; in a delegating proxy that calls its reset when closed.
          resettable (proxy [java.io.InputStream] []
                       (available [] (.available bais))
                       (close [] (.reset bais))
                       (mark [read-limit] (.mark bais read-limit))
                       (markSupported [] (.markSupported bais))
                       ;; proxy dispatches on method name only, so a single
                       ;; multi-arity fn has to cover all read overloads
                       (read
                         ([] (.read bais))
                         ([b] (.read bais b))
                         ([b off len] (.read bais b off len)))
                       (reset [] (.reset bais))
                       (skip [n] (.skip bais n)))
          updated-req (assoc request :body resettable)]
      (handler updated-req))))

Related

How do I get the body text of a Response object returned by the fetch API in ClojureScript?

I'm trying to use the Github Gist API to get a list of all of my Gists like so:
(ns epi.core)
(.then (.fetch js/window "https://api.github.com/users/seisvelas/gists")
       (fn [data] (.log js/epi data)))
js/epi is just console.log except provided by the blogging platform I'm using (epiphany.pub).
When I call that API from curl it works fine; however, when done in cljs instead of giving me the body of the response, this gives me [object Response]. Does anyone know how I can get the body text of the response?

TL;DR
(-> (.fetch js/window "https://api.github.com/users/seisvelas/gists")
    (.then #(.json %)) ; Get JSON from the Response.body ReadableStream
    (.then #(.log js/epi %)))
is what I'd write.
From ClojureScript, a JavaScript call like data.body() can be invoked with
(.body data)
and a JavaScript property access like data.body with
(.-body data)
One of those should work in your case. However, the fetch API requires a bit more if you want to get JSON from the body, which I assume you do based on the endpoint.
If you're dealing with promise chains, you might also want to consider using -> (thread-first) so it reads top to bottom.
See this Gist for more about threading promise chains.

There is a library wrapping the JS fetch API called lambdaisland.fetch. It uses Transit as its default encoding format, so you need to specify the accept format when working with the GitHub API.
This library includes kitchen-async.promise as a dependency, so you can require kitchen-async.promise in your ClojureScript source code.
(ns fetch.demo.core
  (:require [kitchen-async.promise :as p]
            [lambdaisland.fetch :as fetch]))
(p/try
  (p/let [resp (fetch/get
                 "https://api.github.com/users/seisvelas/gists"
                 {:accept :json
                  :content-type :json})]
    (prn (:body resp)))
  (p/catch :default e
    ;; log your exception here
    (prn :error e)))

Seems like .fetch returns a Response object, and you need to get the attribute body from it for the body. https://developer.mozilla.org/en-US/docs/Web/API/Response
Something like (.-body data), since body is a property rather than a method.

Idiomatic error/exception handling with threading macros

I'm fetching thousands of entities from an API one at a time using http requests. As next step in the pipeline I want to shovel all of them into a database.
(->> ids
     (pmap fetch-entity)
     (pmap store-entity)
     (doall))
fetch-entity expects a String id and tries to retrieve an entity using an http request and either returns a Map or throws an exception (e.g. because of a timeout).
store-entity expects a Map and tries to store it in a database. It possibly throws an exception (e.g. if the Map doesn't match the database schema or if it didn't receive a Map at all).
Inelegant Error Handling
My first "solution" was to write wrapper functions fetch-entity' and store-entity' to catch exceptions of their respective original functions.
fetch-entity' returns its input on failure, basically passing along a String id if the http request failed. This ensures that the whole pipeline keeps on trucking.
store-entity' checks the type of its argument. If the argument is a Map (fetch entity was successful and returned a Map) it attempts to store it in the database.
If the attempt of storing to the database throws an exception or if store-entity' got passed a String (id) instead of a Map it will conj to an external Vector of error_ids.
This way I can later use error_ids to figure out how often there was a failure and which ids were affected.
It doesn't feel like the above is a sensible way to achieve what I'm trying to do. For example the way I wrote store-entity' complects the function with the previous pipeline step (fetch-entity') because it behaves differently based on whether the previous pipeline step was successful or not.
Also having store-entity' be aware of an external Vector called error_ids does not feel right at all.
Is there an idiomatic way to handle situations like this, with multiple pipeline steps where some can throw exceptions (e.g. because they do I/O), where I can't easily use predicates to make sure a function will behave predictably, and where I don't want to disturb the pipeline but only check later which cases went wrong?

It is possible to use a type of Try monad, for example from the cats library:
It represents a computation that may either result in an exception or return a successfully computed value. Is very similar to the Either monad, but is semantically different. It consists of two types: Success and Failure. The Success type is a simple wrapper, like Right of the Either monad. But the Failure type is slightly different from Left, because it always wraps an instance of Throwable (or any value in cljs since you can throw arbitrary values in the JavaScript host). (...) It is an analogue of the try-catch block: it replaces try-catch’s stack-based error handling with heap-based error handling. Instead of having an exception thrown and having to deal with it immediately in the same thread, it disconnects the error handling and recovery.
Heap-based error-handling is what you want.
Below I made an example of fetch-entity and store-entity. I made fetch-entity throw an ExceptionInfo on the first id (1), and store-entity throws an ArithmeticException ("Divide by zero") on the second id (0).
(ns your-project.core
  (:require [cats.core :as cats]
            [cats.monad.exception :as exc]))
(def ids [1 0 2]) ;; `fetch-entity` throws on 1, `store-entity` on 0, 2 works
(defn fetch-entity
  "Throws an exception when the id is 1..."
  [id]
  (if (= id 1)
    (throw (ex-info "id is 1, help!" {:id id}))
    id))
(defn store-entity
  "Unfortunately this function still needs to be aware that it receives a Try.
  It throws an `ArithmeticException` (divide by zero) when the id is 0"
  [id-try]
  (if (exc/success? id-try)                   ; was the previous step a success?
    (exc/try-on (/ 1 (exc/extract id-try)))   ; if so: extract, apply fn, and rewrap
    id-try))                                  ; else return original for later processing
(def results
  (->> ids
       (pmap #(exc/try-on (fetch-entity %)))
       (pmap store-entity)))
Now you can filter results on successes or failures with success? or failure? respectively, and retrieve the values via cats/extract:
(def successful-results
  (->> results
       (filter exc/success?)
       (mapv cats/extract)))
successful-results ;; => [1/2]
(def error-messages
  (->> results
       (filter exc/failure?)
       (mapv cats/extract) ; gets exceptions without raising them
       (mapv #(.getMessage %))))
error-messages ;; => ["id is 1, help!" "Divide by zero"]
Note that if you want to only loop over the errors or successful-results once you can use a transducer as follows:
(transduce (comp
             (filter exc/success?)
             (map cats/extract))
           conj
           results)
;; => [1/2]

My first thought is to combine fetch-entity and store-entity into a single operation:
(defn fetch-and-store [id]
  (try
    (store-entity (fetch-entity id))
    (catch ... <log error msg> )))
(doall (pmap fetch-and-store ids))
Would something like this work?
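If the failed ids still need to be collected afterwards, one option is to return errors as plain data instead of logging them inside the catch. The following is only a minimal sketch using nothing but clojure.core: `attempt` and `then` are hypothetical helper names, and the `fetch-entity` and `store-entity` defined here are stand-ins for the real I/O functions, not the asker's code.

```clojure
(defn attempt
  "Calls (f x). Returns {:ok result} on success, or
  {:error exception :input x} instead of throwing."
  [f x]
  (try
    {:ok (f x)}
    (catch Exception e
      {:error e :input x})))

(defn then
  "Applies f to the value of a successful result; failures pass through unchanged."
  [f result]
  (if (contains? result :ok)
    (attempt f (:ok result))
    result))

;; Hypothetical stand-ins for the real I/O functions.
(defn fetch-entity [id]
  (if (= id "bad-id")
    (throw (ex-info "fetch failed" {:id id}))
    {:id id}))

(defn store-entity [entity]
  (if (= (:id entity) "unstorable")
    (throw (ex-info "store failed" entity))
    (:id entity)))

(def results
  (->> ["a" "bad-id" "unstorable" "b"]
       (pmap #(attempt fetch-entity %))
       (pmap #(then store-entity %))
       (doall)))

;; Successful values, in input order.
(def stored (map :ok (filter #(contains? % :ok) results)))
;; Failure maps; :input is whatever the failing step received.
(def failures (filter :error results))

stored ;; => ("a" "b")
(mapv #(.getMessage (:error %)) failures) ;; => ["fetch failed" "store failed"]
```

With this shape, store-entity itself stays unaware of the previous step; only the small `then` helper knows about the result maps, and no external vector of error ids is needed.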

Ring Middleware for the Client Side?

Ring middleware is usually associated with the server side.
In this post I'll discuss how the concept of Ring middleware can be applied to HTTP clients.
A very typical server-side example might look like this:
(def server
  (-> server-handler
      wrap-transit-params
      wrap-transit-response))
Desugared:
(def server (wrap-transit-response (wrap-transit-params server-handler)))
server is now a function that accepts a request hash-map. Middleware can operate on this data before it is sent to the handler. It can also operate on the response hash-map that the handler returns, or on both. It can even manipulate the execution of the handler.
Server
The above middleware could look like this in a very simplified way:
(1.) This operates on data before it gets to the actual handler (the request, incoming data), parsing the body and providing the result as the value of the :params key. It's called a pre-wrap.
(defn wrap-transit-params [handler]
  (fn [req]
    (handler (assoc req :params (from-transit-str (req :body))))))
(2.) This one manipulates the response of the server (outgoing data). It's a post-wrap.
(defn wrap-transit-response [handler]
  (fn [req]
    (let [resp (handler req)]
      (update resp :body to-transit-str))))
With this, a server can receive and respond with data as Transit.
Client
The same behavior could be desirable for an http-client.
(def client
  (-> client-handler
      wrap-transit-params
      wrap-transit-response))
It turns out that the above middleware cannot easily be reused as client middleware, even though there is some symmetry.
For the client-side they should be implemented like this:
(defn wrap-transit-params [handler]
  (fn [req]
    (handler (assoc req :body (to-transit-str (req :params))))))
(defn wrap-transit-response [handler]
  (fn [req]
    (let [resp (handler req)]
      (update resp :body from-transit-str))))
Now it could be used like this:
(client {:url "http://..."
         :params {:one #{1 2 3}}})
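To see the client-side round trip without a network or a Transit dependency, here is a rough, self-contained sketch. It makes several assumptions: EDN (`pr-str` / `clojure.edn/read-string`) stands in for the Transit helpers, `client-handler` is a hypothetical transport that simply echoes the encoded request body back, and the URL is a placeholder.

```clojure
(require '[clojure.edn :as edn])

;; Assumption: EDN stand-ins for the Transit helpers, only to keep the
;; sketch free of external dependencies.
(def to-transit-str pr-str)
(def from-transit-str edn/read-string)

;; Client-side pre-wrap: encode :params into the outgoing :body.
(defn wrap-transit-params [handler]
  (fn [req]
    (handler (assoc req :body (to-transit-str (:params req))))))

;; Client-side post-wrap: decode the :body of the incoming response.
(defn wrap-transit-response [handler]
  (fn [req]
    (update (handler req) :body from-transit-str)))

;; Hypothetical transport: echoes the encoded request body back as the response.
(defn client-handler [req]
  {:status 200 :body (:body req)})

(def client
  (-> client-handler
      wrap-transit-params
      wrap-transit-response))

(client {:url "http://example.invalid"
         :params {:one #{1 2 3}}})
;; => {:status 200, :body {:one #{1 2 3}}}
```

The encoded string only exists between the two wrappers; the caller passes and receives plain Clojure data, which is the same property the server-side middleware gives a handler.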
In reality there would be many more things involved, so I think having reusable middleware for both server and client side is utopian.
It remains open to discussion, though, whether the concept generally makes sense for clients. I could not find any client-side middleware on the net.
Server-side middleware usually lives under the ring.middleware... namespace. My concrete question is whether a library providing client-side middleware should use this namespace or not.

Unless you believe that code that uses your new library could be written to be completely portable between client and server, I feel that using a different namespace will lead to less confusion. Find a fun name! :)

Getting most recent response from a core.async

I am trying to validate a form using core.async by making a request to a validation function every time the form changes. The validation function is asynchronous itself. It hits an external service and returns either an array of error messages or an empty array.
(go-loop []
  (when-let [value (<! field-chan)]
    (go (let [errors (<! (validate value))]
          (put! field-error-chan errors)))
    (recur)))
The above code is what I have at the moment. It works most of the time, but sometimes the server's response time varies, so the response to the second request arrives before the response to the first. If the value was valid the first time but is not valid the second time, we would pull an array of errors followed by an empty array off the field-error-chan.
I could of course take the validation out of the go block and have everything return in the correct order, but then I would end up taking values from field-chan only after checking for errors. What I would like to do is validate values as they arrive, but put the validation responses on the errors channel in the order the values came in, not the order the responses came back.
Is this possible with core.async? If not, what would be my best approach to getting ordered responses?
Thanks

Assuming you can modify the external validation service, the simplest approach would probably be to attach timestamps (or simply counters) to validation requests and to have the validation service include them in their responses. Then you could always tell whether you're dealing with the response to the latest request.
Incidentally, the internal go form serves no purpose and could be merged into the outer go-loop. (Well, go forms return channels, but if the go-loop is actually meant to loop, this probably isn't important.)
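If the validation service cannot be changed, a similar effect can be had by tagging requests on the client instead. The sketch below is one possible variant rather than the approach described above: each request captures a counter value in a closure, and a response is forwarded only if its counter is still the latest one. It assumes core.async is on the classpath; `latest-only` is a hypothetical name, and `validate` is any function returning a channel.

```clojure
(require '[clojure.core.async :as async
           :refer [chan go go-loop <! >! timeout]])

(defn latest-only
  "Reads values from `in`, validates each with `validate` (a function
  returning a channel), and puts onto the returned channel only the
  results that belong to the most recently issued request."
  [in validate]
  (let [out    (chan)
        latest (atom 0)]
    (go-loop []
      (when-let [value (<! in)]
        (let [id (swap! latest inc)]          ; tag this request
          (go
            (let [errors (<! (validate value))]
              ;; drop responses that a newer request has superseded
              (when (and (some? errors) (= id @latest))
                (>! out errors)))))
        (recur)))
    out))
```

Usage would be along the lines of `(latest-only field-chan validate)`, reading errors from the returned channel instead of field-error-chan. Note that this drops stale responses rather than reordering them, which suits form validation where only the newest value matters.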

You can write a switch function (inspired by RxJs):
(defn switch [in]
  (let [out (chan)]
    (go (loop [subchannel (<! in)]
          (let [[v c] (alts! [subchannel in])]
            (if (= c subchannel)
              (do (>! out v) (recur subchannel))
              (recur v)))))
    out))
Then wrap field-chan with it:
(let [validate-last (switch (async/map validate [field-chan]))]
  ...)
But note that switch does not handle closed channels.

Lamina Batched Queue

I'm trying to write a web service that takes requests, puts them into a queue, and then processes them in batches of 2. The response can be sent straight away, and I'm trying to use Lamina as follows (though not sure it's the right choice)...
(def ch (channel))
(def loop-forever
  (comp doall repeatedly))
(defn consumer []
  (loop-forever
    (fn []
      (process-batch
        @(read-channel ch)
        @(read-channel ch)))))
(defn handler [req]
  (enqueue ch req)
  {:status 200
   :body "ok"})
But this doesn't work... :( I've been through all the Lamina docs but can't get my head around how to use these channels. Can anyone confirm if Lamina supports this kind of behaviour and advise on a possible solution?

The point of lamina is that you don't want to loop forever: you want lamina's scheduler to use a thread from its pool to do work for you whenever you have enough data to do work on. So instead of using the very, very low-level read-channel function, use receive to register a callback once, or (more often) receive-all to register a callback for every time a channel receives data. For example:
(def ch (lamina/channel))
(lamina/receive-all (lamina/partition* 2 ch)
                    (partial apply process-batch))
(defn handler [req]
  (lamina/enqueue ch req)
  {:status 200
   :body "ok"})
