Lamina Batched Queue - clojure

I'm trying to write a web service that takes requests, puts them into a queue, and then processes them in batches of 2. The response can be sent straight away, and I'm trying to use Lamina as follows (though not sure it's the right choice)...
(def ch (channel))

(def loop-forever
  (comp doall repeatedly))

(defn consumer []
  (loop-forever
    (fn []
      (process-batch
        #(read-channel ch)
        #(read-channel ch)))))

(defn handler [req]
  (enqueue ch req)
  {:status 200
   :body "ok"})
But this doesn't work... :( I've been through all the Lamina docs but can't get my head around how to use these channels. Can anyone confirm if Lamina supports this kind of behaviour and advise on a possible solution?

The point of lamina is that you don't want to loop forever: you want lamina's scheduler to use a thread from its pool to do work for you whenever you have enough data to do work on. So instead of using the very, very low-level read-channel function, use receive to register a callback once, or (more often) receive-all to register a callback for every time a channel receives data. For example:
(def ch (lamina/channel))

(lamina/receive-all (lamina/partition* 2 ch)
                    (partial apply process-batch))

(defn handler [req]
  (lamina/enqueue ch req)
  {:status 200
   :body "ok"})

Related

Streaming data to the caller in JVM

I have a function which gets data periodically and then stops. This function has to return the data it fetches to its caller either:
1. as and when it gets it, or
2. all in one shot.
The 2nd one is easy to implement: block the caller, fetch all the data, and then send it in one shot.
But I want to implement the 1st one (I want to avoid having callbacks). Are streams the thing to use here? If so, how? If not, how do I return something which the caller can query for data, and which signals when there is no more data?
Note: I am on the JVM ecosystem, Clojure to be specific. I have had a look at the Clojure library core.async, which solves this kind of problem with the use of channels. But I was wondering whether there is any other way, which probably looks like this (assuming streams are something that can be used).
Java snippet
// Function which will periodically fetch MyData until there is no data
public Stream<MyData> myFunction() {
    ...
}

myFunction().filter(myData -> myData.text.equals("foo"))
Maybe you can just use a seq, which is lazy by default (like Stream), so the caller can decide when to pull data in. And when there is no more data, my-function can simply end the sequence. While doing this, you can also encapsulate some optimisation within my-function, e.g. fetching data in batches to minimise roundtrips, or fetching data periodically per your original requirement.
Here is one naive implementation:
(defn my-function []
  (let [batch 100]
    (->> (range)
         (map #(let [from (* batch %)
                     to   (+ from batch)]
                 (db-get from to)))
         ;; take while we have data from db-get
         (take-while identity)
         ;; return it as one single seq/Stream
         (apply concat))))

;; use it as a normal seq/Stream
(->> (my-function)
     (filter odd?))
where db-get would be something like:
(defn db-get [from to]
  ;; return the first 1000 records only, i.e. return nil to signal completion
  (when (< from 1000)
    ;; return a range of records
    (range from to)))
You might want to check https://github.com/ReactiveX/RxJava and https://github.com/ReactiveX/RxClojure (seems no longer maintained?)

Ring Middleware for the Client Side?

Usually, Ring middleware is associated with use on the server side.
In this post I'll discuss how the concept of Ring middleware can be applied to HTTP clients.
A very typical server side example might look like this:
(def server
  (-> server-handler
      wrap-transit-params
      wrap-transit-response))
Desugared:
(def server (wrap-transit-response (wrap-transit-params server-handler)))
server is now a function which accepts a request hash-map. Middleware can operate on this data before it is sent to the handler. It can also operate on the response hash-map that the handler returns. Or on both. It can even manipulate the execution of the handler.
Server
The above middleware could look like this in a very simplified way:
(1.) This one operates on the incoming data (the request) before it gets to the actual handler, parsing the body and providing the result as the value of the :params key. It's called a pre-wrap.
(defn wrap-transit-params [handler]
  (fn [req]
    (handler (assoc req :params (from-transit-str (req :body))))))
(2.) This one manipulates the outgoing data, the response of the server. It's a post-wrap.
(defn wrap-transit-response [handler]
  (fn [req]
    (let [resp (handler req)]
      (update resp :body to-transit-str))))
With this, a server can receive and respond with data as transit.
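The to-transit-str and from-transit-str helpers are assumed throughout; as a rough sketch (my assumption, using the cognitect.transit library with :json encoding, not something shown in the post), they might look like:

```clojure
;; Hypothetical helpers assumed by the middleware above,
;; sketched with the cognitect.transit API and :json encoding.
(require '[cognitect.transit :as transit])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream))

(defn to-transit-str [data]
  (let [out (ByteArrayOutputStream.)]
    (transit/write (transit/writer out :json) data)
    (.toString out "UTF-8")))

(defn from-transit-str [s]
  (transit/read (transit/reader (ByteArrayInputStream. (.getBytes s "UTF-8"))
                                :json)))
```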
Client
The same behavior could be desirable for an http-client.
(def client
  (-> client-handler
      wrap-transit-params
      wrap-transit-response))
It turns out that the above middleware cannot easily be reused as client middleware, even though there is some symmetry.
For the client-side they should be implemented like this:
(defn wrap-transit-params [handler]
  (fn [req]
    (handler (assoc req :body (to-transit-str (req :params))))))

(defn wrap-transit-response [handler]
  (fn [req]
    (let [resp (handler req)]
      (update resp :body from-transit-str))))
Now it could be used like this:
(client {:url "http://..."
         :params {:one #{1 2 3}}})
In reality there would be many more things involved, so I think having reusable middleware for both the server and the client side is utopian.
It also remains open to discussion whether the concept generally makes sense for clients. I could not find any client-side middleware on the net.
Server-side middleware usually lives under the namespace ring.middleware... My concrete question here is whether a library providing client-side middleware should use this namespace or not.
Unless you believe that code that uses your new library could be written to be completely portable between client and server, I feel that using a different namespace will lead to less confusion. Find a fun name! :)

Getting the most recent response from a core.async channel

I am trying to validate a form using core.async by making a request to a validation function every time the form changes. The validation function is asynchronous itself. It hits an external service and returns either an array of error messages or an empty array.
(go-loop []
  (when-let [value (<! field-chan)]
    (go (let [errors (<! (validate value))]
          (put! field-error-chan errors)))
    (recur)))
The above code is what I have at the moment. It works most of the time, but sometimes the response time from the server varies, so the second request's response arrives before the first. If the value is not valid in the second case but valid the first time, we would pull an array of errors followed by an empty array off the field-error-chan.
I could of course take the validation out of a go loop and have everything return in the correct order, but I would then end up taking values from the field-chan only after checking for errors. What I would like to do is validate values as they come, but put the validation responses on the errors channel in the order the values came, not the order of the responses.
Is this possible with core.async if not what would be my best approach to getting ordered responses?
Thanks
Assuming you can modify the external validation service, the simplest approach would probably be to attach timestamps (or simply counters) to validation requests and to have the validation service include them in their responses. Then you could always tell whether you're dealing with the response to the latest request.
Incidentally, the internal go form serves no purpose and could be merged into the outer go-loop. (Well, go forms return channels, but if the go-loop is actually meant to loop, this probably isn't important.)
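A minimal sketch of the counter idea, merged into a single go-loop (my own illustration, not from the question; it drops stale responses rather than reordering them, and assumes validate returns a channel):

```clojure
;; Sketch: tag each request with an increasing id and only publish
;; the response if no newer request has been sent in the meantime.
(def request-id (atom 0))

(go-loop []
  (when-let [value (<! field-chan)]
    (let [id (swap! request-id inc)]
      (go (let [errors (<! (validate value))]
            (when (= id @request-id)        ; stale responses are dropped
              (put! field-error-chan errors)))))
    (recur)))
```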
You can write a switch function (inspired by RxJs):
(defn switch [in]
  (let [out (chan)]
    (go (loop [subchannel (<! in)]
          (let [[v c] (alts! [subchannel in])]
            (if (= c subchannel)
              (do (>! out v) (recur subchannel))
              (recur v)))))
    out))
Then wrap the field-chan function and
(let [validate-last (switch (async/map validate [field-chan]))]
  ...)
But note that the switch does not handle closing channels.
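For completeness, here is one possible way (my own untested sketch, not part of the original answer) to extend switch so that closed channels are propagated:

```clojure
;; Sketch: like switch, but closes `out` when `in` closes, and waits for
;; the next subchannel when the current subchannel closes.
(defn switch* [in]
  (let [out (chan)]
    (go (loop [subchannel (<! in)]
          (if (nil? subchannel)
            (close! out)                    ; `in` is closed: we're done
            (let [[v c] (alts! [subchannel in])]
              (cond
                (= c in) (if (nil? v)
                           (close! out)     ; `in` closed
                           (recur v))       ; a newer subchannel arrived
                (nil? v) (recur (<! in))    ; subchannel closed: await the next
                :else    (do (>! out v)
                             (recur subchannel)))))))
    out))
```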

Reading Ring request body when already read

My question is, how can I idiomatically read the body of a Ring request if it has already been read?
Here's the background. I'm writing an error handler for a Ring app. When an error occurs, I want to log the error, including all relevant information that I might need to reproduce and fix the error. One important piece of information is the body of the request. However, the statefulness of the :body value (because it is a type of java.io.InputStream object) causes problems.
Specifically, what happens is that some middleware (the ring.middleware.json/wrap-json-body middleware in my case) does a slurp on the body InputStream object, which changes the internal state of the object such that future calls to slurp return an empty string. Thus, the [content of the] body is effectively lost from the request map.
The only solution I can think of is to preemptively copy the body InputStream object before the body can be read, just in case I might need it later. I don't like this approach because it seems clumsy to do some work on every request just in case there might be an error later. Is there a better approach?
I have a lib that sucks up the body, replaces it with a stream with identical contents, and stores the original so that it can be deflated later.
groundhog
This is not adequate for indefinitely open streams, and is a bad idea if the body is the upload of some large object. But it helps for testing, and recreating error conditions as a part of the debugging process.
If all you need is a duplicate of the stream, you can use the tee-stream function from groundhog as the basis for your own middleware.
I adopted #noisesmith's basic approach with a few modifications, as shown below. Each of these functions can be used as Ring middleware.
(defn with-request-copy
  "Transparently store a copy of the request in the given atom.
  Blocks until the entire body is read from the request. The request
  stored in the atom (which is also the request passed to the handler)
  will have a body that is a fresh (and resettable) ByteArrayInputStream
  object."
  [handler atom]
  (fn [{orig-body :body :as request}]
    (let [{body :stream} (groundhog/tee-stream orig-body)
          request-copy (assoc request :body body)]
      (reset! atom request-copy)
      (handler request-copy))))
(defn wrap-error-page
  "In the event of an exception, do something with the exception
  (e.g. report it using an exception handling service) before
  returning a blank 500 response. The `handle-exception` function
  takes two arguments: the exception and the request (which has a
  ready-to-slurp body)."
  [handler handle-exception]
  ;; Note that, as a result of this top-level approach to
  ;; error-handling, the request map sent to Rollbar will lack any
  ;; information added to it by one of the middleware layers.
  (let [request-copy (atom nil)
        handler (with-request-copy handler request-copy)]
    (fn [request]
      (try
        (handler request)
        (catch Throwable e
          (.reset (:body @request-copy))
          ;; You may also want to wrap this line in a try/catch block.
          (handle-exception e @request-copy)
          {:status 500})))))
I think you're stuck with some sort of "keep a copy around just in case" strategy. Unfortunately, it looks like :body on the request must be an InputStream and nothing else (on the response it can be a String or other things, which is why I mention it).
Sketch: In a very early middleware, wrap the :body InputStream in an InputStream that resets itself on close (example). Not all InputStreams can be reset, so you may need to do some copying here. Once wrapped, the stream can be re-read on close, and you're good. There's memory risk here if you have giant requests.
Update: here's a half-baked attempt, inspired in part by tee-stream in groundhog.
(require '[clojure.java.io :refer [copy]])

(defn wrap-resettable-body
  [handler]
  (fn [request]
    (let [orig-body (:body request)
          baos (java.io.ByteArrayOutputStream.)
          _ (copy orig-body baos)
          ba (.toByteArray baos)
          bais (java.io.ByteArrayInputStream. ba)
          ;; bais doesn't need to be closed, and supports resetting, so wrap it
          ;; in a delegating proxy that calls its reset when closed.
          resettable (proxy [java.io.InputStream] []
                       (available [] (.available bais))
                       (close [] (.reset bais))
                       (mark [read-limit] (.mark bais read-limit))
                       (markSupported [] (.markSupported bais))
                       ;; exercise to reader: proxy with overloaded methods...
                       ;; (read [] (.read bais))
                       (read [b off len] (.read bais b off len))
                       (reset [] (.reset bais))
                       (skip [n] (.skip bais n)))
          updated-req (assoc request :body resettable)]
      (handler updated-req))))
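To sketch how this could be wired up (my own illustration; base-handler and log-error! are hypothetical placeholders), wrap-resettable-body would sit outermost so an error handler can read the body a second time:

```clojure
;; Hypothetical usage: because the proxied stream resets itself on close,
;; the body can be slurped again after inner middleware has consumed it.
(defn wrap-error-logging [handler log-error!]
  (fn [request]
    (try
      (handler request)
      (catch Throwable e
        (log-error! e (assoc request :body (slurp (:body request))))
        {:status 500}))))

(def app
  (-> base-handler                     ; hypothetical application handler
      wrap-json-body                   ; reads (and closes) the body once
      (wrap-error-logging log-error!)  ; may read the body again on error
      wrap-resettable-body))           ; outermost: makes the body re-readable
```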

Clojure (script): macros to reason about async operations synchronously

Context
I'm playing with ClojureScript, so Ajax works as follows for me:
(make-ajax-call url data handler);
where handler looks something like:
(fn [response] .... )
Now, this means that when I want to say something like "fetch the new data, and update the left sidebar", my code ends up looking like:
(make-ajax-call "/fetch-new-data" {} update-sidebar!) [1]
Now, I'd prefer to write this as:
(update-sidebar! (make-ajax-call "/fetch-new-data" {})) [2]
but it won't work because make-ajax call returns immediately.
Question
Is there some way, via monads or macros, to make this work, so that [2] gets auto-rewritten into [1]? I believe:
there will be no performance penalty, since it's rewritten into [1]
it's clearer for me to reason about, since I can think in synchronous steps rather than async events
I suspect I'm not the first to run into this problem, so if this is a well known problem, answers of the form "Google for Problem Foo" is perfectly valid.
Thanks!
Since June 28, 2013, when the Clojure core.async lib was released, you can do it, more or less, in this way: https://gist.github.com/juanantonioruz/7039755
Here is the code pasted:
(ns fourclojure.stack
  (:require [clojure.core.async :refer :all]))

(defn update-sidebar! [new-data]
  (println "you have updated the sidebar with this data:" new-data))

(defn async-handler [the-channel data-received]
  (put! the-channel data-received))

(defn make-ajax-call [url data-to-send]
  (let [the-channel (chan)]
    (go
      (<! (timeout 2000)) ; wait 2 seconds before responding
      (async-handler the-channel (str "return value with this url: " url)))
    the-channel))
(update-sidebar! (<!! (make-ajax-call "/fetch-new-data" {})))
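Note that <!! (the blocking take) exists only on the JVM; in ClojureScript, which the question is about, the final call would instead live inside a go block, e.g.:

```clojure
;; ClojureScript variant: take from the channel inside a go block
(go (update-sidebar! (<! (make-ajax-call "/fetch-new-data" {}))))
```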
More info in:
* http://clojure.com/blog/2013/06/28/clojure-core-async-channels.html
* https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
A macro would change the appearance of the code while leaving the Ajax call asynchronous.
It's a simple template macro. Another approach would be to wrap the call to make-ajax-call in a function that waits for the result. While either of these could be made to work, they may seem a bit awkward and "un-Ajax-like". Will the benefits be worth the extra layer of abstraction?
What about using the threading macro? Isn't it good enough?
(->> update-sidebar! (make-ajax-call "/fetch-new-data" {}))
Note that this expands to exactly the form in [1]: (make-ajax-call "/fetch-new-data" {} update-sidebar!).
We had rough ideas about this in the async branch of seesaw. See in particular the seesaw.async namespace.