How much do Cascalog Traps thread into other functions? - clojure

I was wondering how far down cascalog traps boiled down using the following example.
(defn -main
"Good ole main boilerplate"
[people-path trap-path output-path]
(?- (hfs-textline output-path)
(people-query (hfs-textline trap-path)
(hfs-textline people-path))))
(defn people-query
"The people query!"
[trap-path people-path]
(<- [?first-name ?last-name ?daily-visits]
(:trap trap)
(some-generic-parsing-to-tuple-fn people-path :> ?fname ?lname)
(c/count ?daily-visits)))
If some-generic-parsing-to-tuple-fn is called within the scope of the people-query (which contains the (:trap trap) will the sub query also have access to that trap or should the some-generic-parsing-to-tuple-fn also contain the trap information i.e.
(some-generic-parsing-to-tuple-fn trap-path people-path :> ?fname ?lname)
TL;DR How much do I need to pass around traps in Cascalog queries in order to allow nested queries to have access to the trap

Related

Docstring for custom definitions

In the project I'm working on we often define custom defsomething-style macros for different purposes to hide boilerplate. One example is defhook which helps to define a hook handler for an event. Here's a simplified version of it (the actual version has more parameters and does some non-trivial things in defmethod, but that's irrelevant to my question):
(defmulti handle-hook
"This multimethod is called when an event was fired."
(fn [event context] event))
(defmacro defhook
"Define a hook for an event."
[event docstring & more]
`(let [body# (fn ~#more)]
(defmethod handle-hook ~event [event# context#]
(body# context#))))
(defhook "EntryDeleted"
"Hook called on entry deletion."
[context]
(log-deletion (:EntryID context)))
The main problem I have with this code is that defmethod does not support docstring, so I can't use the one for "EntryDeleted" in REPL or for automatic documentation generation. The last one is important for the project: there are defhooks and defhandlers that are exposed as external API and currently we have to maintain documentation separately (and manually).
So the simplest question is "how to attach docstring to a defmethod"?.
And the deeper one would be "how to attach/generate documentation for custom defsomething macros?"
If some of the existing tools for documentation generation supported this feature it would be great! Yet, neither of Marginalia, Codox or Autodoc seem to support something like that.
How to attach/generate documentation for custom defsomething macros?
Since docstrings are attached to vars, you'd generally have defsomething macros expand into a more primitive def form, e.g. defn, def. Then you just arrange for your defsomething's docstring to be attached to the underlying var.
How to attach docstring to a defmethod?
This is a special case - defmethod is not defining a new var; it's calling a Java method on a Java object. On the other hand, defmulti does create a var. One idea would be to extend the multi-function's docstring with the dispatch value and associated description. For example,
(defn append-hook-doc! [event docstring]
(let [hook-doc (str event " - " docstring)]
(alter-meta! #'handle-hook
(fn [m]
(update-in m [:doc] #(str % "\n\t" hook-doc))))))
...
(doc handle-hook)
-------------------------
user/handle-hook
This multimethod is called when an event was fired.
EntryDeleted - Hook called on entry deletion.
As the ! indicates, this form has a side-effect: multiple evaluations of a defining form that calls this will result in duplicate lines in #'handle-hook's docstring. You might avoid this by stashing some extra metadata in #'handle-hook to use as a marker for whether or not the doc has already been appended. Alternatively, you might stash the docstrings elsewhere and patch it all together in some auxiliary step, e.g. by delaying the expansion of (defmulti handle-hook ... until you have all the docstrings (although, this breaks the open extension of multi-methods wrt docstrings).

Clojure - connections at compile time

I have a rabbitMQ connection that seems to be started at compile time (when I type lein compile) and then blocks the building of my project. Here are more details on the problem. Let us say this is the clojure file bla_test.clj
(import (com.rabbitmq.client ConnectionFactory Connection Channel QueueingConsumer))
;; And then we have to translate the equivalent java hello world program using
;; Clojure's excellent interop.
;; It feels very strange writing this sort of ceremony-oriented imperative code
;; in Clojure:
;; Make a connection factory on the local host
(def connection-factory
(doto (ConnectionFactory.)
(.setHost "localhost")))
;; and get it to make you a connection
(def connection (.newConnection connection-factory))
;; get that to make you a channel
(def channel (. connection createChannel))
;;HERE I WOULD LIKE TO USE THE SAME CONNECTION AND THE SAME CHANNEL INSTANCE AS OFTEN AS
;; I LIKE
(dotimes [ i 10 ]
(. channel basicPublish "" "hello" nil (. (format "Hello World! (%d)" i) getBytes)))
The clojure file above is part of a bigger clojure program that I build using lein. My problem is that when I compile with "lein compile", a connection is done because of the line (def connection (.newConnection connection-factory)) and then the compilation is stopped! How can I avoid this? Is there a way to compile without building connection? How can I manage to use the same instance of channel over several calls coming from external components?
Any help would be appreciated.
Regards,
Horace
The Clojure compiler must evaluate all top-level forms, because it can be required to run arbitrary code when expanding calls to macros.
The usual solution to issues like the one you describe is to define a top-level Var holding an object of a dereferenceable type, for example an atom or a promise, and have an initialization function provide the value at runtime. (You could also use a delay and specify the value inline; this is less flexible, since it makes it more difficult to use a different value for testing etc.)

A possible solution to avoid limitation in go macro: Must be called inside a (go ...) block

Perhaps a possible solution to use (<! c) outside go macro could be done with macro and its macro expansion time :
This is my example:
(ns fourclojure.asynco
(require [clojure.core.async :as async :refer :all]))
(defmacro runtime--fn [the-fn the-value]
`(~the-fn ~the-value)
)
(defmacro call-fn [ the-fn]
`(runtime--fn ~the-fn (<! my-chan))
)
(def my-chan (chan))
(defn read-channel [the-fn]
(go
(loop []
(call-fn the-fn)
(recur)
)
))
(defn paint []
(put! my-chan "paint!")
)
And to test it:
(read-channel print)
(repeatedly 50 paint)
I've tried this solution in a nested go and also works. But I'm not sure if it could be a correct path
The reason about this question is releated to this other question Isn't core.async contrary to Clojure principles?, #aeuhuea comment that "It seems to me that this prevents simplicity and composability. Why is it not a problem?" and #cgrand response "The limitation of the go macro (its locality) is also a feature: it enforces source code locality of stateful operations."
But force to localize your code is not the same as "complect"?
Regarding the title of your question:
>!must be called in a go block because it's designed to. If you are interested in the go-block state-machine mechanics, I can highly recommend Timothy Baldridges Youtube videos on that http://www.youtube.com/channel/UCLxWPHbkxjR-G-y6CVoEHOw
Remember that there is always blocking take and put >!! and <!!. I don't know which part of your code is supposed to provide a "solution" for not being able to use <! and >! outside of a go block, however looping around events dispatched from a single channel is common practice. Here is a modified version of read-channel
(defn do-channel [f ch]
(go-loop []
(when-let [v (<! ch)]
(f v)
(recur))))
put! puts asynchronously, an effect that you usually don't intend. In your example, to put the string "paint" into the channel 50 times, I'd recommend a one-liner like this one:
(do-channel println (to-chan (repeat 50 "print")))
Here is a comment as an answer to your edit:
Channels are not designed to be used as mutable data-structures, period. They have a buffer and that buffer can be thought of as a mutable queue. However we don't use channels to store a value in there, just to take it out a few lines later again.
We use channels as helping construct that may be used to bring execution of two or more different pieces of source-code in two or more different places in line. E.g. a go-block here does not continue to execute until it has received a value produced by another go-block. >! and >!! help us to distinguish whether they are used in a thread-blocking context or in a go-block (blocking a spawned process).
Also, please refer to this answer: Clojure - Why does execution hang when doing blocking insert into channel? (core.async)
You should not use >!! or <!! inside of a go-block, neither transparently or nested in a function call. Rich Hickey himself has commented on that in a recent bug report (http://dev.clojure.org/jira/browse/ASYNC-29?focusedCommentId=32414&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-32414).
Looking at the source-code of >! you will see that it only throws an exception. As a matter of fact, go will replace >! with different source-code. go spawns a state-machine controlled process. Depending on the context you may want to make this explicitly known or nest the go block inside of a macro or function (like in the code examples that you have provided).
Regarding David Nolens (swannodettes) helpers: They have been implemented by Rich Hickey and Nolen himself into the core.async library. Nolen said himself that they are superseded in this presentation (http://www.youtube.com/watch?v=AhxcGGeh5ho). Notice that go-loop has been implemented since after Nolens commit.

How should carmine's wcar macro be used?

I'm confused by how calls with carmine should be done. I found the wcar macro described in carmine's docs:
(defmacro wcar [& body] `(car/with-conn pool spec-server1 ~#body))
Do I really have to call wcar every time I want to talk to redis in addition to the redis command? Or can I just call it once at the beginning? If so how?
This is what some code with tavisrudd's redis library looked like (from my toy url shortener project's testsuite):
(deftest test_shorten_doesnt_exist_create_new_next
(redis/with-server test-server
(redis/set "url_counter" 51)
(shorten test-url)
(is (= "1g" (redis/get (str "urls|" test-url))))
(is (= test-url (redis/get "shorts|1g")))))
And now I can only get it working with carmine by writing it like this:
(deftest test_shorten_doesnt_exist_create_new_next
(wcar (car/set "url_counter" 51))
(shorten test-url)
(is (= "1g" (wcar (car/get (str "urls|" test-url)))))
(is (= test-url (wcar (car/get "shorts|1g")))))
So what's the right way of using it and what underlying concept am I not getting?
Dan's explanation is correct.
Carmine uses response pipelining by default, whereas redis-clojure requires you to ask for pipelining when you want it (using the pipeline macro).
The main reason you'd want pipelining is for performance. Redis is so fast that the bottleneck in using it is often the time it takes for the request+response to travel over the network.
Clojure destructuring provides a convenient way of dealing with the pipelined response, but it does require writing your code differently to redis-clojure. The way I'd write your example is something like this (I'm assuming your shorten fn has side effects and needs to be called before the GETs):
(deftest test_shorten_doesnt_exist_create_new_next
(wcar (car/set "url_counter" 51))
(shorten test-url)
(let [[response1 response2] (wcar (car/get (str "urls|" test-url))
(car/get "shorts|1g"))]
(is (= "1g" response1))
(is (= test-url response2))))
So we're sending the first (SET) request to Redis and waiting for the reply (I'm not certain if that's actually necessary here). We then send the next two (GET) requests at once, allow Redis to queue the responses, then receive them all back at once as a vector that we'll destructure.
At first this may seem like unnecessary extra effort because it requires you to be explicit about when to receive queued responses, but it brings a lot of benefits including performance, clarity, and composable commands.
I'd check out Touchstone on GitHub if you're looking for an example of what I'd consider idiomatic Carmine use (just search for the wcar calls). (Sorry, SO is preventing me from including another link).
Otherwise just pop me an email (or file a GitHub issue) if you have any other questions.
Don't worry, you're using it the correct way already.
The Redis request functions (such as the get and set that you're using above) are all routed through another function send-request! that relies on a dynamically bound *context* to provide the connection. Attempting to call any of these Redis commands without that context will fail with a "no context" error. The with-conn macro (used in wcar) sets that context and provides the connection.
The wcar macro is then just a thin wrapper around with-conn making the assumption that you will be using the same connection details for all Redis requests.
So far this is all very similar to how Tavis Rudd's redis-clojure works.
So, the question now is why does Carmine need multiple wcar's when Redis-Clojure only required a single with-server?
And the answer is, it doesn't. Apart from sometimes, when it does. Carmine's with-conn uses Redis's "Pipelining" to send multiple requests with the same connection and then package the responses together in a vector. The example from the README shows this in action.
(wcar (car/ping)
(car/set "foo" "bar")
(car/get "foo"))
=> ["PONG" "OK" "bar"]
Here you will see that ping, set and get are only concerned with sending the request, leaving the receiving of response up to wcar. This precludes asserts (or any result access) from inside of wcar and leads to the separation of requests and multiple wcar calls that you have.

Clojure deftype calling function in the same namespace throws "java.lang.IllegalStateException: Attempting to call unbound fn:"

I'm placing Clojure into an existing Java project which heavily uses Jersey and Annotations. I'd like to be able to leverage the existing custom annotations, filters, etc of the previous work. So far I've been roughly using the deftype approach with javax.ws.rs annotations found in Chapter 9 of Clojure Programming.
(ns my.namespace.TestResource
(:use [clojure.data.json :only (json-str)])
(:import [javax.ws.rs DefaultValue QueryParam Path Produces GET]
[javax.ws.rs.core Response]))
;;My function that I'd like to call from the resource.
(defn get-response [to]
(.build
(Response/ok
(json-str {:hello to}))))
(definterface Test
(getTest [^String to]))
(deftype ^{Path "/test"} TestResource [] Test
(^{GET true
Produces ["application/json"]}
getTest
[this ^{DefaultValue "" QueryParam "to"} to]
;Drop out of "interop" code as soon as possible
(get-response to)))
As you can see from the comments, I'd like to call functions outside the deftype, but within the same namespace. At least in my mind, this allows me to have the deftype focus on interop and wiring up to Jersey, and the application logic to be separate (and more like the Clojure I want to write).
However when I do this I get the following exception:
java.lang.IllegalStateException: Attempting to call unbound fn: #'my.namespace.TestResource/get-response
Is there something unique about a deftype and namespaces?
... funny my hours on this problem did not yield an answer until after I asked here :)
It looks like namespace loading and deftypes was addressed in this post. As I suspected the deftype does not automatically load the namespace. As found in the post, I was able to fix this by adding a require like this:
(deftype ^{Path "/test"} TestResource [] Test
(^{GET true
Produces ["application/json"]}
getTest
[this ^{DefaultValue "" QueryParam "to"} to]
;Drop out of "interop" code as soon as possible
(require 'my.namespace.TestResource)
(get-response to)))