Clojure Enlive: How to simplify this extract fn from scrape3.clj - clojure

I followed the enlive-tutorial and must say i'm impressed by the power of Enlive for parsing the web. Now, I came to look further to the scrape3.clj available here: https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape3.clj
Swannodette has made a great job in designing this example, but I feel we could make it a bit dryer.
My question: I would you rewrite this extract function to make it dryer:
(defn extract [node]
(let [headline (first (html/select [node] *headline-selector*))
byline (first (html/select [node] *byline-selector*))
summary (first (html/select [node] *summary-selector*))
result (map html/text [headline byline summary])]
(zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result))))
If you have other ideas on other elements of the program, feel free to share them!
EDIT:
I played around and came up with:
(defn extract [node]
(let [s [*headline-selector* *byline-selector* *summary-selector*]
selected (map #(html/text (first (html/select [node] %))) s)
cleaned (map #(re-gsub #"\n" "" %) selected)]
(zipmap [:headline :byline :summary] cleaned)))

the first (html/select [node] can be hoisted to a local function:
(defn extract [node]
(let [selector (fn [sel]) (html/select [node] sel)
headline (selector *headline-selector*)
byline (selector *byline-selector*)
summary (selector *summary-selector*)
result (map html/text [headline byline summary])]
(zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result))))
then the intermediary names can be removed, though these help make the point of the code clear so it's a matter of personal taste:
(defn extract [node]
(let [selector (fn [selector]) (html/select [node] selector)
result (map html/text
(map selector [*headline-selector*
*byline-selector*
*summary-selector*]))]
(zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result))))

To make the result of the function "more visible" I would use map literal as shown below:
(defn extract [node]
(let [sel #(html/text (first (html/select [node] %)))
rem #(re-gsub #"\n" "" %)
get-text #(-> % sel rem)]
{:headline (get-text *headline-selector*)
:byline (get-text *byline-selector*)
:summary (get-text *summary-selector*)
}))

Related

Creating a Ruby-like `binding.pry` macro in Clojure

I've been struggling to create a macro that'll allow me to dynamically bind whatever is in the &env into a binding form and then delegate to a pry like function to open a REPL that can see those bound &env symbols.
My simplistic pry func, which works as expected
(defn pry []
(print (str "pry(" *ns* ")> "))
(flush)
(let [expr (read)]
(when-not (= expr :exit)
(println (eval expr))
(recur))))
Using the pry func:
clojure-noob.core=> (def a 1)
#'clojure-noob.core/a
clojure-noob.core=> (pry)
pry(clojure-noob.core)> (+ a 1)
2
pry(clojure-noob.core)> :exit
nil
clojure-noob.core=>
My attempt at creating a dynamic invocation of binding:
(defmacro binding-pry []
(let [ks (keys &env)]
`(let [ks# '~ks
vs# [~#ks]
bs# (vec (interleave ks# vs#))]
(binding bs# (pry)))))
However, this fails because the inner symbol bs# is not expanded to an actual vector but instead is the generated symbol and binding tosses a clojure.core/binding requires a vector for its binding exception.
clojure-noob.core=> (let [a 1 b 2] (binding-pry))
Syntax error macroexpanding clojure.core/binding at (/tmp/form-init14332359378145135257.clj:1:16).
clojure.core/binding requires a vector for its binding in clojure-noob.core:
clojure-noob.core=>
The code quoted form with a debug print, the bs# symbol is resolved when printing but I don't know how to make it resolve to a vector when constructing the binding form.
(defmacro binding-pry []
(let [ks (keys &env)]
`(let [ks# '~ks
vs# [~#ks]
bs# (vec (interleave ks# vs#))]
(println bs#)
`(binding bs# (pry)))))
clojure-noob.core=> (let [a 1 b 2] (binding-pry))
[a 1 b 2]
(clojure.core/binding clojure-noob.core/bs__2464__auto__ (clojure-noob.core/pry))
clojure-noob.core=>
I'm very confident I'm tackling this incorrectly but I don't see another approach.
The Joy of Clojure demonstrates a break macro that does this already. I can't reproduce its source here, because it's EPL and not CC. But you can see its source at https://github.com/joyofclojure/book-source/blob/b76ef15/first-edition/src/joy/breakpoint.clj. It refers to a contextual-eval function as well: https://github.com/joyofclojure/book-source/blob/b76ef15/first-edition/src/joy/macros.clj#L4-L7.
A first step towards improving your attempt could be writing:
(defmacro binding-pry []
(let [ks (keys &env)]
`(binding [~#(interleave ks ks)] (pry))))
This still doesn't work because binding expects that the symbols can be resolved to existing dynamic vars. To tackle this problem you could make binding-pry introduce such vars as shown below:
(defmacro binding-pry []
(let [ks (keys &env)]
`(do
~#(map (fn [k] `(def ~(with-meta k {:dynamic true}))) ks)
(binding [~#(interleave ks ks)] (pry)))))
But this can have undesirable side-effects, like polluting the namespace with new var-names or making existing vars dynamic. So I would prefer an approach like the one mentioned in amalloy's answer but with a better implementation of eval-in-context (see my comment there).
To write a self-contained answer based on your pry function, let's first define eval-in which evaluates a form in an environment:
(defn eval-in [env form]
(apply
(eval `(fn* [~#(keys env)] ~form))
(vals env)))
Then let's modify pry to take an environment as an argument and use eval-in instead of eval:
(defn pry [env]
(let [prompt (str "pry(" *ns* ")> ")]
(loop []
(print prompt)
(flush)
(let [expr (read)]
(when-not (= expr :exit)
(println (eval-in env expr))
(recur))))))
An equivalent, less primitive version could be:
(defn pry [env]
(->> (repeatedly (let [prompt (str "pry(" *ns* ")> ")]
#(do (print prompt) (flush) (read))))
(take-while (partial not= :exit))
(run! (comp println (partial eval-in env)))))
Now we can define binding-pry as follows:
(defmacro binding-pry []
`(pry ~(into {}
(map (juxt (partial list 'quote) identity))
(keys &env))))
Finally, here is a direct/"spaghetti" implementation of binding-pry:
(defmacro binding-pry []
(let [ks (keys &env)]
`(->> (repeatedly (let* [prompt# (str "pry(" *ns* ")> ")]
#(do (print prompt#) (flush) (read))))
(take-while (partial not= :exit))
(run! (comp println
#((eval `(fn* [~~#(map (partial list 'quote) ks)] ~%))
~#ks))))))

Improving complex data structure replacement

I'm attempting to modify a specific field in a data structure, described below (a filled example can be found here:
[{:fields "There are a few other fields here"
:incidents [{:fields "There are a few other fields here"
:updates [{:fields "There are a few other fields here"
:content "THIS is the field I want to replace"
:translations [{:based_on "Based on the VALUE of this"
:content "Replace with this value"}]}]}]}]
I already have this implemented it in a number of functions, as below:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(vec (map translate-update updates)))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(vec (map translate-incident incidents)))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(vec (map translate-service services)))
Each array could have any number of entries (though the number is likely less than 10).
The basic premise is to replace the :content in each :update with the relevant :translation based on a provided value.
My Clojure knowledge is limited, so I'm curious if there is a more optimal way to achieve this?
EDIT
Solution so far:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(mapv translate-update updates))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(mapv translate-incident incidents))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(mapv translate-service services))
I would start more or less as you do, bottom-up, by defining some functions that look like they will be useful: how to choose a translation from among a list of translations, and how to apply that choice to an update. But I wouldn't make the functions so tiny as yours: the logic is all spread out into a lot of places, and it's not easy to get an overall idea of what is going on. Here are the two functions I'd start with:
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
...)
Of course you can defn them instead if you'd like, and I suspect many people would, but they're only going to be used in one place, and they're intimately involved with the context in which they're used, so I like a letfn. These two functions are really all the interesting logic; the rest is just some boring tree-traversal code to apply this logic in the right places.
Now to build out the body of the letfn is straightforward, and easy to read if you make your code be the same shape as the data it manipulates. We want to walk through a series of nested lists, updating objects on the way, and so we just write a series of nested for comprehensions, calling update to descend into the right keyspace:
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))
I think using for here is miles better than using map, although of course they are equivalent as always. The important difference is that as you read through the code you see the new context first ("okay, now we're doing something to each user"), and then what is happening inside that context; with map you see them in the other order and it is difficult to keep tack of what is happening where.
Combining these, and putting them into a defn, we get a function that you can call with your example input and which produces your desired output (assuming a suitable definition of get-locale):
(defn translate [users]
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))))
we can try to find some patterns in this task (based on the contents of the snippet from github gist, you've posted):
simply speaking, you need to
1) update every item (A) in vector of data
2) updating every item (B) in vector of A's :incidents
3) updating every item (C) in vector of B's :updates
4) translating C
The translate function could look like this:
(defn translate [{translations :translations :as item} locale]
(assoc item :content
(or (some #(when (= (:locale %) locale) (:content %)) translations)
:no-translation-found)))
it's usage (some fields are omitted for brevity):
user> (translate {:id 1
:content "abc"
:severity "101"
:translations [{:locale "fr_FR"
:content "abc"}
{:locale "ru_RU"
:content "абв"}]}
"ru_RU")
;;=> {:id 1,
;; :content "абв",
;; :severity "101",
;; :translations [{:locale "fr_FR", :content "abc"} {:locale "ru_RU", :content "абв"}]}
then we can see that 1 and 2 are totally similar, so we can generalize that:
(defn update-vec-of-maps [data k f]
(mapv (fn [item] (update item k f)) data))
using it as a building block you can make up the whole data transformation:
(defn transform [data locale]
(update-vec-of-maps
data :incidents
(fn [incidents]
(update-vec-of-maps
incidents :updates
(fn [updates] (mapv #(translate % locale) updates))))))
(transform data "it_IT")
returns what you need.
then you can generalize it further, making the utility function for arbitrary depth transformations:
(defn deep-update-vec-of-maps [data ks terminal-fn]
(if (seq ks)
((reduce (fn [f k] #(update-vec-of-maps % k f))
terminal-fn (reverse ks))
data)
data))
and use it like this:
(deep-update-vec-of-maps data [:incidents :updates]
(fn [updates]
(mapv #(translate % "it_IT") updates)))
I recommend you look at https://github.com/nathanmarz/specter
It makes it really easy to read and update clojure data structures. Same performance as hand-written code, but much shorter.

"->>" macro and iterative function application

I'm working through a book on clojure and ran into a stumbling block with "->>". The author provides an example of a comp that converts camelCased keywords into a clojure map with a more idiomatic camel-cased approach. Here's the code using comp:
(require '[clojure.string :as str])
(def camel->keyword (comp keyword
str/join
(partial interpose \-)
(partial map str/lower-case)
#(str/split % #"(?<=[a-z])(?=[A-Z])")))
This makes a lot of sense, but I don't really like using partial all over the place to handle a variable number of arguments. Instead, an alternative is provided here:
(defn camel->keyword
[s]
(->> (str/split s #"(?<=[a-z])(?=[A-Z])")
(map str/lower-case)
(interpose \-)
str/join
keyword))
This syntax is much more readable, and mimics the way I would think about solving a problem (front to back, instead of back to front). Extending the comp to complete the aforementioned goal...
(def camel-pairs->map (comp (partial apply hash-map)
(partial map-indexed (fn [i x]
(if (odd? i)
x
(camel->keyword x))))))
What would be the equivalent using ->>? I'm not exactly sure how to thread map-indexed (or any iterative function) using ->>. This is wrong:
(defn camel-pairs->map
[s]
(->> (map-indexed (fn [i x]
(if (odd? i)
x
(camel-keyword x)))
(apply hash-map)))
Three problems: missing a parenthesis, missing the > in the name of camel->keyword, and not "seeding" your ->> macro with the initial expression s.
(defn camel-pairs->map [s]
(->> s
(map-indexed
(fn [i x]
(if (odd? i)
x
(camel->keyword x))))
(apply hash-map)))
Is this really more clear than say?
(defn camel-pairs->map [s]
(into {}
(for [[k v] (partition 2 s)]
[(camel->keyword k) v])))

Idiomatic way of finding functions in a namesspace containing specific metadata?

I'm trying to figure out the best way to troll a namespace for functions that contain a specific bit of metadata. I've come up with a solution, but it feels a little awkward and I'm not at all sure I'm going about it the right way. There's a second component to this as well: I don't just want the names of the functions, I want to find them and then execute them. Here's a snippet of what I'm doing presently:
(defn wrap-routes
[req from-ns]
(let [publics (ns-publics from-ns)
routes (->>
(keys publics)
(map #(meta (% publics)))
(filter #(= (:route-handler %) true))
(map #(:name %)))
resp (first
(->>
(map #((% publics) req) routes)
(filter #(:status %))))]
(or resp not-found)))
As you can see, I'm doing all sorts of gymnastics to see if my metadata is attached to any functions in a given namespace and then am doing extra work after that to get the actual function back. I'm sure there must be a better way. So my question is, how would you do this?
(defn wrap-routes [req from-ns]
(or (first (filter :status
(for [[name f] (ns-publics from-ns)
:when (:route-handler (meta f))]
(f req))))
not-found))
You can do something like this:
(defn wrap-routes
[req from-ns]
(->> (ns-publics from-ns)
(filter #(:route-handler (meta (%1 1))))
(map #((%1 1) req))
(filter #(:status %))
first
(#(or % not-found))))

Let over lambda block-scanner in clojure

I have just start reading Let over lambda and I thought I would try and write a clojure version of the block-scanner in the closures chapter.
I have the following so far:
(defn block-scanner [trigger-string]
(let [curr (ref trigger-string) trig trigger-string]
(fn [data]
(doseq [c data]
(if (not (empty? #curr))
(dosync(ref-set curr
(if (= (first #curr) c)
(rest #curr)
trig)))))
(empty? #curr))))
(def sc (block-scanner "jihad"))
This works I think, but I would like know what I did right and what I could do better.
I would not use ref-set but alter because you don't reset the state to a completely new value, but update it to a new value which is obtained from the old one.
(defn block-scanner
[trigger-string]
(let [curr (ref trigger-string)
trig trigger-string]
(fn [data]
(doseq [c data]
(when (seq #curr)
(dosync
(alter curr
#(if (-> % first (= c))
(rest %)
trig)))))
(empty? #curr))))
Then it is not necessary to use refs since you don't have to coordinate changes. Here an atom is a better fit, since it can be changed without all the STM ceremony.
(defn block-scanner
[trigger-string]
(let [curr (atom trigger-string)
trig trigger-string]
(fn [data]
(doseq [c data]
(when (seq #curr)
(swap! curr
#(if (-> % first (= c))
(rest %)
trig))))
(empty? #curr))))
Next I would get rid of the imperative style.
it does more than it should: it traverses all data - even if we found a match already. We should stop early.
it is not thread-safe, since we access the atom multiple times - it might change in between. So we must touch the atom only once. (Although this is probably not interesting in this case, but it's good to make it a habit.)
it's ugly. We can do all the work functionally and just save the state, when we come to a result.
(defn block-scanner
[trigger-string]
(let [state (atom trigger-string)
advance (fn [trigger d]
(when trigger
(condp = d
(first trigger) (next trigger)
; This is maybe a bug in the book. The book code
; matches "foojihad", but not "jijihad".
(first trigger-string) (next trigger-string)
trigger-string)))
update (fn [trigger data]
(if-let [data (seq data)]
(when-let [trigger (advance trigger (first data))]
(recur trigger (rest data)))
trigger))]
(fn [data]
(nil? (swap! state update data)))))