Idiomatic Clojure prefix replacement

I have a map representing information about a subversion commit.
Example contents:
(def commit
{:repository "/var/group1/project1"
:revision-number "1234"
:author "toolkit"
etc..}
I would like to change the repository based on a prefix match, so that:
/var/group1 maps to http://foo/group1
/var/group2 maps to http://bar/group2
I have created 2 functions like:
(defn replace-fn [prefix replacement]
(fn [str]
(if (.startsWith str prefix)
(.replaceFirst str prefix replacement)
str)))
(def replace-group1 (replace-fn "/var/group1" "http://foo/group1"))
(def replace-group2 (replace-fn "/var/group2" "http://bar/group2"))
And now I have to apply them:
(defn fix-repository [{:keys [repository] :as commit}]
(assoc commit :repository
(replace-group1
(replace-group2 repository))))
But this means I have to add an extra wrapper in my fix-repository for each new replacement.
I would like to simply:
Given a commit map
Extract the :repository value
Loop through a list of replacement prefixes
If any prefix matches, replace :repository value with the new string
Otherwise, leave the :repository value alone.
I can't seem to build the right loop, reduce, or other solution to this.

You can use function composition:
(def commit
{:repository "/var/group2/project1"
:revision-number "1234"
:author "toolkit"})
(defn replace-fn [prefix replacement]
(fn [str]
(if (.startsWith str prefix)
(.replaceFirst str prefix replacement)
str)))
(def replacements
(comp (replace-fn "/var/group1" "http://foo/group1")
(replace-fn "/var/group2" "http://bar/group2")))
(defn fix-repository [commit replacements]
(update-in commit [:repository] replacements))
(fix-repository commit replacements)
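If the prefixes live in a list, the composed function can be built from that list rather than written out by hand. The following is a minimal sketch, not part of the original answer: `prefix-pairs` is a hypothetical name, and this `replace-fn` concatenates with `subs` instead of calling `.replaceFirst`, which treats its first argument as a regular expression.

```clojure
;; Sketch: build the composed replacement function from data.
;; `prefix-pairs` is a hypothetical name introduced for illustration.
(def prefix-pairs
  [["/var/group1" "http://foo/group1"]
   ["/var/group2" "http://bar/group2"]])

(defn replace-fn [prefix replacement]
  (fn [^String s]
    (if (.startsWith s prefix)
      ;; Plain concatenation, so regex metacharacters in `prefix` are harmless.
      (str replacement (subs s (count prefix)))
      s)))

;; (comp) with no arguments returns identity, so an empty list is fine.
(def replacements
  (apply comp (map (fn [[prefix replacement]]
                     (replace-fn prefix replacement))
                   prefix-pairs)))

(defn fix-repository [commit]
  (update-in commit [:repository] replacements))

(fix-repository {:repository "/var/group2/project1"
                 :revision-number "1234"
                 :author "toolkit"})
;; => {:repository "http://bar/group2/project1", :revision-number "1234", :author "toolkit"}
```

Because the functions are composed, every replacement runs in turn, so the output of one could in principle be matched by another prefix. That does not happen with the two prefixes above.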

How about something like this?
(defn replace-any-prefix [replacements-list string]
(or (first
(filter identity
(map (fn [[p r]]
(when (.startsWith string p)
(.replaceFirst string p r)))
replacements-list)))
string))
(update-in commit
[:repository]
(partial replace-any-prefix
[["/var/group1" "http://foo/group1"]
["/var/group2" "http://bar/group2"]]))
Documentation for update-in: http://clojuredocs.org/clojure_core/clojure.core/update-in
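Since the question mentions not finding the right reduce, here is one possible sketch of the same first-match-wins behaviour written with `reduce` and `reduced`. It is an illustration rather than either answerer's code: the name `replace-first-matching-prefix` is hypothetical, and it assumes Clojure 1.8 or later for `clojure.string/starts-with?`.

```clojure
(require '[clojure.string :as string])

(defn replace-first-matching-prefix
  "Walks the [prefix replacement] pairs in order and stops at the first
  prefix that matches s. Returns s unchanged when nothing matches."
  [replacements s]
  (reduce (fn [unchanged [prefix replacement]]
            (if (string/starts-with? unchanged prefix)
              ;; `reduced` ends the reduction early with the rewritten string.
              (reduced (str replacement (subs unchanged (count prefix))))
              unchanged))
          s
          replacements))

(update-in {:repository "/var/group1/project1" :author "toolkit"}
           [:repository]
           (partial replace-first-matching-prefix
                    [["/var/group1" "http://foo/group1"]
                     ["/var/group2" "http://bar/group2"]]))
;; => {:repository "http://foo/group1/project1", :author "toolkit"}
```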

Related

Improving complex data structure replacement

I'm attempting to modify a specific field in a data structure, described below (a filled example was linked from the original post):
[{:fields "There are a few other fields here"
:incidents [{:fields "There are a few other fields here"
:updates [{:fields "There are a few other fields here"
:content "THIS is the field I want to replace"
:translations [{:based_on "Based on the VALUE of this"
:content "Replace with this value"}]}]}]}]
I have already implemented this in a number of functions, as below:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(vec (map translate-update updates)))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(vec (map translate-incident incidents)))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(vec (map translate-service services)))
Each array could have any number of entries (though the number is likely less than 10).
The basic premise is to replace the :content in each :update with the relevant :translation based on a provided value.
My Clojure knowledge is limited, so I'm curious if there is a more optimal way to achieve this?
EDIT
Solution so far:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(mapv translate-update updates))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(mapv translate-incident incidents))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(mapv translate-service services))
I would start more or less as you do, bottom-up, by defining some functions that look like they will be useful: how to choose a translation from among a list of translations, and how to apply that choice to an update. But I wouldn't make the functions so tiny as yours: the logic is all spread out into a lot of places, and it's not easy to get an overall idea of what is going on. Here are the two functions I'd start with:
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
...)
Of course you can defn them instead if you'd like, and I suspect many people would, but they're only going to be used in one place, and they're intimately involved with the context in which they're used, so I like a letfn. These two functions are really all the interesting logic; the rest is just some boring tree-traversal code to apply this logic in the right places.
Now to build out the body of the letfn is straightforward, and easy to read if you make your code be the same shape as the data it manipulates. We want to walk through a series of nested lists, updating objects on the way, and so we just write a series of nested for comprehensions, calling update to descend into the right keyspace:
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))
I think using for here is miles better than using map, although of course they are equivalent as always. The important difference is that as you read through the code you see the new context first ("okay, now we're doing something to each user"), and then what is happening inside that context; with map you see them in the other order and it is difficult to keep track of what is happening where.
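To make the comparison concrete, here is a small sketch of the same traversal written both ways. The names `walk-with-for` and `walk-with-map` and the parameter `f` (standing in for translate-update) are hypothetical, introduced only for this illustration.

```clojure
;; With `for`, each level names its context before saying what happens in it.
(defn walk-with-for [users f]
  (for [user users]
    (update user :incidents
            (fn [incidents]
              (for [incident incidents]
                (update incident :updates
                        (fn [updates]
                          (for [u updates]
                            (f u)))))))))

;; With `map`, the collection being walked appears after the body.
(defn walk-with-map [users f]
  (map (fn [user]
         (update user :incidents
                 (fn [incidents]
                   (map (fn [incident]
                          (update incident :updates
                                  (fn [updates] (map f updates))))
                        incidents))))
       users))
```

Both functions return equal (lazy) results for the same input; the difference is only in reading order.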
Combining these, and putting them into a defn, we get a function that you can call with your example input and which produces your desired output (assuming a suitable definition of get-locale):
(defn translate [users]
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))))
we can try to find some patterns in this task (based on the contents of the GitHub gist you posted):
simply speaking, you need to
1) update every item (A) in the vector of data
2) update every item (B) in the vector of A's :incidents
3) update every item (C) in the vector of B's :updates
4) translate C
The translate function could look like this:
(defn translate [{translations :translations :as item} locale]
(assoc item :content
(or (some #(when (= (:locale %) locale) (:content %)) translations)
:no-translation-found)))
its usage (some fields are omitted for brevity):
user> (translate {:id 1
:content "abc"
:severity "101"
:translations [{:locale "fr_FR"
:content "abc"}
{:locale "ru_RU"
:content "абв"}]}
"ru_RU")
;;=> {:id 1,
;; :content "абв",
;; :severity "101",
;; :translations [{:locale "fr_FR", :content "abc"} {:locale "ru_RU", :content "абв"}]}
then we can see that 1 and 2 are totally similar, so we can generalize that:
(defn update-vec-of-maps [data k f]
(mapv (fn [item] (update item k f)) data))
using it as a building block you can make up the whole data transformation:
(defn transform [data locale]
(update-vec-of-maps
data :incidents
(fn [incidents]
(update-vec-of-maps
incidents :updates
(fn [updates] (mapv #(translate % locale) updates))))))
(transform data "it_IT")
returns what you need.
then you can generalize it further, making the utility function for arbitrary depth transformations:
(defn deep-update-vec-of-maps [data ks terminal-fn]
(if (seq ks)
((reduce (fn [f k] #(update-vec-of-maps % k f))
terminal-fn (reverse ks))
data)
data))
and use it like this:
(deep-update-vec-of-maps data [:incidents :updates]
(fn [updates]
(mapv #(translate % "it_IT") updates)))
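As a rough check of how the nesting unfolds, the two helpers can be run on a tiny made-up structure. The `sample` data and the upper-casing terminal function below are assumptions for illustration only; they are not the gist's data or the real translation step.

```clojure
(require '[clojure.string :as string])

(defn update-vec-of-maps [data k f]
  (mapv (fn [item] (update item k f)) data))

(defn deep-update-vec-of-maps [data ks terminal-fn]
  (if (seq ks)
    ;; Wrap terminal-fn from the innermost key outwards.
    ((reduce (fn [f k] #(update-vec-of-maps % k f))
             terminal-fn (reverse ks))
     data)
    data))

;; Hypothetical sample: one service, one incident, two updates.
(def sample
  [{:name "svc"
    :incidents [{:id 1
                 :updates [{:content "x"} {:content "y"}]}]}])

(deep-update-vec-of-maps sample [:incidents :updates]
                         (fn [updates]
                           (mapv #(update % :content string/upper-case) updates)))
;; => [{:name "svc", :incidents [{:id 1, :updates [{:content "X"} {:content "Y"}]}]}]
```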
I recommend you look at https://github.com/nathanmarz/specter
It makes it really easy to read and update clojure data structures. Same performance as hand-written code, but much shorter.

map-indexed alternative for reducers

Is there a map-indexed alternative for clojure.core.reducers? I would like something that would work lazily like r/map (without constructing new sequence).
I suspect that what you really want to use is a transducer, since map-indexed has a 1-arity version (as does map, filter, and many other core functions) that returns a transducer. Transducers are composable, and do not create an intermediate sequence. Here is a short example:
(def xf (comp
(map-indexed (fn [i value] [i value]))
(filter (fn [[i value]] (odd? i)))
(map second)))
This says: pair each value with its index using map-indexed, keep only the pairs whose index is odd, and take the second element (the value) of each pair. It's a long-winded way of saying (take-nth 2 (rest collection)), but it's only for example purposes.
You can use this with into:
(into [] xf "ThisIsATest")
=> [\h \s \s \T \s]
or you can use the transduce function and apply str to the result:
(transduce xf str "ThisIsATest")
=> "hssTs"
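The question asks for something lazy that does not build an intermediate sequence. As a hedged sketch (assuming Clojure 1.7 or later), the same kind of transducer can be applied lazily with `sequence`, or wrapped with `eduction` so it is only run when reduced. The name `odd-indexed` is hypothetical.

```clojure
(def odd-indexed
  (comp (map-indexed vector)             ; [index value] pairs
        (filter (fn [[i _]] (odd? i)))   ; keep odd indices
        (map second)))                   ; back to plain values

;; `sequence` gives a lazy seq, so an infinite source is fine.
(take 3 (sequence odd-indexed (range)))
;; => (1 3 5)

;; `eduction` defers the work until something reduces over it.
(reduce + 0 (eduction odd-indexed (range 10)))
;; => 25
```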

clojure: filtering a vector of maps by keys existence and values

I have a vector of maps like this one
(def map1
[{:name "name1"
:field "xxx"}
{:name "name2"
:requires {"element1" 1}}
{:name "name3"
:consumes {"element2" 1 "element3" 4}}])
I'm trying to define a function that takes in a map like {"element1" 1 "element3" 6} (i.e. with n fields, or {}) and filters the maps in map1, returning only the ones that either have no requires and consumes, or have a lower number associated with them than the one associated with that key in the provided map (if the provided map doesn't have any key like that, the map is not returned),
but I'm failing to grasp how to approach the recursive loop over the maps and the filtering
(defn getV [node nodes]
(defn filterType [type nodes]
(filter (fn [x] (if (contains? x type)
false ; filter for key values here
true)) nodes))
(filterType :requires (filterType :consumes nodes)))
There are two ways to look at problems like this: from the outside in or from the inside out. Naming things carefully can really help when working with nested structures. For example, calling a vector of maps map1 may be adding to the confusion.
Starting from the outside, you need a predicate function for filtering the list. This function will take a map as a parameter and will be used by a filter function.
(defn comparisons [m]
...)
(filter comparisons map1)
I'm not sure I understand the comparisons precisely, but there seem to be at least two flavors. The first is looking for maps that do not have :requires or :consumes keys.
(defn no-requires-or-consumes [m]
...)
(defn all-keys-higher-than-values [m]
...)
(defn comparisons [m]
(some #(% m) [no-requires-or-consumes all-keys-higher-than-values]))
Then it's a matter of defining the individual comparison functions
(defn no-requires-or-consumes [m]
(and (not (:requires m)) (not (:consumes m))))
The second is more complicated. It operates on one or two inner maps but the behaviour is the same in both cases so the real implementation can be pushed down another level.
(defn all-keys-higher-than-values [m]
(every? keys-higher-than-values [(:requires m) (:consumes m)]))
The crux of the comparison is looking at the number in the key part of the map vs the value. Pushing the details down a level gives:
(defn keys-higher-than-values [m]
(every? #(>= (number-from-key %) (get m %)) (keys m)))
Note: I chose >= here so that the second entry in the sample data will pass.
That leaves only pulling the number out of the key string. How to do that is covered in the question "In Clojure how can I convert a String to a number?"
(defn number-from-key [s]
(read-string (re-find #"\d+" s)))
Stringing all these together and running against the example data returns the first and second entries.
Putting everything together:
(defn no-requires-or-consumes [m]
(and (not (:requires m)) (not (:consumes m))))
(defn number-from-key [s]
(read-string (re-find #"\d+" s)))
(defn keys-higher-than-values [m]
(every? #(>= (number-from-key %) (get m %)) (keys m)))
(defn all-keys-higher-than-values [m]
(every? keys-higher-than-values [(:requires m) (:consumes m)]))
(defn comparisons [m]
(some #(% m) [no-requires-or-consumes all-keys-higher-than-values]))
(filter comparisons map1)
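The answer above compares each amount with the number embedded in its key. Another reading of the question is that each amount should be compared with the amount stored under the same key in the provided map. The sketch below follows that reading under stated assumptions: "lower" is taken as strictly lower, a key missing from the provided map excludes the entry, and the names `satisfied-by?`, `usable?`, `filter-nodes` and `nodes` are hypothetical.

```clojure
(def nodes
  [{:name "name1" :field "xxx"}
   {:name "name2" :requires {"element1" 1}}
   {:name "name3" :consumes {"element2" 1 "element3" 4}}])

(defn satisfied-by?
  "True when every amount in needs is strictly lower than the amount
  available under the same key. A nil or empty needs map is satisfied."
  [available needs]
  (every? (fn [[k amount]]
            (when-let [have (get available k)]
              (< amount have)))
          needs))

(defn usable? [available node]
  (and (satisfied-by? available (:requires node))
       (satisfied-by? available (:consumes node))))

(defn filter-nodes [available nodes]
  (filter #(usable? available %) nodes))

(map :name (filter-nodes {"element1" 2} nodes))
;; => ("name1" "name2")
```

If "lower" should include equal amounts, swapping `<` for `<=` is the only change needed.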

Read each entry lazily from a zip file

I want to read file entries in a zip file into a sequence of strings if possible. Currently I'm doing something like this to print out directory names for example:
(defn entries [zipfile]
(lazy-seq
(if-let [entry (.getNextEntry zipfile)]
(cons entry (entries zipfile)))))
(defn with-each-entry [fileName f]
(with-open [z (ZipInputStream. (FileInputStream. fileName))]
(doseq [e (entries z)]
; (println (.getName e))
(f e)
(.closeEntry z))))
(with-each-entry "tmp/my.zip"
(fn [e] (if (.isDirectory e)
(println (.getName e)))))
However this will iterate through the entire zip file. How could I change this so I could take the first few entries say something like:
(take 10 (zip-entries "tmp/my.zip"
(fn [e] (if (.isDirectory e)
(println (.getName e))))))
This seems like a pretty natural fit for the new transducers in Clojure 1.7.
You just build up the transformations you want as a transducer using comp and the usual seq-transforming fns with no seq/collection argument. In your example cases,
(comp (map #(.getName %)) (take 10)) and
(comp (filter #(.isDirectory %)) (map #(-> % .getName println))).
This returns a function of multiple arities which you can use in a lot of ways. In this case you want to eagerly reduce it over the entries sequence (to ensure realization of the entries happens inside with-open), so you use transduce (example zip data made by zipping one of my clojure project folders):
(with-open [z (-> "training-day.zip" FileInputStream. ZipInputStream.)]
(let [transform (comp (map #(.getName %)) (take 10))]
(transduce transform conj (entries z))))
;;return value: [".gitignore" ".lein-failures" ".midje-grading-config.clj" ".nrepl-port" ".travis.yml" "project.clj" "README.md" "target/" "target/classes/" "target/repl-port"]
Here I'm transducing with base function conj which makes a vector of the names. If you instead want your transducer to perform side-effects and not return a value, you can do that with a base function like (constantly nil):
(with-open [z (-> "training-day.zip" FileInputStream. ZipInputStream.)]
(let [transform (comp (filter #(.isDirectory %)) (map #(-> % .getName println)))]
(transduce transform (constantly nil) (entries z))))
which gives output:
target/
target/classes/
target/stale/
test/
A potential downside with this is that you'll probably have to manually incorporate .closeEntry calls into each transducer you use here to prevent holding those resources, because you can't in the general case know when each transducer is done reading the entry.
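One way to package the "take the first few entries" case is a small function that does all of its reading inside with-open and returns a plain vector. This is a minimal sketch under that assumption; `first-entry-names` is a hypothetical name, and it reuses the `entries` helper from the question (with `when-let` in place of the one-armed `if-let`).

```clojure
(import '(java.io FileInputStream)
        '(java.util.zip ZipEntry ZipInputStream))

(defn entries
  "Lazy seq of the remaining entries of an open ZipInputStream."
  [^ZipInputStream zis]
  (lazy-seq
    (when-let [entry (.getNextEntry zis)]
      (cons entry (entries zis)))))

(defn first-entry-names
  "Returns a vector with the names of at most n entries of the zip file.
  `into` is eager, so the stream is only read while it is still open,
  and `take` stops the reduction after n names."
  [file-name n]
  (with-open [zis (ZipInputStream. (FileInputStream. ^String file-name))]
    (into []
          (comp (map #(.getName ^ZipEntry %))
                (take n))
          (entries zis))))

;; Example call, using the path from the question:
;; (first-entry-names "tmp/my.zip" 10)
```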

clojure.core/map isn't working

I am trying to figure out why one of my map calls isn't working. I am building a crawler with the purpose of learning Clojure.
(use '[clojure.java.io])
(defn md5
"Generate a md5 checksum for the given string"
[token]
(let [hash-bytes
(doto (java.security.MessageDigest/getInstance "MD5")
(.reset)
(.update (.getBytes token)))]
(.toString
(new java.math.BigInteger 1 (.digest hash-bytes)) ; Positive and the size of the number
16)))
(defn full-url [url base]
(if (re-find #"^http[s]{0,1}://" url)
url
(apply str "http://" base (if (= \/ (first url))
url
(apply str "/" url)))))
(defn get-domain-from-url [url]
(let [matcher (re-matcher #"http[s]{0,1}://([^/]*)/{0,1}" url)
domain-match (re-find matcher)]
(nth domain-match 1)))
(defn crawl [url]
(do
(println "-----------------------------------\n")
(if (.exists (clojure.java.io/as-file (apply str "theinternet/page" (md5 url))))
(println (apply str url " already crawled ... skiping \n"))
(let [domain (get-domain-from-url url)
text (slurp url)
matcher (re-matcher #"<a[^>]*href\s*=\s*[\"\']([^\"\']*)[\"\'][^>]*>(.*)</a\s*>" text)]
(do
(spit (apply str "theinternet/page" (md5 url)) text)
(loop [urls []
a-tag (re-find matcher)]
(if a-tag
(let [u (nth a-tag 1)]
(recur (conj urls (full-url u domain)) (re-find matcher)))
(do
(println (apply str "parsed: " url))
(println (apply str (map (fn [u]
(apply str "-----> " u "\n")) urls)))
(map crawl urls)))))))))
(defn -main
"I don't do a whole lot ... yet."
[& args]
(crawl "http://www.example.com/"))
First call to map works:
(println (apply str (map (fn [u]
(apply str "-----> " u "\n")) urls)))
But the second call (map crawl urls) seems to be ignored.
The crawl function is working as intended: it slurps the url, parses the a tags with the regex to fetch each href, and the accumulation in the loop works as intended. But when I call map with crawl and the urls that have been found on the page, the call to map is ignored.
Also if I try to call (map crawl ["http://www.example.com"]) this call is, again, ignored.
I have started my Clojure adventure a couple of weeks ago so any suggestions/criticisms are most welcomed.
Thank you
In Clojure, map is lazy. From the docs, map:
Returns a lazy sequence consisting of the result of applying f to the
set of first items of each coll, followed by applying f to the set
of second items in each coll, until any one of the colls is
exhausted.
Your crawl function is a function with side effects - you're spit-ing some results to a file, and println-ing to report on progress. But, because map returns a lazy sequence, none of these things will happen - the result sequence is never explicitly realized so it can stay lazy.
There are a number of ways of realizing a lazy sequence (that has been created e.g. using map), but in this case, as you want to iterate over a sequence using a function that has side-effects, it's probably best to use doseq:
Repeatedly executes body (presumably for side-effects) with
bindings and filtering as provided by "for". Does not retain
the head of the sequence. Returns nil.
If you replace the call to (map crawl urls) with (doseq [u urls] (crawl u)), you should get the desired result.
Note: your first call to map works as expected because you are realizing the results using (apply str). There is no way to (apply str) without evaluating the sequence.
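The difference is easy to see in a small, self-contained sketch. Here `visit!` is a hypothetical stand-in for crawl that records each URL in an atom instead of fetching it, and the URLs are made up.

```clojure
(def visited (atom []))

(defn visit!
  "Stand-in for crawl: records the URL as a side effect."
  [url]
  (swap! visited conj url)
  url)

(def urls ["http://a.example/" "http://b.example/"])

;; map only builds a lazy sequence; nothing has run yet.
(def pending (map visit! urls))
(println (count @visited)) ; prints 0

;; doseq walks the collection eagerly for its side effects and returns nil.
(doseq [u urls] (visit! u))
(println (count @visited)) ; prints 2

;; Realizing the lazy sequence (here with dorun) runs the deferred calls too.
(dorun pending)
(println (count @visited)) ; prints 4
```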