Clojure: # (hash) vs re-pattern

I'm learning Clojure, and I discovered that there are two ways to create a regular expression:
1. (re-pattern "12(ab)*34")
2. #"12(ab)*34"
I did a quick benchmark, and it seems that (1) is considerably faster than (2) when using boot repl (took ~7 seconds instead of ~10 seconds when in a loop).
May I ask as to why that may be? I was expecting the #"" syntax to do some clever compile-time optimisation, but instead it's slower than watching paint dry.

I'm not sure what you're measuring - you mentioned just construction of regex objects which is probably not what you are interested in.
However, measuring re-find to match string with regex yields quite similar results in both cases
(let [re #"12(ab)*34"
s "aanbciasdfsidufuo12ab34xcnm,xcvnm,xncv,m"]
(c/quick-bench
(re-find re s)))
Evaluation count : 907494 in 6 samples of 151249 calls.
Execution time mean : 659.120946 ns
...
(let [re (re-pattern "12(ab)*34")
s "aanbciasdfsidufuo12ab34xcnm,xcvnm,xncv,m"]
(c/quick-bench
(re-find re s)))
Evaluation count : 1018872 in 6 samples of 169812 calls.
Execution time mean : 588.138157 ns
...
There's nothing fundamentally different between the literal regex syntax and re-pattern. Both end up calling the java.util.regex.Pattern.compile method to compile the final regex.
re-pattern source code:
...
[s] (if (instance? java.util.regex.Pattern s)
s
(. java.util.regex.Pattern (compile s))))
The literal syntax is handled by the LispReader$RegexReader nested class:
public static class RegexReader extends AFn{
static StringReader stringrdr = new StringReader();
public Object invoke(Object reader, Object doublequote, Object opts, Object pendingForms) {
StringBuilder sb = new StringBuilder();
...
return Pattern.compile(sb.toString());
}
}
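A quick REPL check of the point above, as a minimal sketch: both forms yield a java.util.regex.Pattern with the same pattern string, and re-pattern hands back an existing Pattern unchanged. The names literal-re and built-re are made up for illustration. The one difference worth keeping in mind when benchmarking is when compilation happens: the reader compiles a literal when the form is read, while a call to re-pattern with a string compiles every time it is evaluated.

```clojure
;; Illustrative names; not from the original post.
(def literal-re #"12(ab)*34")
(def built-re (re-pattern "12(ab)*34"))

;; Both are plain java.util.regex.Pattern instances.
(println (class literal-re))
(println (class built-re))

;; Pattern has no value equality, so compare the source strings instead.
(println (= (str literal-re) (str built-re)))

;; re-pattern returns an existing Pattern as-is (see its source above).
(println (identical? literal-re (re-pattern literal-re)))

;; Both match the same input in the same way.
(println (re-find literal-re "xx12abab34yy"))
(println (re-find built-re "xx12abab34yy"))
```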

Related

Extend Clojure Regular Expressions with IFn to support map

I want to be able to call map on regular expressions, like so:
(map #"ab+c*" ["abbb" "ac" "abbcc"])
=> ("abbb" "abbcc")
How do I extend regular expressions to support the IFn interface? Or is there a different way to do it?
ClojureScript:
(extend-type js/RegExp
IFn
(-invoke
([match s] (re-find match s))
([match replacement s]
(clojure.string/replace s match replacement))))
Now you can call regular expressions as functions and even pass them to map:
(#"abc+" "abcccc")
=> "abcccc"
(map #"abc+" ["abcccc" "abcccccccc"])
=> ("abcccc" "abcccccccc")
Unfortunately, IFn is not a protocol in Clojure, so you cannot extend it there.
Since IFn isn't a protocol in core Clojure, I don't believe that this is possible.
The closest I could get is creating a wrapper type that implements IFn:
(defrecord R [^java.util.regex.Pattern regex]
clojure.lang.IFn
(invoke [this s]
(re-find regex s))
(invoke [this replacement s]
(clojure.string/replace s regex replacement)))
(map (->R #"abc+") ["abcccc" "abcccccccc"])
=> ("abcccc" "abcccccccc")
The trouble with this is that it's not obvious what the regular expression is supposed to do, particularly when most of your production code will look like (map #"ab+" entries)
Regular expressions are only about pattern matching; they don't imply what transformation you want, so you really should steer clear of trying to shoehorn that into them.
If it's a once-off, just use
(map #(clojure.string/replace % #"ab+c*" "ab") ["ab" "ac" "abbcc"])
=> ("ab" "ac" "ab")
(It's not immediately obvious how your example is supposed to work. You have fewer elements in your result: are you filtering and transforming? How are you getting to the "abbb" element?)
If you're using this a lot, I would recommend simply creating a helper function in a common namespace that you can use with map instead of trying to extend the IFn interface. Creating a function is, in effect, a direct way to extend IFn, but it gives you a named function with very specific semantics that you can customize precisely.
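One possible shape for such a helper, as a rough sketch. The name re-finder is made up here and is not part of clojure.core; it simply closes over a regex and returns an ordinary function.

```clojure
;; Hypothetical helper: wraps a regex in an ordinary one-argument function.
(defn re-finder
  "Returns a function that applies re-find with re to its argument."
  [re]
  (fn [s] (re-find re s)))

;; map keeps one result per input, with nil where there is no match.
(println (map (re-finder #"ab+c*") ["abbb" "ac" "abbcc"]))

;; keep drops the nils, which gives the result asked for in the question.
(println (keep (re-finder #"ab+c*") ["abbb" "ac" "abbcc"]))
```

Because the helper has a name, the call site says what is being done with the pattern, which is the concern raised above.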
As CmdrDats says, using re-find in an anonymous function is definitely the way to go:
(filter #(re-find #"ab+c*" %) ["abbb" "ac" "abbcc"])
=> ("abbb" "abbcc")
I sometimes use a helper function to emphasize that I want just true/false output (not the match nor a sequence of matches), and since I'm always forgetting the differences between the re-xxx functions:
(ns demo.core
(:require [schema.core :as s]))
(s/defn contains-match? :- s/Bool
"Returns true if the regex matches any portion of the input string."
[search-str :- s/Str
re :- s/Any]
#?(:clj (assert (instance? java.util.regex.Pattern re)))
(boolean (re-find re search-str)))

Creating a Clojure macro that uses a string to call a java function

So I'm trying to make a Clojure macro that makes it easy to interop with Java classes utilizing the Builder pattern.
Here's what I've tried so far.
(defmacro test-macro
[]
(list
(symbol ".queryParam")
(-> (ClientBuilder/newClient)
(.target "https://www.test.com"))
"key1"
(object-array ["val1"])))
Which expands to the below
(.
#object[org.glassfish.jersey.client.JerseyWebTarget 0x107a5073 "org.glassfish.jersey.client.JerseyWebTarget@107a5073"]
queryParam
"key1"
#object["[Ljava.lang.Object;" 0x16751ba2 "[Ljava.lang.Object;@16751ba2"])
The desired result is:
(.queryParam
#object[org.glassfish.jersey.client.JerseyWebTarget 0x107a5073 "org.glassfish.jersey.client.JerseyWebTarget@107a5073"]
"key1"
#object["[Ljava.lang.Object;" 0x16751ba2 "[Ljava.lang.Object;@16751ba2"])
I guess the . is causing something to get evaluated and moved around? In which case the solution would be to quote it. But how can I quote the result of an evaluated expression?
My goal is to convert maps into code that builds the object, with the map's keys naming the methods to call and the values being the arguments passed to those Java methods.
I understand how to use the threading and doto macros, but I am trying to make the request-building function data driven. I want to be able to take in a map with a key such as "queryParam" and the arguments as its values. That way I can leverage all of the Java class's methods while only having to write one function myself, and there is enough of a one-to-one mapping that I don't believe others will find it magical.
(def test-map {"target" ["https://www.test.com"]
"path" ["qa" "rest/service"]
"queryParam" [["key1" (object-array ["val1"])]
["key2" (object-array ["val21" "val22" "val23"])]] })
(-> (ClientBuilder/newClient)
(.target "https://www.test.com")
(.path "qa")
(.path "rest/service")
(.queryParam "key1" (object-array ["val1"]))
(.queryParam "key2" (object-array ["val21" "val22" "val23"])))
From your question it's not clear if you have to use map as your builder data structure. I would recommend using the threading macro for working directly with Java classes implementing the builder pattern:
(-> (ClientBuilder.)
(.forEndpoint "http://example.com")
(.withQueryParam "key1" "value1")
(.build))
For classes that don't implement builder pattern and their methods return void (e.g. setter methods) you can use doto macro:
(doto (Client.)
(.setEndpoint "http://example.com")
(.setQueryParam "key1" "value1"))
Implementing a macro that uses a map to encode Java method calls is possible but awkward. You would have to keep each method's arguments in a sequence (in the map values) to be able to call methods with multiple parameters, or adopt some convention for storing the arguments of single-parameter methods and for handling varargs. A map also doesn't guarantee the order in which the methods will be invoked. All of this adds a lot of complexity and magic to your code.
This is how you could implement it:
(defmacro builder [b m]
  (let [method-calls
        (map (fn [[k v]] `(. (~(symbol k) ~@v))) m)]
    `(-> ~b
         ~@method-calls)))
(macroexpand-1
'(builder (StringBuilder.) {"append" ["a"]}))
;; => (clojure.core/-> (StringBuilder.) (. (append "a")))
(str
(builder (StringBuilder.) {"append" ["a"] }))
;; => "a"
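If call order matters, one alternative is to drop the macro and the map altogether and walk a vector of pairs at runtime. The sketch below is not the approach from the answer above: it swaps in runtime reflection through clojure.lang.Reflector/invokeInstanceMethod, and the name apply-calls is hypothetical. It assumes every method returns the object to continue with, as builder methods usually do.

```clojure
;; Hypothetical helper; uses runtime reflection instead of the macro above.
(defn apply-calls
  "Threads target through calls, a vector of [method-name args] pairs, in order."
  [target calls]
  (reduce (fn [obj [method-name args]]
            ;; Reflector resolves the method by name and argument count/types.
            (clojure.lang.Reflector/invokeInstanceMethod
             obj method-name (object-array args)))
          target
          calls))

;; A vector keeps the calls in the order they were written.
(println (str (apply-calls (StringBuilder.)
                           [["append" ["a"]]
                            ["append" ["b"]]])))
```

The trade-off is that reflection happens on every call, so this is slower than the interop forms a macro would expand to.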

Is there a complete list of lazy functions of Clojure's core module?

After a while of working with Clojure, I have accumulated some knowledge on its laziness. I know whether a frequently-used API such as map is lazy. However, I still feel dubious when I start using an unfamiliar API such as with-open.
Is there any document that shows a complete list of lazy APIs of Clojure's core module?
You can find functions that return lazy sequences by opening up the Clojure code https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj
and searching for "Returns a lazy"
I am not aware of any curated lists of them.
The rule of thumb is: if it returns a sequence, it will be a lazy sequence; if it returns a value, it will force evaluation.
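A small sketch of that rule of thumb, using an atom as a call counter to observe when the mapping function actually runs. All names here (calls, xs, total) are illustrative.

```clojure
;; Count how many times the mapping function actually runs.
(def calls (atom 0))

;; map returns a lazy sequence: defining xs does no work yet.
(def xs (map (fn [x] (swap! calls inc) (* x x)) [1 2 3]))
(def calls-before-use @calls)
(println calls-before-use)

;; reduce returns a plain value, so it has to realize every element.
(def total (reduce + xs))
(println total)
(println @calls)
```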
When using a new function, macro or special form, read the docstring. Most development environments have a key to show the docstring, or at least navigate to the source (where you can see the docstring), and there is always http://clojure.org/api/api.
In the case of with-open:
with-open
macro
Usage: (with-open bindings & body)
bindings => [name init ...]
Evaluates body in a try expression with names bound to the values
of the inits, and a finally clause that calls (.close name) on each
name in reverse order.
We can see that the result of calling with-open is evaluation of the expression with a final close. So we know that there is nothing lazy about it. However that doesn't mean you don't need to think about laziness inside with-open, quite the opposite!
(with-open [r (io/reader "myfile")]
(line-seq r))
This is a common trap. line-seq returns a lazy sequence! The problem here is that the lazy sequence will be realized after the file is closed, because the file is closed when exiting the scope of with-open. So you need to fully process the lazy sequence before exiting the with-open scope.
My advice is to avoid trying to think about your program as having 'lazy bits' and 'immediate bits', but instead just be mindful that when io or side-effects are involved you need to take care of when things happen as well as what should happen.
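One common way to avoid the with-open trap described above is to realize the sequence with doall before the reader is closed. This is only a sketch: read-all-lines is a made-up name, and a StringReader stands in for a real file so the example runs on its own.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper; doall realizes every line before with-open closes the reader.
(defn read-all-lines [source]
  (with-open [r (io/reader source)]
    (doall (line-seq r))))

;; A StringReader stands in for a real file so the sketch is self-contained.
(println (read-all-lines (java.io.StringReader. "first\nsecond\nthird")))
```

For large files it is usually better to do the processing inside the with-open body instead of holding all lines in memory.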
Digging into Timothy Pratley's proposal to search the docs:
let's make it fun!
your repl has everything that you need to find out a list of lazy functions.
first of all, there is a clojure.repl/doc macro, which prints documentation to *out* in the repl
user> (doc +)
-------------------------
clojure.core/+
([] [x] [x y] [x y & more])
Returns the sum of nums. (+) returns 0. Does not auto-promote
longs, will throw on overflow. See also: +'
nil
unfortunately we can't simply get it as a string, but we can always rebind *out* to a StringWriter and then get its string value.
so, we want to take all the symbols from the clojure.core namespace, get their docs, write them all to a string, and find every one that contains "returns a lazy". Here clojure.core/ns-publics helps: it returns a map of public names to their vars:
user> (take 10 (ns-publics 'clojure.core))
([primitives-classnames #'clojure.core/primitives-classnames]
[+' #'clojure.core/+']
[decimal? #'clojure.core/decimal?]
[restart-agent #'clojure.core/restart-agent]
[sort-by #'clojure.core/sort-by]
[macroexpand #'clojure.core/macroexpand]
[ensure #'clojure.core/ensure]
[chunk-first #'clojure.core/chunk-first]
[eduction #'clojure.core/eduction]
[tree-seq #'clojure.core/tree-seq])
so we just need to get all the keys from there and lookup for their docs.
Let's make a macro for that:
user> (defmacro all-docs []
        (let [names (keys (ns-publics 'clojure.core))]
          `(binding [*out* (java.io.StringWriter.)]
             (do ~@(map #(list `doc %) names))
             (str *out*))))
#'user/all-docs
it does just what i've said, gets all publics' docs to string.
now we simply process it:
user> (def all-doc-items (clojure.string/split
(all-docs)
#"-------------------------"))
#'user/all-doc-items
user> (nth all-doc-items 10)
"\nclojure.core/tree-seq\n([branch? children root])\n Returns a lazy sequence of the nodes in a tree, via a depth-first walk.\n branch? must be a fn of one arg that returns true if passed a node\n that can have children (but may not). children must be a fn of one\n arg that returns a sequence of the children. Will only be called on\n nodes for which branch? returns true. Root is the root node of the\n tree.\n"
and now just filter them:
user> (def all-lazy-fns (filter #(re-find #"(?i)returns a lazy" %) all-doc-items))
#'user/all-lazy-fns
user> (count all-lazy-fns)
30
user> (println (take 3 all-lazy-fns))
(
clojure.core/tree-seq
([branch? children root])
Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
branch? must be a fn of one arg that returns true if passed a node
that can have children (but may not). children must be a fn of one
arg that returns a sequence of the children. Will only be called on
nodes for which branch? returns true. Root is the root node of the tree.
clojure.core/keep-indexed
([f] [f coll])
Returns a lazy sequence of the non-nil results of (f index item). Note,
this means false return values will be included. f must be free of
side-effects. Returns a stateful transducer when no collection is
provided.
clojure.core/take-nth
([n] [n coll])
Returns a lazy seq of every nth item in coll. Returns a stateful
transducer when no collection is provided.
)
nil
And now use these all-lazy-fns however you want.
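A shorter variant of the same search, sketched under the assumption that reading each var's :doc metadata is good enough; it skips the macro and the *out* rebinding. The name lazy-core-fns is made up. Like the version above, it only finds docstrings where the phrase is not split by a line break.

```clojure
;; Hypothetical name; collects core symbols whose docstring mentions a lazy result.
(def lazy-core-fns
  (->> (ns-publics 'clojure.core)
       ;; Each entry is [symbol var]; the docstring lives in the var's metadata.
       (filter (fn [[_ v]]
                 (some->> (:doc (meta v))
                          (re-find #"(?i)returns a lazy"))))
       (map key)
       sort))

(println (count lazy-core-fns))
(println (take 5 lazy-core-fns))
```

The result is a sorted sequence of symbols, so it can be fed straight into doc or source lookups.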

One argument, many functions

I have an incoming lazy stream of lines from a file I'm reading with tail-seq (to contrib - now!) and I want to process those lines one by one with several "listener-functions" that take action depending on re-seq-hits (or other things) in the lines.
I tried the following:
(defn info-listener [logstr]
(if (re-seq #"INFO" logstr) (println "Got an INFO-statement")))
(defn debug-listener [logstr]
(if (re-seq #"DEBUG" logstr) (println "Got a DEBUG-statement")))
(doseq [line (tail-seq "/var/log/any/java.log")]
  (do (info-listener line)
      (debug-listener line)))
and it works as expected. However, there is a LOT of code-duplication and other sins in the code, and it's boring to update the code.
One important step seems to be to apply many functions to one argument, ie
(listen-line line '(info-listener debug-listener))
and use that instead of the boring and error prone do-statement.
I've tried the following seemingly clever approach:
(defn listen-line [logstr listener-collection]
(map #(% logstr) listener-collection))
but this only renders
(nil) (nil)
there is laziness or first class functions biting me for sure, but where do I put the apply?
I'm also open to a radically different approach to the problem, but this seems to be a quite sane way to start with. Macros/multi methods seems to be overkill/wrong for now.
Making a single function out of a group of functions to be called with the same argument can be done with the core function juxt:
=>(def juxted-fn (juxt identity str (partial / 100)))
=>(juxted-fn 50)
[50 "50" 2]
Combining juxt with partial can be very useful:
(defn listener [re message logstr]
(if (re-seq re logstr) (println message)))
(def juxted-listener
  (apply juxt (map (fn [[re message]] (partial listener re message))
                   [[#"INFO", "Got INFO"],
                    [#"DEBUG", "Got DEBUG"]])))
(doseq [logstr ["INFO statement", "OTHER statement", "DEBUG statement"]]
(juxted-listener logstr))
You need to change
(listen-line line '(info-listener debug-listener))
to
(listen-line line [info-listener debug-listener])
In the first version, listen-line ends up using the symbols info-listener and debug-listener themselves as functions because of the quoting. Symbols implement clojure.lang.IFn (the interface behind Clojure function invocation) like keywords do, i.e. they look themselves up in a map-like argument (actually a clojure.lang.ILookup) and return nil if applied to something which is not a map.
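The behaviour of symbols described above can be checked at the REPL. This is a small illustrative sketch; the map and the log string are invented for the example.

```clojure
;; Quoting the list yields symbols, not the functions they name.
(println (map class '(info-listener debug-listener)))

;; A symbol called as a function looks itself up in its argument.
(println ('info-listener {'info-listener :found}))

;; Applied to a string there is nothing to look up, so the result is nil.
(println ('info-listener "INFO something happened"))
```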
Also note that you need to wrap the body of listen-line in dorun to ensure it actually gets executed (as map returns a lazy sequence). Better yet, switch to doseq:
(defn listen-line [logstr listener-collection]
(doseq [listener listener-collection]
(listener logstr)))

Trouble with building up a string in Clojure

[this may seem like my problem is with Compojure, but it isn't - it's with Clojure]
I've been pulling my hair out on this seemingly simple issue - but am getting nowhere.
I am playing with Compojure (a light web framework for Clojure) and I would just like to generate a web page showing my list of todos that are in a PostgreSQL database.
The code snippets are below (left out the database connection, query, etc - but that part isn't needed because specific issue is that the resulting HTML shows nothing between the <body> and </body> tags).
As a test, I tried hard-coding the string in the call to main-layout, like this:
(html (main-layout "Aki's Todos" "Haircut<br>Study Clojure<br>Answer a question on Stackoverfolw")) - and it works fine.
So the real issue is that I do not believe I know how to build up a string in Clojure. Not the idiomatic way, and not by calling out to Java's StringBuilder either - as I have attempted to do in the code below.
A virtual beer, and a big upvote to whoever can solve it! Many thanks!
=============================================================
;The master template (a very simple POC for now, but can expand on it later)
(defn main-layout
"This is one of the html layouts for the pages assets - just like a master page"
[title body]
(html
[:html
[:head
[:title title]
(include-js "todos.js")
(include-css "todos.css")]
[:body body]]))
(defn show-all-todos
"This function will generate the todos HTML table and call the layout function"
[]
(let [rs (select-all-todos)
sbHTML (new StringBuilder)]
(for [rec rs]
(.append sbHTML (str rec "<br><br>")))
(html (main-layout "Aki's Todos" (.toString sbHTML)))))
=============================================================
Again, the result is a web page but with nothing between the body tags. If I replace the code in the for loop with println statements, and direct the code to the repl - forgetting about the web page stuff (ie. the call to main-layout), the resultset gets printed - BUT - the issue is with building up the string.
Thanks again.
~Aki
for is lazy, and in your function it's never being evaluated. Change for to doseq.
user> (let [rs ["foo" "bar"]
sbHTML (new StringBuilder)]
(for [rec rs]
(.append sbHTML (str rec "<br><br>")))
(.toString sbHTML))
""
user> (let [rs ["foo" "bar"]
sbHTML (new StringBuilder)]
(doseq [rec rs]
(.append sbHTML (str rec "<br><br>")))
(.toString sbHTML))
"foo<br><br>bar<br><br>"
You could also use reduce and interpose, or clojure.string/join, or probably some other options.
user> (let [rs ["foo" "bar"]]
(reduce str (interpose "<br><br>" rs)))
"foo<br><br>bar"
user> (require 'clojure.string)
nil
user> (let [rs ["foo" "bar"]]
(clojure.string/join "<br><br>" rs))
"foo<br><br>bar"
You could use re-gsub like this:
(require 'clojure.contrib.str-utils) ;;put in head for enabling us to use re-gsub later on
(clojure.contrib.str-utils/re-gsub #"\n" "<br><br>" your-string-with-todos-separated-with-newlines)
This last line will produce the string you want. The require part, as you may already know, lets the compiler reach the clojure.contrib.str-utils library without importing it into your current namespace (which could potentially lead to unnecessary collisions as the program grows).
The re- prefix stands for regexp: re-gsub takes a regexp of the form #"regexp" and replaces every match in the third argument with the second argument. Inside a regex literal a newline is written \n; \newline is Clojure's character literal and does not work inside #"...". Note that clojure.contrib.str-utils belongs to the old monolithic contrib library; in current Clojure, clojure.string/replace does the same job.
What I think you really wanted to do is to make a nifty ordered or unordered list in HTML. This can be done with hiccup-page-helpers (if you don't have them, you probably have a Compojure from before it was split up into compojure, hiccup and more, since you use the html function).
If you want to use hiccup-page-helpers, use the command re-split from the clojure.contrib.str-utils mentioned above in this fashion:
(use 'hiccup.page-helpers) ;;watch out for namespace collisions, since all the functions in hiccup.page-helpers got into your current namespace.
(unordered-list (clojure.contrib.str-utils/re-split #"\n" your-string-with-todos-separated-with-newlines))
which should render a neat
<ul>
<li>todo-item1</li>
<li>todo-item2</li>
</ul>
(and yes, there is an ordered-list command that works the same way!)
In the last line of Clojure code above, all your todos end up in a (list "todo1" "todo2"), which is immediately consumed by the unordered-list function from hiccup.page-helpers and converted into an HTML list.
Good luck with compojure and friends!