Combining multiple functions into a single function - regex

I have written several functions that input strings and use varying regular expressions to search for patterns within the strings. All of the functions work on the same input [string]. What is the optimal way to combine all such functions into a single function?
I tried combining all of the regular expressions into a single regex, but ran into issues of degeneracy: some input matched more than one of the patterns, and the output was incorrect. Next, I tried using the threading arrows -> and ->>, but was unable to get those to work. I believe this might be the right option, but since I could not get the functions to run properly, I am unable to test my hypothesis.
As an example of two functions to combine consider the following:
(defn fooip [string]
(re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" string))
and
(defn foophone [string]
(re-seq #"[0-9]{3}-?[0-9]{3}-?[0-9]{4}" string))

If you have multiple functions that you want to combine into a function that will return the result of applying each function to the same input, that is exactly the purpose of juxt.
(def foo (juxt foophone fooip))
(foo "555-555-5555 222.222.222.222 888-888-8888")
;=> [("555-555-5555" "888-888-8888") ("222.222.222.222")]

Your question is a little vague, but the threading arrows' purpose is to apply multiple functions sequentially to the output of each other: (-> 1 inc inc inc), for example, is equivalent to (inc (inc (inc 1))).
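As a rough illustration of why the threading macros do not fit this problem, here is a minimal sketch of how -> and ->> rewrite their forms. Each step receives the result of the previous step, whereas each regex function in the question needs the original string.

```clojure
;; -> inserts the running value as the FIRST argument of each form.
(def thread-first-result
  (-> 10
      (- 3)     ; (- 10 3)  => 7
      (/ 7)))   ; (/ 7 7)   => 1

;; ->> inserts the running value as the LAST argument of each form.
(def thread-last-result
  (->> [1 2 3 4]
       (map inc)         ; (map inc [1 2 3 4])     => (2 3 4 5)
       (filter even?)))  ; (filter even? '(2 3 4 5)) => (2 4)

;; The same form gives a different result depending on the macro.
(def first-vs-last
  [(-> 10 (- 3))     ; (- 10 3)
   (->> 10 (- 3))])  ; (- 3 10)
```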
From your code samples, it looks like you have multiple regexes you want to match against a single input string. The simple way to do that is to use for:
(for [r [#"foo" #"bar" #"baz"]] (re-seq r s))

To check for both patterns you can use or:
(defn phone-or-ip [s]
(or (matchphone s) (matchip s)))
There isn't one proper way to combine functions. It depends what you want to do.
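A minimal sketch of how the or version behaves. The answer does not define matchphone and matchip, so the definitions below are stand-ins: the phone pattern is the one from the question, and the IP pattern is a simplified assumption. Because re-seq returns nil when nothing matches, or falls through to the second function only when the first finds nothing.

```clojure
;; Stand-in definitions (assumptions, not from the answer).
(defn matchphone [s]
  (re-seq #"[0-9]{3}-?[0-9]{3}-?[0-9]{4}" s))

(defn matchip [s]
  ;; Simplified dotted quad; it does not check that each part is <= 255.
  (re-seq #"(?:\d{1,3}\.){3}\d{1,3}" s))

(defn phone-or-ip [s]
  ;; Returns the phone matches if there are any, otherwise the IP matches.
  (or (matchphone s) (matchip s)))
```

Note that this returns only the first kind of match that is present: for a string containing both a phone number and an IP address, the IP address is never reported.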
P.S. There are ways to combine the regexps themselves. The naïve way is to just use | and parentheses to combine the two. I think there are optimizers, which can improve such patterns.
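One possible way to do the naive | combination while still telling the matches apart is to give each alternative its own capture group. The sketch below is an assumption-laden example: the pattern and the name classify-matches are hypothetical, and the IP alternative is simplified. Alternatives are tried left to right, so the order decides which pattern wins when both could match at the same position.

```clojure
;; Group 1 captures a phone number, group 2 a (simplified) dotted quad.
(def phone-or-ip-re
  #"(\d{3}-?\d{3}-?\d{4})|((?:\d{1,3}\.){3}\d{1,3})")

(defn classify-matches
  "Returns a seq of [kind text] pairs, where kind is :phone or :ip."
  [s]
  ;; With groups in the pattern, re-seq yields [whole-match group1 group2];
  ;; the group that did not take part in the match is nil.
  (for [[_ phone ip] (re-seq phone-or-ip-re s)]
    (cond
      phone [:phone phone]
      ip    [:ip ip])))
```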

Related

In Clojure, why do you have to use parenthesis when def'ing a function and use cases of let

I'm starting to learn clojure and I've stumbled upon the following, when I found myself declaring a "sum" function (for learning purposes) I wrote the following code
(def sum (fn [& args] (apply + args)))
I have understood that I defined the symbol sum as containing that fn, but why do I have to enclose the fn in parentheses? Isn't the compiler calling that function upon definition instead of when someone is actually invoking it? Maybe it's just my imperative brain talking.
Also, what are the use cases of let? Sometimes I stumble on code that uses it and other code that doesn't. For example, on the Clojure site there's an exercise to use the openStream function from the Java interop, and I wrote the following code:
(defn http-get
[url]
(let [url-obj (java.net.URL. url)]
(slurp (.openStream url-obj))))
(http-get "https://www.google.com")
whilst they wrote the following on the clojure site as an answer
(defn http-get [url]
(slurp
(.openStream
(java.net.URL. url))))
Again, maybe it's just my imperative brain talking, the need to have a "variable" or an "object" to store something before using it, but I don't quite understand when I should use let and when I shouldn't.
To answer both of your questions:
1.
(def sum (fn [& args] (apply + args)))
Using def here is unorthodox. When you define a function you usually want to use defn. But since you used def, you should know that def binds a name to a value, and the value of an (fn ...) form is a function. The parentheses around fn do not call your function; they evaluate the fn form, which creates the function, and def then binds the name sum to it.
You could have used the more traditional (defn sum [& args] (apply + args))
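A small sketch of the point above, using hypothetical names sum-with-def and sum-with-defn: both forms bind a name to the same kind of function value, and nothing is called until the name is applied to arguments.

```clojure
;; def + fn: evaluating the (fn ...) form creates a function; it does not call it.
(def sum-with-def (fn [& args] (apply + args)))

;; defn: the more common way to get the same binding.
(defn sum-with-defn [& args] (apply + args))

;; Calling happens only when the function is in operator position.
(def results
  [(sum-with-def 1 2 3)
   (sum-with-defn 1 2 3)
   ;; An anonymous function can also be created and called in one expression:
   ((fn [& args] (apply + args)) 1 2 3)])
```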
2.
While using let sometimes makes sense for readability (separating steps outside their nested use) it is sometimes required when you want to do something once and use it multiple times. It binds the result to a name within a specified context.
We can look at the following example and see that without let it becomes harder to write (function is for demonstration purposes):
(let [db-results (query "select * from table")] ;; note: query is not a pure function
  ;; do stuff with db-results
  (f db-results)
  ;; return db-results
  db-results)
This simply re-uses, in multiple locations, a return value (db-results) from a function that you usually only want to run once. So let can be used for style, as in the example you've given, but it's also very useful for value reuse within some context.
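Here is a runnable sketch of the same idea. The names expensive-lookup, call-count and summarize are hypothetical; the atom only exists to show that the bound expression is evaluated once even though its value is used three times.

```clojure
;; Counts how many times the "expensive" function has run.
(def call-count (atom 0))

(defn expensive-lookup
  "Stand-in for an impure or costly call such as a database query."
  []
  (swap! call-count inc)
  [3 1 2])

(defn summarize []
  ;; xs is computed once and then reused in three places.
  (let [xs (expensive-lookup)]
    {:min   (apply min xs)
     :max   (apply max xs)
     :count (count xs)}))
```

Without the let, the natural alternative would be to call expensive-lookup three times, which is slower and, for an impure function, might return different data each time.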
Both def and defn define a global symbol, sort of like a global variable in Java, etc. Also, (defn xxx ...) is a (very common) shortcut for (def xxx (fn ...)). So, both versions will work exactly the same way when you run the program. Since the defn version is shorter and more explicit, that is what you will do 99% of the time.
Typing (let [xxx ...] ...) defines a local symbol, which cannot be seen by code outside of the let form, just like a local variable (block-scope) in Java, etc.
Just like Java, it is optional when to have a local variable like url-obj. It will make no difference to the running program. You must answer the question, "Which version makes my code easier to read and understand?" This part is no different than Java.

Clojure pipe collection one by one

How can I process collections in Clojure the way Java streams do, with each element passing one by one through all of the functions, instead of every element being evaluated at each stage? I would also describe it as Unix pipes (the next program pulls chunk by chunk from the previous one).
As far as I understand your question, you may want to look into two things.
First, understand the sequence abstraction. This is a way of looking at collections which consumes them one by one and lazily. It is an important Clojure idiom and you'll meet well known functions like map, filter, reduce, and many more. Also the macro ->>, which was already mentioned in a comment, will be important.
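A minimal sketch of the lazy pipeline idea, assuming nothing beyond clojure.core. The source is an infinite sequence, so the example only terminates because each stage pulls just as many elements as the next stage asks for.

```clojure
(def first-three-even-successors
  (->> (range)          ; 0, 1, 2, ... (infinite, lazy)
       (map inc)        ; 1, 2, 3, ...
       (filter even?)   ; 2, 4, 6, ...
       (take 3)))       ; stop pulling after three results

;; Nothing is computed until the result is consumed, e.g. by doall or vec.
(def realized (vec first-three-even-successors))
```

One caveat: some sequences (vectors, bounded ranges) are chunked, so elements may be realized in small batches rather than strictly one at a time.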
After that, when you want to dig deeper, you probably want to look into transducers and reducers. In a grossly oversimplifying summary, they allow you to combine several lazy functions into one function and then process a collection with less laziness, less memory consumption, more performance, and possibly on several threads. I consider these to be advanced topics, though. Maybe the sequences are already what you were looking for.
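As a rough sketch of the transducer side (reducers are not shown), the steps can be composed once into a single transformation, here given the hypothetical name xf, and then applied to different sources. Each input element flows through all steps before the next one is read, with no intermediate sequences.

```clojure
;; One composed transformation; with transducers, comp applies left to right.
(def xf
  (comp (map inc)
        (filter even?)
        (take 3)))

;; Eagerly pour the results into a vector. The take step stops the
;; reduction early, so even an infinite source terminates.
(def as-vector (into [] xf (range)))

;; Reduce directly to a single value without building a collection.
(def total (transduce xf + 0 (range 100)))

;; Or get a lazy sequence back.
(def as-seq (sequence xf (range 100)))
```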
Here is a simple example from ClojureDocs.org
;; Use of `->` (the "thread-first" macro) can help make code
;; more readable by removing nesting. It can be especially
;; useful when using host methods:
;; Arguably a bit cumbersome to read:
user=> (first (.split (.replace (.toUpperCase "a b c d") "A" "X") " "))
"X"
;; Perhaps easier to read:
user=> (-> "a b c d"
.toUpperCase
(.replace "A" "X")
(.split " ")
first)
"X"
As always, don't forget the Clojure CheatSheet or Clojure for the Brave and True.

Eval with local bindings function

I'm trying to write a function which takes a sequence of bindings and an expression and returns the result.
The sequence of bindings is formatted thus: ([:bind-type [bind-vec]] ... ) where bind-type is either :let or :letfn. For example:
([:let [a 10 b 20]] [:letfn [(foo [x] (inc x))]] ... )
And the expression is just a regular Clojure expression, e.g. (foo (+ a b)), so together this example pair of inputs would yield 31.
Currently I have this:
(defn wrap-bindings
[[[bind-type bind-vec :as binding] & rest] expr]
(if binding
(let [bind-op (case bind-type :let 'let* :letfn 'letfn*)]
`(~bind-op ~bind-vec ~(wrap-bindings rest expr)))
expr))
(defn eval-with-bindings
([bindings expr]
(eval (wrap-bindings bindings expr))))
I am not very experienced with Clojure and have been told that use of eval is generally bad practice. I do not believe that I can write this as a macro since the bindings and expression may only be given at run-time, so what I am asking is: is there a more idiomatic way of doing this?
eval is almost never the answer, though there are rare exceptions. In this case you meet the criteria because:
since the bindings and expression may only be given at run-time
You desire arbitrary code to be input and run while the program is going
The binding forms to be used can take any data as their input, even data from elsewhere in the program
So your existing example using eval is appropriate given the constraints of the question, at least as I understand it. Perhaps there is room to change the requirements so that the expressions are defined in advance, which would remove the need for eval; if not, then I'd suggest using what you have.
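For reference, here is one possible variant of the eval approach, offered as a sketch rather than a drop-in replacement. It makes one assumption that differs from the question's code: as far as I can tell, the letfn* special form expects its bindings as alternating names and fn forms, so this version emits the public let and letfn macros (fully qualified) instead of let* and letfn*.

```clojure
(defn wrap-bindings
  "Wraps expr in nested let/letfn forms; the first binding ends up outermost."
  [bindings expr]
  (reduce (fn [body [bind-type bind-vec]]
            (let [op (case bind-type
                       :let   'clojure.core/let
                       :letfn 'clojure.core/letfn)]
              (list op bind-vec body)))
          expr
          ;; Build from the inside out, so start with the last binding.
          (reverse bindings)))

(defn eval-with-bindings
  [bindings expr]
  (eval (wrap-bindings bindings expr)))

(def example-result
  (eval-with-bindings '([:let [a 10 b 20]]
                        [:letfn [(foo [x] (inc x))]])
                      '(foo (+ a b))))
```

As with any use of eval, the bindings and expression are executed as code, so they should only come from a trusted source.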

calling a library function of arbitrary name

say there's a library l, which has two functions (a and b).
Calling both functions and merging the results into a vector could be done like this:
(concat (l/a) (l/b))
Is there a way to make this more generic? I tried something like this, but it threw an exception:
(apply concat (map #(l/%) ['a 'b]))
of course, this would work:
(apply concat [l/a l/b])
Calling both functions and merging the results into a vector could be done like this:
(concat (l/a) (l/b))
No, you will not get a vector: concat returns a lazy sequence. And it only works if those functions return sequences (or other seqable collections). Otherwise you will get a runtime exception once the result is realized.
It sounds like you have a bunch of functions and you want to concatenate the results of them all together? There is no need to quote them, just make a sequence of the functions:
[l/a l/b l/c ...]
And use apply with concat as you already are (after calling each function, e.g. by mapping #(%) over the vector), or use reduce to accumulate values.
Call vec on the result if you need it to be a vector rather than a sequence.
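A minimal sketch of that suggestion. The library l is not available here, so a and b below are stand-in zero-argument functions that return collections; in real code the vector would hold l/a, l/b and so on.

```clojure
;; Stand-ins for l/a and l/b (assumptions for the sake of a runnable example).
(defn a [] [1 2])
(defn b [] '(3 4))

;; A plain vector of function values; no quoting needed.
(def fns [a b])

;; Call each function and concatenate the results (lazy sequence).
(def as-seq (mapcat #(%) fns))

;; The same with apply and concat, calling each function first.
(def as-seq-2 (apply concat (map #(%) fns)))

;; If a vector is required, convert at the end.
(def as-vec (vec as-seq))
```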
Your other solutions make the code more complex than necessary and harder to read. (Also, you almost never need to quote vars as you are doing.)
It looks like you want a general way of invoking a function inside a namespace. You can construct a symbol and dereference it to find the functions, then combine the results using mapcat e.g.
(mapcat #((find-var (symbol "l" %))) ["a" "b"])
alternatively you could first find the namespace and use ns-resolve to find the vars e.g.
(let [ns (find-ns 'l)]
(mapcat #((ns-resolve ns %)) ['a 'b]))
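The lookup by name can be tried out without a real library by creating a namespace at runtime. In this sketch, demo.lib, its functions a and b, and the helper call-all are all hypothetical. The helper also checks for a name that does not resolve, since ns-resolve returns nil in that case and calling nil would fail with a less helpful error.

```clojure
;; Build a small stand-in namespace with two zero-argument functions.
(create-ns 'demo.lib)
(intern 'demo.lib 'a (fn [] [1 2]))
(intern 'demo.lib 'b (fn [] [3 4]))

(defn call-all
  "Resolves each symbol in ns-sym, calls the var with no arguments,
  and concatenates the results."
  [ns-sym fn-names]
  (mapcat (fn [fn-name]
            (if-let [v (ns-resolve ns-sym fn-name)]
              (v) ; invoking a var invokes the function it holds
              (throw (ex-info "No such var" {:ns ns-sym :name fn-name}))))
          fn-names))
```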

Pattern matching functions in Clojure?

I have used erlang in the past and it has some really useful things like pattern matching functions or "function guards". Example from erlang docs is:
fact(N) when N>0 ->
N * fact(N-1);
fact(0) ->
1.
But this could be expanded to a much more complex example where the form of parameter and values inside it are matched.
Is there anything similar in clojure?
There is ongoing work towards doing this with unification in the core.match ( https://github.com/clojure/core.match ) library.
Depending on exactly what you want to do, another common way is to use defmulti/defmethod to dispatch on arbitrary functions. See http://clojuredocs.org/clojure_core/clojure.core/defmulti (at the bottom of that page is the factorial example)
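A minimal sketch of the defmulti/defmethod approach applied to the factorial example. The dispatch function plays the role of the Erlang guards; the dispatch keywords :zero, :positive and :invalid are names chosen for this example, not anything built in.

```clojure
;; The dispatch function classifies the argument, much like a guard would.
(defmulti fact
  (fn [n]
    (cond
      (zero? n) :zero
      (pos? n)  :positive
      :else     :invalid)))

(defmethod fact :zero [_] 1)

(defmethod fact :positive [n]
  (* n (fact (dec n))))

(defmethod fact :invalid [n]
  (throw (IllegalArgumentException. (str "fact is undefined for " n))))
```

Unlike Erlang clauses, the methods are not tried in order; the dispatch value alone selects the method.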
I want to introduce defun, a macro for defining functions with pattern matching, just like Erlang. It is based on core.match. The above fact function can be written as:
(use 'defun)
(defun fact
([0] 1)
([(n :guard #(> % 0))]
(* n (fact (dec n)))))
Another example, an accumulator from zero to positive number n:
(defun accum
([0 ret] ret)
([n ret] (recur (dec n) (+ n ret)))
([n] (recur n 0)))
For more information, see https://github.com/killme2008/defun
core.match is a full-featured and extensible pattern matching library for Clojure. With a little macro magic you can probably get a pretty close approximation to what you're looking for.
Also, if you only want to take apart simple structures like vectors and maps (in fact anything that is a sequence or a map, e.g. a record), you could use destructuring bind. This is a weaker form of pattern matching, but it is still very useful. Although it is described in the let section there, it can be used in many contexts, including function definitions.
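A short sketch of destructuring in function parameters, using hypothetical function names. It takes values apart by position or by key but, unlike real pattern matching, it never fails to match: missing parts are simply bound to nil (or to a default given with :or).

```clojure
;; Positional destructuring of a vector argument.
(defn describe-pair [[x y]]
  (str "x=" x ", y=" y))

;; Splitting a sequence into its first element and the rest.
(defn head-and-tail [[head & tail]]
  {:head head :tail tail})

;; Map destructuring with :keys and a default via :or.
(defn full-name [{:keys [first-name last-name]
                  :or   {last-name "Unknown"}}]
  (str first-name " " last-name))
```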