I'm trying to match the following sequences using Prismatic/Schema:
[{:n "some text"}] ; => valid
and
[{:k "some text"} {:n "some text"}] ; => valid
What I have tried:
(s/def Elem3
{:k s/Str})
(s/def Elem2
{:n s/Str})
(s/def Elem
[(s/optional Elem2 "elem2") Elem3])
(s/validate Elem [{:k "huji"}])
;; =>
;; Value does not match schema: [(named {:n missing-required-key, :k
;; disallowed-key} "elem2")]
(s/def Elem
[(s/maybe Elem2) Elem3])
(s/validate Elem [{:k "huji"}])
;; =>
;; [(maybe {:n Str}) {:k java.lang.String}] is not a valid sequence
;; schema; a valid sequence schema consists of zero or more `one`
;; elements, followed by zero or more `optional` elements, followed by
;; an optional schema that will match the remaining elements.
(s/defrecord ElemOption1
[elem3 :- Elem3])
(s/defrecord ElemOption2
[elem2 :- Elem2
elem3 :- Elem3])
(s/def Elem
(s/conditional
#(= 2 (count %)) ElemOption2
:else ElemOption1))
(s/validate Elem [{:k "huji"}])
;; =>
;; Value does not match schema: (not (instance?
;; peg_dsl.standard_app.ElemOption1 [{:k "huji"}]))
The main problem is that I don't understand how to write a
schema that allows the first element of the specified vector to be omitted.
What is the correct way to match both of the vectors above?
The problem with your first attempt is that starting with an
optional means it expects {:n s/Str} or nothing, and it's seeing
{:k "huji"}, so validation fails on the first element.
Your second attempt has two problems. s/maybe allows the value
or nil, but an element still needs to be present at that position.
The sequence schema is also malformed: the elements of a sequence
schema must come in the order s/one* s/optional*, and you
want s/optional followed by s/one.
Your third attempt is closer, using the conditional, but you're
failing to match because you're not validating instances of the
records, you're validating maps.
A solution looks like this:
(def ElemKNList [(s/one {:k s/Str} "k") (s/one {:n s/Str} "n")])
(def ElemNList [(s/one {:n s/Str} "n")])
(def Elem (s/conditional #(= 2 (count %)) ElemKNList
:else ElemNList))
(s/validate Elem [{:k "huji"} {:n "huji"}])
=> [{:k "huji"} {:n "huji"}]
(s/validate Elem [{:n "huji"}])
=> [{:n "huji"}]
(s/validate Elem [{:k "huji"}])
=> ExceptionInfo Value does not match schema: [(named {:n missing-required-key, :k disallowed-key} "n")] schema.core/validator/fn--18435 (core.clj:151)
I need a predicate which returns logical true if the given value is a non-empty collection and logical false if it's anything else (a number, a string, etc.).
More specifically, the predicate must not throw an IllegalArgumentException when applied to a single number or string.
I came up with the following function, but I'm wondering if there is a more idiomatic approach.
(defn not-empty-coll? [x]
(and (coll? x) (seq x)))
This will satisfy the following tests:
(is (not (not-empty-coll? nil))) ;; -> false
(is (not (not-empty-coll? 1))) ;; -> false
(is (not (not-empty-coll? "foo"))) ;; -> false
(is (not (not-empty-coll? []))) ;; -> nil (false)
(is (not (not-empty-coll? '()))) ;; -> nil (false)
(is (not (not-empty-coll? {}))) ;; -> nil (false)
(is (not-empty-coll? [1])) ;; -> (1) (true)
(is (not-empty-coll? '(1))) ;; -> (1) (true)
(is (not-empty-coll? {:a 1})) ;; -> ([:a 1]) (true)
EDIT: A potential use case:
Let's say we need to process some raw external data which is not (yet) under our control. The input could be, for example, a collection which contains either primitive values or nested collections. Another example could be a collection holding some inconsistent (maybe broken?) tree structure. So we can consider the predicate above as a first line of data cleaning.
Otherwise, I agree with the comments that it is better to explicitly separate and process collection and non-collection data.
How about using Clojure protocols and type extensions to solve this?
(defprotocol EmptyCollPred
(not-empty-coll? [this]))
(extend-protocol EmptyCollPred
Object
(not-empty-coll? [this] false)
nil
(not-empty-coll? [this] false)
clojure.lang.Seqable
(not-empty-coll? [this] (boolean (seq this))))
(is (not (not-empty-coll? nil))) ;; -> false
(is (not (not-empty-coll? 1))) ;; -> false
(is (not (not-empty-coll? "foo"))) ;; -> false
(is (not (not-empty-coll? []))) ;; -> false
(is (not (not-empty-coll? '()))) ;; -> false
(is (not (not-empty-coll? {}))) ;; -> false
(is (not-empty-coll? [1])) ;; -> true
(is (not-empty-coll? '(1))) ;; -> true
(is (not-empty-coll? {:a 1})) ;; -> true
Maybe it would be cleaner to extend just String and Number instead of Object, depending on what you know about the incoming data. Also, it would probably be better to filter out nils beforehand instead of creating a case for nil as shown above.
Another - conceptually similar - solution could use multimethods.
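A minimal sketch of that multimethod variant. Dispatching on class and falling back to a :default method are my assumptions here, not something spelled out above:

```clojure
;; Dispatch on the runtime class of the argument (nil for a nil argument).
(defmulti not-empty-coll? class)

;; Anything implementing clojure.lang.Seqable: vectors, lists, maps, sets, lazy seqs.
(defmethod not-empty-coll? clojure.lang.Seqable [x]
  (boolean (seq x)))

;; Everything else, including nil, numbers, and strings.
(defmethod not-empty-coll? :default [_]
  false)

(not-empty-coll? [1])   ;; => true
(not-empty-coll? [])    ;; => false
(not-empty-coll? "foo") ;; => false
(not-empty-coll? nil)   ;; => false
```

As with the protocol version, a String falls through to the default because java.lang.String does not implement clojure.lang.Seqable.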
As suggested in the comments, I would consider calling not-empty? with a non-collection argument to be an invalid usage, which should generate an IllegalArgumentException.
There is already a function not-empty? available for use in the Tupelo library. Here are the unit tests:
(deftest t-not-empty
(is (every? not-empty? ["one" [1] '(1) {:1 1} #{1} ] ))
(is (has-none? not-empty? [ "" [ ] '( ) {} #{ } nil] ))
(is= (map not-empty? ["1" [1] '(1) {:1 1} #{1} ] )
[true true true true true] )
(is= (map not-empty? ["" [] '() {} #{} nil] )
[false false false false false false ] )
(is= (keep-if not-empty? ["1" [1] '(1) {:1 1} #{1} ] )
["1" [1] '(1) {:1 1} #{1} ] )
(is= (drop-if not-empty? ["" [] '() {} #{} nil] )
["" [] '() {} #{} nil] )
(throws? IllegalArgumentException (not-empty? 5))
(throws? IllegalArgumentException (not-empty? 3.14)))
Update
The preferred approach would be for a function to receive only collection parameters in a given argument, not a mixture of scalar and collection arguments. Then one only needs not-empty, given the prior knowledge that the value in question is not a scalar. I often use Plumatic Schema to enforce this assumption and catch any errors in the calling code:
(ns xyz
(:require [schema.core :as s] )) ; plumatic schema
(s/defn foo :- [s/Any]
"Will do bar to the supplied collection"
[coll :- [s/Any]]
(if (not-empty coll)
(mapv bar coll) ; bar is a placeholder for your own per-element function
[ :some :default :value ] ))
The two uses of the notation :- [s/Any] check that the argument and the return value are both sequential collections (lists or vectors). The s/Any part leaves each element unrestricted.
If you can't enforce the above strategy for some reason, I would just modify your first approach as follows:
(defn not-empty-coll? [x]
(and (coll? x) (t/not-empty? x)))
I'm hoping you know at least a little about the param x, so the question becomes: is x a scalar or a non-empty vector? Then you could say something like:
(defn not-empty-coll? [x]
(and (sequential? x) (t/not-empty? x)))
I have a list of strings, e.g. '("abc" "def" "gih"), and I would like to be able to search the list for any items containing e.g. "ef" and get the item or index returned.
How is this done?
Combining filter and re-find can do this nicely.
user> (def fx '("abc" "def" "gih"))
#'user/fx
user> (filter (partial re-find #"ef") fx)
("def")
user> (filter (partial re-find #"a") fx)
("abc")
In this case I like to combine them with partial, though an anonymous function works just as well. It is also useful to use re-pattern if you don't know the search string in advance:
user> (filter (partial re-find (re-pattern "a")) fx)
("abc")
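Note that re-pattern interprets its argument as a regular expression, so characters such as "." or "*" in the search string act as metacharacters. If you only need a literal substring test, here is a small sketch using clojure.string/includes? (available since Clojure 1.8); the var name strings is just for illustration:

```clojure
(require '[clojure.string :as string])

(def strings '("abc" "def" "gih"))

;; Elements that contain the literal substring "ef".
(filter #(string/includes? % "ef") strings)
;; => ("def")

;; Indices of those elements.
(keep-indexed (fn [idx s] (when (string/includes? s "ef") idx)) strings)
;; => (1)

;; A literal "." is not treated as a wildcard, unlike in a regex.
(filter #(string/includes? % ".") strings)
;; => ()
```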
If you want to retrieve all the indexes of the matching positions along with the element you can try this:
(filter #(re-find #"ef" (second %)) (map-indexed vector '("abc" "def" "gih")))
=>([1 "def"])
map-indexed vector generates an index/value lazy sequence
user> (map-indexed vector '("abc" "def" "gih"))
([0 "abc"] [1 "def"] [2 "gih"])
Which you can then filter using a regular expression against the second element of each list member.
#(re-find #"ef" (second %))
Just indices:
Lazily:
(keep-indexed #(if (re-find #"ef" %2)
%1) '("abc" "def" "gih"))
=> (1)
Using loop/recur
(loop [[str & strs] '("abc" "def" "gih")
idx 0
acc []]
(if str
(recur strs
(inc idx)
(cond-> acc
(re-find #"ef" str) (conj idx)))
acc))
For just the element, refer to Arthur Ulfeldt's answer.
Here is a traditional recursive definition that returns the index. It's easy to modify to return the corresponding string as well.
(defn strs-index [re lis]
(let [f (fn [ls n]
(cond
(empty? ls) nil
(re-find re (first ls)) n
:else (recur (rest ls) (inc n))))]
(f lis 0)))
user=> (strs-index #"de" ["abc" "def" "gih"])
1
user=> (strs-index #"ih" ["abc" "def" "gih"])
2
user=> (strs-index #"xy" ["abc" "def" "gih"])
nil
(Explanation: The helper function f is defined as a binding in let, and then is called at the end. If the sequence of strings passed to it is not empty, it searches for the regular expression in the first element of the sequence and returns the index if it finds the string. This uses the fact that re-find's result counts as true unless it fails, in which case it returns nil. If the previous steps don't succeed, the function starts over with the rest of the sequence and an incremented index. If it gets to the end of the sequence, it returns nil.)
What is the idiomatic way of counting certain properties of a nested map of maps in Clojure?
Given the following datastructure:
(def x {
:0 {:attrs {:attributes {:dontcare "something"
:1 {:attrs {:abc "some value"}}}}}
:1 {:attrs {:attributes {:dontcare "something"
:1 {:attrs {:abc "some value"}}}}}
:9 {:attrs {:attributes {:dontcare "something"
:5 {:attrs {:xyz "some value"}}}}}})
How can I produce the desired output:
(= (count-attributes x) {:abc 2, :xyz 1})
This is my best effort so far:
(defn count-attributes
[input]
(let [result (for [[_ {{attributes :attributes} :attrs}] input
:let [v (into {} (remove (comp not :attrs) (vals attributes)))]]
(:attrs v))]
(frequencies result)))
Which produces the following:
{{:abc "some value"} 2, {:xyz "some value"} 1}
I like building such functions with threading so the steps are easier to read:
user> (->> x
vals ; first throw out the keys
(map #(get-in % [:attrs :attributes])) ; get the nested maps
(map vals) ; again throw out the keys
(map #(filter map? %)) ; throw out the "something" ones.
flatten ; we no longer need the sequence sequences
(map vals) ; and again we don't care about the keys
flatten ; the map put them back into a list of lists
frequencies) ; and then count them.
{{:abc "some value"} 2, {:xyz "some value"} 1}
(remove (comp not :attrs) ...) is a lot like select-keys, and the destructuring in
for [[_ {{attributes :attributes} :attrs}] ...] reminds me of get-in.
I find tree-seq very useful for these cases:
(frequencies (filter #(and (map? %) (not-any? map? (vals %))) (tree-seq map? vals x)))
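Both snippets above count the innermost maps themselves, giving {{:abc "some value"} 2, {:xyz "some value"} 1}. A sketch of one way to get to the {:abc 2, :xyz 1} asked for in the question, assuming that the attributes to count are the keys of the innermost maps (those with no map among their values):

```clojure
;; The data structure from the question.
(def x {:0 {:attrs {:attributes {:dontcare "something"
                                 :1 {:attrs {:abc "some value"}}}}}
        :1 {:attrs {:attributes {:dontcare "something"
                                 :1 {:attrs {:abc "some value"}}}}}
        :9 {:attrs {:attributes {:dontcare "something"
                                 :5 {:attrs {:xyz "some value"}}}}}})

(defn count-attributes
  "Counts how often each key occurs across the innermost maps of input."
  [input]
  (->> (tree-seq map? vals input)                          ; walk every nested value
       (filter #(and (map? %) (not-any? map? (vals %))))   ; keep innermost maps only
       (mapcat keys)                                        ; (:abc :abc :xyz)
       frequencies))

(count-attributes x)
;; => {:abc 2, :xyz 1}
```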
I have 2 bindings I'm calling path and callback.
What I am trying to do is return the first non-empty one. In JavaScript it would look like this:
var final = path || callback || "";
How do I do this in Clojure?
I was looking at the "some" function but I can't figure out how to combine the clojure.string/blank? check with it. I currently have this as a test, which doesn't work. In this case, it should return nil, I think.
(some (clojure.string/blank?) ["1" "2" "3"])
In this case, it should return 2
(some (clojure.string/blank?) ["" "2" "3"])
(first (filter (complement clojure.string/blank?) ["" "a" "b"]))
Edit: As pointed out in the comments, (filter (complement p) ...) can be rewritten as (remove p ...):
(first (remove clojure.string/blank? ["" "a" "b"]))
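To mirror the JavaScript expression, including the trailing "" fallback, the remove form can be wrapped in or. A small sketch; first-non-blank is a hypothetical helper name, not a core function:

```clojure
(require '[clojure.string :as string])

(defn first-non-blank
  "Returns the first candidate that is not nil, empty, or whitespace-only,
  or \"\" when there is none."
  [& candidates]
  (or (first (remove string/blank? candidates)) ""))

(first-non-blank "" "2" "3")    ;; => "2"
(first-non-blank nil "  " "cb") ;; => "cb"
(first-non-blank "" nil)        ;; => ""
```

With the two bindings from the question this would be called as (first-non-blank path callback). Unlike the JavaScript version, a whitespace-only string is also skipped, because clojure.string/blank? treats it as blank.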
If you are lucky enough to have "empty values" represented by nil and/or false, you could use:
(or nil false "2" "3")
Which would return "2".
An equivalent to your JavaScript example would be:
(let [final (or path callback "")]
(println final))
If you want the first non blank string of a sequence you can use something like this:
(first (filter #(not (clojure.string/blank? %)) ["" "2" "3"]))
This will return "2".
What I don't understand is your first example using the some function: you said that it should return nil, but the first non-blank string is "1".
This is how you would use the some function:
(some #(when-not (empty? %) %) ["" "foo" ""])
"foo"
(some #(when-not (empty? %) %) ["bar" "foo" ""])
"bar"
As others have pointed out, filter is another option:
(first (filter #(not (empty? %)) ["" "" "foo"]))
"foo"
A third option would be to use recursion:
(defn first-non-empty [& x]
(let [[y & z] x]
(if (not (empty? y))
y
(when z (recur z)))))
(first-non-empty "" "bar" "")
"bar"
(first-non-empty "" "" "foo")
"foo"
(first-non-empty "" "" "")
nil
I used empty? instead of blank? to save on typing, but the only difference should be how whitespace is handled.
It was difficult for me to tell exactly what you wanted, so this is my understanding of what you are trying to do.
In my case, I wanted to find if an item in one report was missing in a second report. A match returned nil, and a non-match returned the actual item that did not match.
The following functions wind up comparing the value of a mapped value with a key.
Using something like find-first is probably what you want to do.
(defn find-first
"This is a helper function that uses filter and a comparison predicate, and
stops comparing once the first match is found. The actual match
is returned, and nil is returned if the comparison value is not matched."
[pred col]
(first (filter pred col)))
(defn str-cmp
"Takes two strings and compares them. Returns 0 if a match; and nil if not."
[str-1 str-2 cmp-start-pos substr-len]
(let [computed-str-len (ret-lowest-str-len str-1 str-2 substr-len)
rc-1 (subs str-1 cmp-start-pos computed-str-len)
rc-2 (subs str-2 cmp-start-pos computed-str-len)]
(if (= 0 (compare rc-1 rc-2))
0
nil)))
(defn cmp-one-val
"Return nil if first key match found,
else the original comparison row is returned.
cmp-row is a single sequence of data from a map.
cmp-key is the key to extract the comparison value.
cmp-seq-vals contain a sequence derived from
one key in a sequence of maps.
cmp-start and substr-len are start and stop
comparison indices for str-cmp."
[cmp-row cmp-key cmp-seq-vals cmp-start substr-len]
(if (find-first #(str-cmp (cmp-key cmp-row) %1 cmp-start substr-len) cmp-seq-vals)
nil
cmp-row))
I have a collection of prefix/value pairs, and wish to find any value in this collection associated with a prefix that my current target string begins with. (It is not important that behavior be defined in the case where more than one prefix matches, as the nature of my use case is such that this should never occur.)
A naive (working) implementation follows:
(defn prefix-match [target-str pairs]
(some
(fn [[k v]]
(if (.startsWith target-str k)
v
false))
pairs))
Such that:
user=> (prefix-match "foobar" {"meh" :qux, "foo" :baz})
:baz
This works as intended, but is O(n) with the length of the pairs sequence. (Fast insertion into pairs is also desirable, but not as important as fast lookup).
The first thing that comes to mind is bisecting a sorted collection with efficient random access, but I'm not sure which data structures in Clojure are most appropriate to the task. Suggestions?
How about a trie?
(defn build-trie [seed & kvs]
(reduce
(fn [trie [k v]]
(assoc-in trie (concat k [:val]) v))
seed
(partition 2 kvs)))
(defn prefix-match [target trie]
(when (seq target)
(when-let [node (trie (first target))]
(or (:val node)
(recur (rest target) node)))))
Usage:
user> (def trie (build-trie {} "foo" :baz "meh" :qux))
#'user/trie
user> trie
{\m {\e {\h {:val :qux}}}, \f {\o {\o {:val :baz}}}}
user> (prefix-match "foobar" trie)
:baz
user> (prefix-match "foo" trie)
:baz
user> (prefix-match "f" trie)
nil
user> (prefix-match "abcd" trie)
nil
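An efficient, terse approach is to take advantage of rsubseq, which works on any type implementing clojure.lang.Sorted, including sorted-map. Note that it only inspects the greatest key that is less than or equal to the target, so it relies on no non-prefix key sorting between the matching prefix and the target (with keys "foo" and "foobaa", the target "foobar" would yield nil).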
(defn prefix-match [sorted-map target]
(let [[closest-match value] (first (rsubseq sorted-map <= target))]
(if closest-match
(if (.startsWith target closest-match)
value
nil)
nil)))
This passes the relevant tests in my suite:
(deftest prefix-match-success
(testing "prefix-match returns a successful match"
(is (= :one (prefix-match (sorted-map "foo" :one "bar" :two) "foobar")))
(is (= :one (prefix-match (sorted-map "foo" :one "bar" :two) "foo")))))
(deftest prefix-match-fail
(testing "prefix-match returns nil on no match"
(is (= nil (prefix-match (sorted-map "foo" :one, "bar" :two) "bazqux")))
(is (= nil (prefix-match (sorted-map "foo" :one, "bar" :two) "zzz")))
(is (= nil (prefix-match (sorted-map "foo" :one, "bar" :two) "aaa")))))
It seems simplest to just turn the list of prefixes into a regular expression and feed it to a regex matcher, which is optimized for exactly this sort of task. Something like
(java.util.regex.Pattern/compile (str "^"
"(?:"
(clojure.string/join "|"
(map #(java.util.regex.Pattern/quote %)
prefixes))
")"))
should get you a regex suitable for testing against a string (but I haven't tested it at all, so maybe I got some method names wrong or something).
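A sketch of how such a pattern could be used to look up the value, assuming the pairs are held in a map from prefix string to value. The names prefix-pattern and regex-prefix-match are made up for this example:

```clojure
(require '[clojure.string :as string])

(defn prefix-pattern
  "Builds a pattern that matches any of the given literal prefixes
  at the start of a string."
  [prefixes]
  (re-pattern
   (str "^(?:"
        (string/join "|" (map #(java.util.regex.Pattern/quote %) prefixes))
        ")")))

(defn regex-prefix-match
  "Returns the value whose prefix key starts target, or nil if none does."
  [target pairs]
  ;; The pattern has no capturing groups, so re-find returns the matched prefix string.
  (when-let [prefix (re-find (prefix-pattern (keys pairs)) target)]
    (get pairs prefix)))

(regex-prefix-match "foobar" {"meh" :qux, "foo" :baz}) ;; => :baz
(regex-prefix-match "abcd" {"meh" :qux, "foo" :baz})   ;; => nil
```

Building the pattern is itself linear in the number of prefixes, so in practice you would compile it once and reuse it rather than rebuild it on every lookup as this sketch does.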
The following solution finds the longest matching prefix and works surprisingly well when the map is huge and strings are relatively short. It tries to match e.g. "foobar", "fooba", "foob", "foo", "fo", "f" in order and returns the first match.
(defn prefix-match
[s m]
(->> (for [end (range (count s) 0 -1)] (.subSequence s 0 end)) ; "foo", "fo", "f"
(map m) ; match "foo", match "fo", ...
(remove nil?) ; ignore unmatched
(first))) ; Take first and longest match