I have a complex spec for my data - how do I generate samples?

My Clojure spec looks like this:
(spec/def ::global-id string?)
(spec/def ::part-of string?)
(spec/def ::type string?)
(spec/def ::value string?)
(spec/def ::name string?)
(spec/def ::text string?)
(spec/def ::date (spec/nilable (spec/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))))
(spec/def ::interaction-name string?)
(spec/def ::center (spec/coll-of string? :kind vector? :count 2))
(spec/def ::context- (spec/keys :req [::global-id ::type]
:opt [::part-of ::center]))
(spec/def ::contexts (spec/coll-of ::context-))
(spec/def ::datasource string?)
(spec/def ::datasource- (spec/nilable (spec/keys :req [::global-id ::name])))
(spec/def ::datasources (spec/coll-of ::datasource-))
(spec/def ::location string?)
(spec/def ::location-meaning- (spec/keys :req [::global-id ::location ::contexts ::type]))
(spec/def ::location-meanings (spec/coll-of ::location-meaning-))
(spec/def ::context string?)
(spec/def ::context-association-type string?)
(spec/def ::context-association-name string?)
(spec/def ::priority string?)
(spec/def ::has-context- (spec/keys :req [::context ::context-association-type ::context-association-name ::priority]))
(spec/def ::has-contexts (spec/coll-of ::has-context-))
(spec/def ::fact- (spec/keys :req [::global-id ::type ::name ::value]))
(spec/def ::facts (spec/coll-of ::fact-))
(spec/def ::attribute- (spec/keys :req [::name ::type ::value]))
(spec/def ::attributes (spec/coll-of ::attribute-))
(spec/def ::fulltext (spec/keys :req [::global-id ::text]))
(spec/def ::feature- (spec/keys :req [::global-id ::date ::location-meanings ::has-contexts ::facts ::attributes ::interaction-name]
:opt [::fulltext]))
(spec/def ::features (spec/coll-of ::feature-))
(spec/def ::attribute- (spec/keys :req [::name ::type ::value]))
(spec/def ::attributes (spec/coll-of ::attribute-))
(spec/def ::ioi-slice string?)
(spec/def ::ioi- (spec/keys :req [::global-id ::type ::datasource ::features ::attributes ::ioi-slice]))
(spec/def ::iois (spec/coll-of ::ioi-))
(spec/def ::data (spec/keys :req [::contexts ::datasources ::iois]))
(spec/def ::data- ::data)
But generation fails when I check a function specced against it:
(spec/fdef data->graph
  :args (spec/cat :data ::xml-spec/data-))
(println (stest/check `data->graph))
It fails with this exception:
Couldn't satisfy such-that predicate after 100 tries.
It is very convenient to generate test data automatically with stest/check, but how can I also supply generators alongside the specs?

When you see the error "Couldn't satisfy such-that predicate after 100 tries." while generating data from specs, a common cause is an s/and spec, because spec builds the generator for an s/and spec from its first inner spec alone and then filters the generated values through the remaining predicates.
This spec is the most likely culprit, because the first inner spec/predicate in the s/and is string?, and the following predicate is a regex:
(s/def ::date (s/nilable (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))))
If you sample a string? generator, you'll see what it produces is unlikely to ever match your regex:
(gen/sample (s/gen string?))
=> ("" "" "X" "" "" "hT9" "7x97" "S" "9" "1Z")
test.check will try (100 times by default) to get a value that satisfies such-that conditions, then throw the exception you're seeing if it doesn't.
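The failure can be reproduced in isolation. The following is a minimal sketch, assuming clojure.spec.gen.alpha is aliased as gen and test.check is on the classpath; date-re, filtered-string-gen and failure-message are hypothetical names used only for illustration. It filters a plain string generator through the date regex, which is roughly what the s/and generator ends up doing:

```clojure
(require '[clojure.spec.gen.alpha :as gen])

;; The same pattern as in the ::date spec.
(def date-re #"^\d{4}-\d{2}-\d{2}$")

;; A string generator filtered by the regex, with an explicit limit of 100 tries.
;; Alphanumeric strings never contain hyphens, so the predicate can never pass.
(def filtered-string-gen
  (gen/such-that #(re-matches date-re %) (gen/string-alphanumeric) 100))

;; Generation gives up after the retry limit and throws; keep only the message.
(def failure-message
  (try
    (gen/generate filtered-string-gen)
    nil
    (catch Exception e
      (.getMessage e))))
```

Here failure-message should hold the same kind of such-that error that stest/check reported.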
Generating Dates
You can implement a custom generator for this spec in several ways. Here's a test.check generator that will create ISO local date strings:
(import '(java.time LocalDate)
        '(java.time.temporal ChronoField))
;; gen is assumed to alias clojure.test.check.generators
(def gen-local-date-str
  (let [day-range (.range ChronoField/EPOCH_DAY)
        day-min   (.getMinimum day-range)
        day-max   (.getMaximum day-range)]
    (gen/fmap #(str (LocalDate/ofEpochDay %))
              (gen/large-integer* {:min day-min :max day-max}))))
This approach gets the range of valid epoch days, uses it to bound the large-integer* generator, then fmaps LocalDate/ofEpochDay over the generated integers. Here's an alternative that starts from epoch milliseconds:
(import '(java.time Instant LocalDateTime ZoneOffset))
(def gen-local-date-str
  (gen/fmap #(-> (Instant/ofEpochMilli %)
                 (LocalDateTime/ofInstant ZoneOffset/UTC)
                 (.toLocalDate)
                 (str))
            gen/large-integer))
This starts with the default large-integer generator and uses fmap to provide a function that creates a java.time.Instant from the generated integer, converts it to a java.time.LocalDate, and converts that to a string which happens to conveniently match your date string format. (This is slightly simpler on Java 9 and above with java.time.LocalDate/ofInstant.)
Another approach might use test.chuck's regex-based string generator, or different date classes/formatters. Note that both of my examples will generate years that are eons before/after -9999/+9999, which won't match your \d{4} year regex, but the generator should produce satisfactory values often enough that it may not matter for your use case. There are many ways to generate date values!
(gen/sample gen-local-date-str)
=>
("1969-12-31"
"1970-01-01"
"1970-01-01"
...)
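If the out-of-range years matter for your use case, the epoch-day range can be narrowed so that every generated string has a four-digit year. This is a minimal sketch, assuming clojure.spec.gen.alpha is aliased as gen; gen-four-digit-year-date-str is a hypothetical name, and the bounds 0000-01-01 and 9999-12-31 are my choice to fit the \d{4} pattern:

```clojure
(import '(java.time LocalDate))
(require '[clojure.spec.gen.alpha :as gen])

;; Bound the epoch-day range to years 0000 through 9999, so that
;; LocalDate's string form always has exactly four year digits.
(def gen-four-digit-year-date-str
  (let [min-day (.toEpochDay (LocalDate/of 0 1 1))
        max-day (.toEpochDay (LocalDate/of 9999 12 31))]
    (gen/fmap #(str (LocalDate/ofEpochDay %))
              (gen/large-integer* {:min min-day :max max-day}))))

(gen/sample gen-four-digit-year-date-str 5)
```

Because every value already satisfies the regex, the such-that filter in the s/and spec never has to retry.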
Using Custom Generators with Specs
Then you can associate this generator with your spec using s/with-gen:
(s/def ::date
(s/nilable
(s/with-gen
(s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))
(constantly gen-local-date-str))))
(gen/sample (s/gen ::date))
=>
("1969-12-31"
nil ;; note that it also makes nils b/c it's wrapped in s/nilable
"1970-01-01"
...)
You can also provide "standalone" custom generators to certain spec functions that take an overrides map, if you don't want to tie the custom generator directly to the spec definition:
(gen/sample (s/gen ::data {::date (constantly gen-local-date-str)}))
Using this spec and generator I was able to generate your larger ::data spec, although the outputs were very large due to some of the collection specs. You can also control the size of those during generation using :gen-max options in the specs.
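As an illustration of the :gen-max option, here is a small sketch with hypothetical ::tag and ::tags specs (not part of the specs in the question). The option only limits generation; validation still accepts longer collections:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical specs, used only to show the option.
(s/def ::tag string?)

;; :gen-max caps the size of generated collections at 3 elements.
(s/def ::tags (s/coll-of ::tag :gen-max 3))

(gen/sample (s/gen ::tags) 5)

;; Longer collections remain valid; only the generator is constrained.
(s/valid? ::tags ["a" "b" "c" "d" "e"])
```

Adding :gen-max to the larger coll-of specs in the question (for example ::features or ::iois) should keep the generated samples manageable.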

Related

Clojure.Spec derive or alias another spec

I'd like to use clojure spec to build up a set of type constraints that can be aliased or further constrained by other specs.
For example, I might have many fields that all need to be valid sanitized markdown.
The following example works for validation (s/valid?) but not for generation (gen/generate):
(s/def ::sanitized-markdown string?)
(s/def ::instruction-list #(s/valid? ::sanitized-markdown %)) ;; works
(gen/generate (s/gen ::instruction-list)) ;; fails
However (gen/generate (s/gen ::sanitized-markdown)) does work.
Is there a way to extend ::instruction-list from ::sanitized-markdown so that it preserves all behavior?
You can alias another spec by providing it directly to s/def:
(s/def ::instruction-list ::sanitized-markdown)
To constrain a spec further, use s/and, or s/merge when combining map specs:
(s/def ::sanitized-markdown string?)
(s/def ::instruction-list (s/and ::sanitized-markdown #(> (count %) 10)))
(s/valid? ::instruction-list "abcd")
;; false
(s/valid? ::instruction-list "abcdefghijkl")
;; true
(gen/generate (s/gen ::instruction-list))
;; "178wzJW3W3zx2G0GJ1931eEeO"
An example with maps:
(s/def ::a string?)
(s/def ::b string?)
(s/def ::c string?)
(s/def ::d string?)
(s/def ::first-map (s/keys :opt [::a ::b]))
(s/def ::second-map (s/keys :opt [::c ::d]))
(s/def ::third-map (s/merge ::first-map ::second-map))
(s/valid? ::third-map {::a "1" ::d "2"})
;; true
(gen/generate (s/gen ::third-map))
;; {::b "gvQ7DI1kQ9DxG7C4poeWhk553", ::d "9KIp77974TEqs9HCq", ::c "qeSZA8NcYr7UVpJDsA17K"}

Why is Clojure Spec going into an infinite loop here?

This is an application that represents visual patterns as a collection of Sshapes.
An Sshape (styled shape) is a list of points and a map of style information.
An APattern is a record containing a list of Sshapes.
Here's the spec:
In sshape.clj
(spec/def ::stroke-weight int?)
(spec/def ::color (spec/* int?))
(spec/def ::stroke ::color)
(spec/def ::fill ::color)
(spec/def ::hidden boolean?)
(spec/def ::bezier boolean?)
(spec/def ::style (spec/keys :opt-un [::stroke-weight ::stroke ::fill ::hidden ::bezier]))
(spec/def ::point (spec/* number?))
(spec/def ::points (spec/* ::point))
(spec/def ::SShape (spec/keys :req-un [::style ::points]))
In groups.clj
(spec/def ::sshapes (spec/* :patterning.sshapes/SShape))
(spec/def ::APattern (spec/keys :req-un [::sshapes]))
Then, in another file, I try to check that a superimpose function, which puts two APatterns together, is accepting APatterns:
(defn superimpose-layout "simplest layout, two patterns located on top of each other "
[pat1 pat2]
{:pre [(spec/valid? :patterning.groups/APattern pat1)]}
(->APattern (concat (:sshapes pat1) (:sshapes pat2))) )
Without the pre-condition this runs.
With the pre-condition, I get this infinite recursion and stack overflow.
Exception in thread "main" java.lang.StackOverflowError, compiling:(/tmp/form-init7774655152686087762.clj:1:73)
at clojure.lang.Compiler.load(Compiler.java:7526)
at clojure.lang.Compiler.loadFile(Compiler.java:7452)
at clojure.main$load_script.invokeStatic(main.clj:278)
at clojure.main$init_opt.invokeStatic(main.clj:280)
at clojure.main$init_opt.invoke(main.clj:280)
at clojure.main$initialize.invokeStatic(main.clj:311)
at clojure.main$null_opt.invokeStatic(main.clj:345)
at clojure.main$null_opt.invoke(main.clj:342)
at clojure.main$main.invokeStatic(main.clj:424)
at clojure.main$main.doInvoke(main.clj:387)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.lang.Var.applyTo(Var.java:702)
at clojure.main.main(main.java:37)
Caused by: java.lang.StackOverflowError
at clojure.spec.alpha$regex_QMARK_.invokeStatic(alpha.clj:81)
at clojure.spec.alpha$regex_QMARK_.invoke(alpha.clj:78)
at clojure.spec.alpha$maybe_spec.invokeStatic(alpha.clj:108)
at clojure.spec.alpha$maybe_spec.invoke(alpha.clj:103)
at clojure.spec.alpha$the_spec.invokeStatic(alpha.clj:117)
at clojure.spec.alpha$the_spec.invoke(alpha.clj:114)
at clojure.spec.alpha$dt.invokeStatic(alpha.clj:742)
at clojure.spec.alpha$dt.invoke(alpha.clj:738)
at clojure.spec.alpha$dt.invokeStatic(alpha.clj:739)
at clojure.spec.alpha$dt.invoke(alpha.clj:738)
at clojure.spec.alpha$deriv.invokeStatic(alpha.clj:1480)
at clojure.spec.alpha$deriv.invoke(alpha.clj:1474)
at clojure.spec.alpha$deriv.invokeStatic(alpha.clj:1491)
at clojure.spec.alpha$deriv.invoke(alpha.clj:1474)
at clojure.spec.alpha$deriv.invokeStatic(alpha.clj:1491)
at clojure.spec.alpha$deriv.invoke(alpha.clj:1474)
at clojure.spec.alpha$deriv.invokeStatic(alpha.clj:1492)
at clojure.spec.alpha$deriv.invoke(alpha.clj:1474)
at clojure.spec.alpha$deriv.invokeStatic(alpha.clj:1492)
at clojure.spec.alpha$deriv.invoke(alpha.clj:1474)
at clojure.spec.alpha$deriv.invokeStatic(alpha.clj:1492)
etc.
Update:
OK. I've narrowed this down a bit in the repl.
Let's say a vector of points is defined so that pts is
[[-0.3 -3.6739403974420595E-17] [1.3113417037298127E-8 -0.2999999999999997] [0.2999999999999989 2.6226834037856828E-8] [-3.934025103841547E-8 0.29999999999999744] [-0.3 -3.6739403974420595E-17]]
Then calling
(spec/valid? :patterning.sshapes/points pts)
gives me the stack overflow:
StackOverflowError clojure.spec.alpha/regex? (alpha.clj:81)
So it looks like it happens just because I'm trying to match a spec/* of a spec/* of numbers.
Is there some reason that nested vectors trigger this kind of infinite recursion?
You should probably use s/coll-of instead of s/* for this purpose:
(s/def ::point (s/coll-of number?))
(s/def ::points (s/coll-of ::point))
(s/def ::SShape (s/keys :req-un [::style ::points]))
(s/exercise (s/coll-of ::SShape))
;; => ([[] []] [[{:style {:hidden false, :bezier false}, :points [[1.0 -3.0 0 0.75 -1.0 -1.0 0 -1.5 1.0 3.0 -1 0] [-2.0 -1 2.0 2.0 0 ...
There are a couple of bugs in Clojure spec in this area, I believe.
This one looks like an instance of https://dev.clojure.org/jira/browse/CLJ-2002. It is triggered on conform:
(s/conform (s/* (s/* number?)) [[]]) ; => StackOverflowError
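Some background that may help here: regex ops such as s/* compose into a single regex over one flat sequence, so a nested s/* does not by itself describe a nested collection. If you would rather keep the regex ops than switch to s/coll-of, wrapping the inner op in s/spec is the usual way to start a new nested sequence context. A minimal sketch, reusing the ::point and ::points names from the question:

```clojure
(require '[clojure.spec.alpha :as s])

;; s/spec turns the inner regex op into an ordinary spec, so each
;; ::point is matched as its own nested sequence of numbers.
(s/def ::point (s/spec (s/* number?)))

;; The outer regex op now describes a sequence of ::point collections.
(s/def ::points (s/* ::point))

(s/valid? ::points [[-0.3 0.0] [0.3 0.1]])
(s/valid? ::points [[1 :not-a-number]])
```

The first call should validate and the second should not, without the recursion seen above.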

Clojure using value from another required key in validation

I'm relatively new to Clojure and I'm looking for a way to use the value of one required key in the validation of another. I can do it by creating another map with the two values and passing that, but I was hoping there was a simpler way. Thanks.
(s/def ::country string?)
(s/def ::postal-code
;; pseudocode
;(if (= ::country "Canada")
;(re-matches #"^[A-Z0-9]{5}$")
;(re-matches #"^[0-9]{5}$"))
)
(s/def ::address
(s/keys :req-un [
::country
::postal-code
::street
::state
]))
Here's a way to do it with multi-spec:
(defmulti country :country)
(defmethod country "Canada" [_]
(s/spec #(re-matches #"^[A-Z0-9]{5}$" (:postal-code %))))
(defmethod country :default [_]
(s/spec #(re-matches #"^[0-9]{5}$" (:postal-code %))))
(s/def ::country string?)
(s/def ::postal-code string?)
(s/def ::address
(s/merge
(s/keys :req-un [::country ::postal-code])
(s/multi-spec country :country)))
(s/explain ::address {:country "USA" :postal-code "A2345"})
;; val: {:country "USA", :postal-code "A2345"} fails spec: :sandbox.so/address at: ["USA"] predicate: (re-matches #"^[0-9]{5}$" (:postal-code %))
(s/explain ::address {:country "Canada" :postal-code "A2345"})
;; Success!
Another option is and-ing another predicate on your keys spec:
(s/def ::address
(s/and
(s/keys :req-un [::country ::postal-code])
#(case (:country %)
"Canada" (re-matches #"^[A-Z0-9]{5}$" (:postal-code %))
(re-matches #"^[0-9]{5}$" (:postal-code %)))))
You might prefer the multi-spec approach because it's open for extension: you can define more defmethods for country later, as opposed to keeping all the logic in the and predicate.
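To make the extension point concrete, here is a sketch that adds a rule for a third country after the ::address spec has already been defined. The multimethod is named postal-rule here (a hypothetical name, to keep the example self-contained), and the "UK" pattern is a deliberately simplified assumption, not a real postcode validator:

```clojure
(require '[clojure.spec.alpha :as s])

;; Dispatch on the :country value of the address map.
(defmulti postal-rule :country)

(defmethod postal-rule "Canada" [_]
  (s/spec #(re-matches #"^[A-Z0-9]{5}$" (:postal-code %))))

(defmethod postal-rule :default [_]
  (s/spec #(re-matches #"^[0-9]{5}$" (:postal-code %))))

(s/def ::country string?)
(s/def ::postal-code string?)
(s/def ::address
  (s/merge
   (s/keys :req-un [::country ::postal-code])
   (s/multi-spec postal-rule :country)))

;; Later, possibly in another namespace: add a rule without touching ::address.
;; The pattern below is a simplified placeholder for illustration only.
(defmethod postal-rule "UK" [_]
  (s/spec #(re-matches #"^[A-Z0-9 ]{6,8}$" (:postal-code %))))

(s/valid? ::address {:country "UK" :postal-code "SW1A 1AA"})
```

With the s/and version, the same change would mean editing the case expression inside the spec itself.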

Realistic Clojure Spec for function with named arguments

Say that we have a function clothe which requires one positional argument person in addition to a number of optional named arguments :hat, :shirt and :pants.
(defn clothe [person & {:keys [hat shirt pants]}]
(str "Clothing " person " with " hat shirt pants "."))
(clothe 'me :hat "top hat")
=> "Clothing me with top hat."
My current way of writing a spec for this function would be:
(require '[clojure.spec.alpha :as spec]
         '[clojure.spec.gen.alpha :as gen])
(spec/def ::person symbol?)
(spec/def ::clothing
(spec/alt :hat (spec/cat :key #{:hat} :value string?)
:shirt (spec/cat :key #{:shirt} :value string?)
:pants (spec/cat :key #{:pants} :value string?)))
(spec/fdef clothe
:args (spec/cat :person ::person
:clothes (spec/* ::clothing))
:ret string?)
The problem then being that it allows for argument lists like
(clothe 'me :hat "top hat" :hat "nice hat")
=> "Clothing me with nice hat."
which, although allowed by the language itself, is probably a mistake whenever it occurs. Perhaps worse, it makes the generated data unrealistic compared to how the function is usually called:
(gen/generate (spec/gen (spec/cat :person ::person
:clothes (spec/* ::clothing))))
=> (_+_6+h/!-6Gg9!43*e :hat "m6vQmoR72CXc6R3GP2hcdB5a0"
:hat "05G5884aBLc80s4AF5X9V84u4RW" :pants "3Q" :pants "a0v329r25f3k5oJ4UZJJQa5"
:hat "C5h2HW34LG732ifPQDieH" :pants "4aeBas8uWx1eQWYpLRezBIR" :hat "C229mzw"
:shirt "Hgw3EgUZKF7c7ya6q2fqW249GsB" :pants "byG23H2XyMTx0P7v5Ve9qBs"
:shirt "5wPMjn1F2X84lU7X3CtfalPknQ5" :pants "0M5TBgHQ4lR489J55atm11F3"
:shirt "FKn5vMjoIayO" :shirt "2N9xKcIbh66" :hat "K8xSFeydF" :hat "sQY4iUPF0Ef58198270DOf"
:hat "gHGEqi58A4pH2s74t0" :pants "" :hat "D6RKWJJoFLCAaHId8AF4" :pants "exab2w5o88b"
:hat "S7Ti2Cb1f7se7o86I1uE" :shirt "9g3K6q1" :hat "slKjK67608Y9w1sqV1Kxm"
:hat "cFbVMaq8bfP22P8cD678s" :hat "f57" :hat "2W83oa0WVWM10y1U49265k2bJx"
:hat "O6" :shirt "7BUJ824efBb81RL99zBrvH2HjziIT")
Worst of all, if you happen to have a recursive definition with spec/*, there is no way of limiting the number of potentially recursive occurrences generated when running tests on the code.
So my question becomes: is there a way to specify named arguments to a function that limits the number of occurrences per key to one?
If we look at the way the require macro is specced in clojure.core.specs we can see that it uses (spec/keys* :opt-un []) to specify the named arguments in the dependency list, such as :refer and :as in (ns (:require [a.b :as b :refer :all])).
(s/def ::or (s/map-of simple-symbol? any?))
(s/def ::as ::local-name)
(s/def ::prefix-list
(s/spec
(s/cat :prefix simple-symbol?
:suffix (s/* (s/alt :lib simple-symbol? :prefix-list ::prefix-list))
:refer (s/keys* :opt-un [::as ::refer]))))
(s/def ::ns-require
(s/spec (s/cat :clause #{:require}
:libs (s/* (s/alt :lib simple-symbol?
:prefix-list ::prefix-list
:flag #{:reload :reload-all :verbose})))))
The documentation doesn't mention what :req-un and :opt-un are for, but the Spec Guide explains that they specify unqualified keys. Returning to our function definition, we could write it as:
(spec/def ::clothing (spec/keys* :opt-un [::hat ::shirt ::pants]))
(spec/def ::hat string?)
(spec/def ::shirt string?)
(spec/def ::pants string?)
(spec/fdef clothe
:args (spec/cat :person ::person
:clothes ::clothing)
:ret string?)
Sadly, this doesn't stop the function from accepting multiple instances of the same named argument:
(stest/instrument `clothe)
(clothe 'me :hat "top hat" :hat "nice hat")
=> "Clothing me with nice hat."
It does mean, though, that the generator produces at most one instance of each key, which helps with the recursive specs:
(gen/generate (spec/gen (spec/cat :person ::person
:clothes ::clothing)))
=> (u_K_P6!!?4Ok!_I.-.d!2_.T-0.!+H+/At.7R8z*6?QB+921A
:shirt "B4W86P637c6KAK1rv04O4FRn6S" :pants "3gdkiY" :hat "20o77")
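If duplicate keys should also be rejected, one possible approach is to add a predicate over the raw argument list. The following is only a sketch, assuming the alpha namespaces of spec; unique-option-keys? and ::clothe-args are hypothetical names. It wraps the s/cat in s/nonconforming so that the predicate sees the original arguments rather than the conformed map, in which repeated keys have already been collapsed:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def ::person symbol?)
(s/def ::hat string?)
(s/def ::shirt string?)
(s/def ::pants string?)

(defn unique-option-keys?
  "Hypothetical helper: true when no option key is repeated in the raw arg list."
  [[_person & kvs]]
  (let [ks (take-nth 2 kvs)]
    (= (count ks) (count (distinct ks)))))

;; s/nonconforming keeps the original arg list as the conformed value,
;; so the second predicate can inspect the keys exactly as they were passed.
(s/def ::clothe-args
  (s/and (s/nonconforming
          (s/cat :person ::person
                 :clothes (s/keys* :opt-un [::hat ::shirt ::pants])))
         unique-option-keys?))

(s/valid? ::clothe-args ['me :hat "top hat"])
(s/valid? ::clothe-args ['me :hat "top hat" :hat "nice hat"])
```

Since s/and generates from its first spec, the generator still comes from the s/keys* version, and the extra predicate should rarely if ever need to reject a generated value.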

How can I spec a hybrid map?

After writing this answer, I was inspired to try to specify Clojure's destructuring language using spec:
(require '[clojure.spec.alpha :as s])
(s/def ::binding (s/or :sym ::sym :assoc ::assoc :seq ::seq))
(s/def ::sym (s/and simple-symbol? (complement #{'&})))
The sequential destructuring part is easy to spec with a regex (so I'm ignoring it here), but I got stuck at associative destructuring. The most basic case is a map from binding forms to key expressions:
(s/def ::mappings (s/map-of ::binding any? :conform-keys true))
But Clojure provides several special keys as well:
(s/def ::as ::sym)
(s/def ::or ::mappings)
(s/def ::ident-vec (s/coll-of ident? :kind vector?))
(s/def ::keys ::ident-vec)
(s/def ::strs ::ident-vec)
(s/def ::syms ::ident-vec)
(s/def ::opts (s/keys :opt-un [::as ::or ::keys ::strs ::syms]))
How can I create an ::assoc spec for maps that could be created by merging together a map that conforms to ::mappings and a map that conforms to ::opts? I know that there's merge:
(s/def ::assoc (s/merge ::opts ::mappings))
But this doesn't work, because merge is basically an analogue of and. I'm looking for something that's analogous to or, but for maps.
You can spec hybrid maps using an s/merge of s/keys and s/every of the map as tuples. Here's a simpler example:
(s/def ::a keyword?)
(s/def ::b string?)
(s/def ::m
(s/merge (s/keys :opt-un [::a ::b])
(s/every (s/or :int (s/tuple int? int?)
:option (s/tuple keyword? any?))
:into {})))
(s/valid? ::m {1 2, 3 4, :a :foo, :b "abc"}) ;; true
This simpler formulation has several benefits over a conformer approach. Most importantly, it states the truth. Additionally, it should generate, conform, and unform without further effort.
You can use s/conformer as an intermediate step in s/and to transform your map to the form that’s easy to validate:
(s/def ::assoc
(s/and
map?
(s/conformer #(array-map
::mappings (dissoc % :as :or :keys :strs :syms)
::opts (select-keys % [:as :or :keys :strs :syms])))
(s/keys :opt [::mappings ::opts])))
That will get you from e.g.
{ key :key
:as name }
to
{ ::mappings { key :key }
::opts { :as name } }