create spec from data - clojure

I am trying to create a spec just from data. I have a very complex data structure, all nested maps:
{:contexts
({:importer.datamodel/global-id "01b4e69f86e5dd1d816e91da27edc08e",
:importer.datamodel/type "province",
:name "a1",
:importer.datamodel/part-of "8cda1baed04b668a167d4ca28e3cef36"}
{:importer.datamodel/global-id "8cda1baed04b668a167d4ca28e3cef36",
:importer.datamodel/type "country",
:name "AAA"}
{:importer.datamodel/global-id "c78e5478e19f2d7c1b02088e53e8d8a4",
:importer.datamodel/type "location",
:importer.datamodel/center ["36." "2."],
:importer.datamodel/part-of "01b4e69f86e5dd1d816e91da27edc08e"}
{:importer.datamodel/global-id "88844f94f79c75acfcb957bb41386149",
:importer.datamodel/type "organisation",
:name "C"}
{:importer.datamodel/global-id "102e96468e5d13058ab85c734aa4a949",
:importer.datamodel/type "organisation",
:name "A"}),
:datasources
({:importer.datamodel/global-id "Source;ACLED",
:name "ACLED",
:url "https://www.acleddata.com"}),
:iois
({:importer.datamodel/global-id "item-set;ACLED",
:importer.datamodel/type "event",
:datasource "Source;ACLED",
:features
({:importer.datamodel/global-id
"c74257292f584502f9be02c98829d9fda532a492e7dd41e06c31bbccc76a7ba0",
:date "1997-01-04",
:fulltext
{:importer.datamodel/global-id "df5c7d6d075df3a7719ebdd39c6d4c7f",
:text "bla"},
:location-meanings
({:importer.datamodel/global-id
"e5611219971164a15f06e07228fb7b51",
:location "8cda1baed04b668a167d4ca28e3cef36",
:contexts (),
:importer.datamodel/type "position"}
{:importer.datamodel/global-id
"af36461d27ec1d8d28fd7f4a70ab7ce2",
:location "c78e5478e19f2d7c1b02088e53e8d8a4",
:contexts (),
:importer.datamodel/type "position"}),
:interaction-name "Violence",
:importer.datamodel/type "description",
:has-contexts
({:context "102e96468e5d13058ab85c734aa4a949",
:context-association-type "actor",
:context-association-name "actor-1",
:priority "none"}
{:context "88844f94f79c75acfcb957bb41386149",
:context-association-type "actor",
:context-association-name "actor-2",
:priority "none"}),
:facts
({:importer.datamodel/global-id
"c46802ce6dcf33ca02ce113ffd9a855e",
:importer.datamodel/type "integer",
:name "fatalities",
:value "16"}),
:attributes
({:name "description",
:importer.datamodel/type "string",
:value "Violence"})}),
:attributes (),
:ioi-slice "per-item"})}
What tool can create the spec for such a structure?
I am trying to use this tool: https://github.com/stathissideris/spec-provider
but it gives me this:
(spec/def :importer.datamodel/data
(clojure.spec.alpha/coll-of
(clojure.spec.alpha/or
:collection
(clojure.spec.alpha/coll-of
(clojure.spec.alpha/keys
:req
[:importer.datamodel/global-id]
:opt
[:importer.datamodel/center
:importer.datamodel/part-of
:importer.datamodel/type]
:opt-un
[:importer.datamodel/attributes
:importer.datamodel/datasource
:importer.datamodel/features
:importer.datamodel/ioi-slice
:importer.datamodel/name
:importer.datamodel/url]))
:simple
clojure.core/keyword?)))
which is not a complete solution.
I generated it with (sp/pprint-specs (sp/infer-specs data :importer.datamodel/data) 'data 's).
What tool can create the spec for such a structure?

I am trying to use this tool: https://github.com/stathissideris/spec-provider
spec-provider isn't giving you the desired result because your data is a complex nested/recursive structure. Some of those maps would be best spec'd with multi-specs, but spec-provider won't do that; one of the caveats in its docs says "There is no attempt to infer multi-spec."
The only way to properly spec some of these maps is with multi-specs, because their spec depends on their :importer.datamodel/type value.
First, let's look at the top-level keys (assuming the map is in a binding named data):
(keys data) => (:contexts :datasources :iois)
Create a s/keys spec for the outermost map:
(s/def ::my-map
(s/keys :req-un [::contexts ::datasources ::iois]))
These keys are unqualified, but we must use qualified keywords with :req-un to spec them. We can use the REPL to look at the shapes of the nested maps and their relationship to :importer.datamodel/type by walking the nested structure and collecting data:
(require 'clojure.walk)

(let [keysets (atom #{})]
  (clojure.walk/postwalk
    (fn [v]
      ;; record [type keys] for every map encountered
      (when (map? v)
        (swap! keysets conj [(:importer.datamodel/type v) (keys v)]))
      v)
    data)
  @keysets)
=>
#{...
["organisation" (:importer.datamodel/global-id :importer.datamodel/type :name)]
[nil (:context :context-association-type :context-association-name :priority)]
["description"
(:importer.datamodel/global-id :date :fulltext :location-meanings
:interaction-name :importer.datamodel/type :has-contexts :facts :attributes)]
["event" (:importer.datamodel/global-id :importer.datamodel/type :datasource :features :attributes :ioi-slice)]
...}
(An upcoming spec alpha should make it easier to define specs programmatically from this data.)
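As a rough sketch of tool-assisted spec writing, the walk above can be extended to group key sets by type and suggest which keys look required and which look optional. The helper names keysets-by-type and req-and-opt and the tiny sample map are made up for illustration; the suggestions are only as good as the sample data they are computed from.

```clojure
(require '[clojure.set :as set]
         '[clojure.walk :as walk])

(defn keysets-by-type
  "Returns a map of :importer.datamodel/type value -> set of the distinct
  key sets seen on maps of that type. Untyped maps are grouped under nil."
  [data]
  (let [acc (atom {})]
    (walk/postwalk
     (fn [v]
       (when (map? v)
         (swap! acc update (:importer.datamodel/type v)
                (fnil conj #{}) (set (keys v))))
       v)
     data)
    @acc))

(defn req-and-opt
  "Keys present in every key set are candidates for :req/:req-un;
  the remaining keys are candidates for :opt/:opt-un."
  [keysets]
  (let [req (apply set/intersection keysets)]
    {:req req
     :opt (set/difference (apply set/union keysets) req)}))

;; Tiny made-up sample standing in for the real data.
(def sample
  {:contexts [{:importer.datamodel/type "country" :name "AAA"}
              {:importer.datamodel/type "country"
               :importer.datamodel/part-of "x"}]})

(req-and-opt (get (keysets-by-type sample) "country"))
;; => {:req #{:importer.datamodel/type},
;;     :opt #{:name :importer.datamodel/part-of}}
```

The result is plain data, so it can be reviewed by hand before being turned into s/keys forms.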
Multi-specs
We can see there are some map shapes that don't have a :importer.datamodel/type, but we can write multi-specs for the ones that do. First define a multimethod for dispatching on the type key:
(defmulti type-spec :importer.datamodel/type)
Then write a defmethod for each :importer.datamodel/type value. Here are a few examples:
(defmethod type-spec :default [_] (s/keys))
(defmethod type-spec "organisation" [_]
(s/keys :req [:importer.datamodel/global-id]
:req-un [::name]))
(defmethod type-spec "description" [_]
(s/keys :req [:importer.datamodel/global-id]
:req-un [::date ::fulltext ::location-meanings ::interaction-name
::has-contexts ::facts ::attributes]))
(defmethod type-spec "event" [_]
(s/keys :req-un [::features]))
Then define the s/multi-spec:
(s/def ::datamodel
(s/multi-spec type-spec :importer.datamodel/type))
Now any map we conform to ::datamodel will resolve a spec based on its :importer.datamodel/type value. We can assign that spec to keywords that spec will use to conform the maps, e.g. one of the outermost keys:
(s/def ::contexts (s/coll-of ::datamodel))
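A quick, self-contained check of the dispatch behaviour might look like the following. It repeats a reduced version of the definitions above so it can be pasted into a fresh REPL; the sample maps are invented.

```clojure
(require '[clojure.spec.alpha :as s])

(defmulti type-spec :importer.datamodel/type)

;; Fallback for types we have not written a method for yet.
(defmethod type-spec :default [_] (s/keys))

(defmethod type-spec "organisation" [_]
  (s/keys :req [:importer.datamodel/global-id]
          :req-un [::name]))

(s/def ::datamodel
  (s/multi-spec type-spec :importer.datamodel/type))

;; An "organisation" with all required keys.
(s/valid? ::datamodel {:importer.datamodel/global-id "88844f94"
                       :importer.datamodel/type "organisation"
                       :name "C"})
;; => true

;; The same map without :name fails the "organisation" method.
(s/valid? ::datamodel {:importer.datamodel/global-id "88844f94"
                       :importer.datamodel/type "organisation"})
;; => false

;; An unknown type falls through to the permissive :default method.
(s/valid? ::datamodel {:importer.datamodel/type "country"})
;; => true
```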
Now if you remove a required key from one of the maps we spec'd under :contexts, spec can tell you what's wrong. For example, removing the :name key from an "organisation" map:
(s/explain ::my-map data)
In: [:contexts 3]
val: #:importer.datamodel{:global-id "88844f94f79c75acfcb957bb41386149",
:type "organisation"}
fails spec: :playground.so/datamodel
at: [:contexts "organisation"]
predicate: (contains? % :name)
Other specs
For the maps that don't have a :importer.datamodel/type you should be able to define a key spec. For example, the nested :has-contexts key has a collection of maps without a :importer.datamodel/type, but if we can assume they'll all be similar we can write this spec:
(s/def ::has-contexts
(s/coll-of (s/keys :req-un [::context ::context-association-type
::context-association-name ::priority])))
:has-contexts is in a map we've already covered with a multi-spec above, and simply registering a spec to this key will make spec conform its values. The outermost key that contains this spec is :iois so we can spec that key too:
(s/def ::iois (s/coll-of ::datamodel))
Now, conforming an input to ::my-map spec will automatically cover more data.
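One caveat worth illustrating: spec only descends into a key's value if a spec is registered for that key. In the sketch below, ::features is registered as a collection of ::datamodel maps so that validation reaches the nested :has-contexts entries. That registration is an assumption about the data, not something stated above, and the specs and sample data are trimmed down for brevity.

```clojure
(require '[clojure.spec.alpha :as s])

(defmulti type-spec :importer.datamodel/type)
(defmethod type-spec :default [_] (s/keys))
(defmethod type-spec "event" [_] (s/keys :req-un [::features]))
(defmethod type-spec "description" [_] (s/keys :req-un [::has-contexts]))

(s/def ::datamodel (s/multi-spec type-spec :importer.datamodel/type))
(s/def ::has-contexts (s/coll-of (s/keys :req-un [::context ::priority])))
;; Assumption: :features holds typed maps, so it can reuse ::datamodel.
(s/def ::features (s/coll-of ::datamodel))
(s/def ::iois (s/coll-of ::datamodel))
(s/def ::my-map (s/keys :req-un [::iois]))

;; Invented, minimal input with the same nesting as the real data.
(def good
  {:iois [{:importer.datamodel/type "event"
           :features [{:importer.datamodel/type "description"
                       :has-contexts [{:context "a" :priority "none"}]}]}]})

;; Same input with :priority removed from the deeply nested map.
(def bad
  (update-in good [:iois 0 :features 0 :has-contexts 0] dissoc :priority))

(s/valid? ::my-map good) ;; => true
(s/valid? ::my-map bad)  ;; => false
```

Without the ::features registration, the second call would also return true, because the contents of :features would never be checked.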
What tool can create the spec for such a structure?
As you can see, writing a full spec for this structure is non-trivial but possible. I don't know of any existing tool that could automatically infer a complete, "correct" spec for this structure. It would've had to intuit that :importer.datamodel/type is a key that could be used to dispatch to different s/keys specs — and it would still be making a potentially invalid assumption. I think tool-assisted spec generation is more realistic and practical in this case.

Why not create a history table using a trigger that inserts the old data just before the transaction?
Something like this,
CREATE TRIGGER SNAPSHOT_TRIGGER BEFORE
INSERT ON MY_TABLE REFERENCING NEW ROW MYNEWROW
FOR EACH ROW
BEGIN
INSERT INTO "HISTORY_TABLE" VALUES(121,'','zzzz');
END;
(Please check the syntax)

With HANA 2 SPS 03 you could use the system-versioned tables feature.
For system-versioned tables HANA automatically keeps a separate table of old record versions that can be accessed independently from the main table.

Related

How to specify that two keys in a map should have the same value with Clojure.Spec?

Say for a minimal example, I've got a map with the following fields.
{:name
:password
:confirm-password}
and I've written the following specs for this shape.
(s/def ::name string?)
;; password is a string between 8 and 255 characters long
(s/def ::password (s/and string? #(<= 8 (count %) 255)))
;; How to write (s/def ::confirm-password)
(s/def ::sign-up-form (s/keys :req-un [::name
                                       ::password
                                       ::confirm-password]))
How would I go about writing a ::confirm-password spec to check whether the two values are equal? i.e. I need access to that other field (password) to get to it.
One thing I tried was to write the spec on the sign-up form, to get access to both keys and make sure they are the same. That kind of works, but I lose path specificity: the problem that gets generated points at the sign-up form rather than at ::confirm-password, which is what I would ideally like.
You can s/and another predicate with your s/keys spec to check equality between the two keys' values:
(s/def ::sign-up-form
(s/and
(s/keys :req-un [::name
::password
::confirm-password])
#(= (:password %) (:confirm-password %))))
This anonymous function predicate receives the entire conformed map output of the s/keys spec.
(s/explain ::sign-up-form
{:name "Taylor"
:password "weak pass"
:confirm-password "weak pass!"})
;; val: {:name "Taylor", :password "weak pass", :confirm-password "weak pass!"}
;; fails spec: :sandbox.so/sign-up-form predicate:
;; (= (:password %) (:confirm-password %))
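If the error needs to point at :confirm-password rather than at the whole form, one possible workaround is to keep the cross-field check outside the spec and build the problem entry by hand. The sketch below assumes this is acceptable for the application; ::sign-up-fields, passwords-match? and sign-up-errors are hypothetical names, and the returned maps only mimic the :in and :pred keys of spec's problem maps.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)
(s/def ::password (s/and string? #(<= 8 (count %) 255)))
(s/def ::confirm-password string?)

;; Per-field checks only; the cross-field rule lives in sign-up-errors.
(s/def ::sign-up-fields
  (s/keys :req-un [::name ::password ::confirm-password]))

(defn passwords-match? [form]
  (= (:password form) (:confirm-password form)))

(defn sign-up-errors
  "Hypothetical helper: returns nil for a valid form, otherwise a seq of
  maps with :in (path into the form) and :pred (what failed)."
  [form]
  (if-let [ed (s/explain-data ::sign-up-fields form)]
    ;; Field-level problems already carry a path such as [:password].
    (map #(select-keys % [:in :pred]) (::s/problems ed))
    (when-not (passwords-match? form)
      ;; Hand-built entry that points at the confirmation field.
      [{:in [:confirm-password] :pred 'passwords-match?}])))

(sign-up-errors {:name "Taylor"
                 :password "weak pass"
                 :confirm-password "weak pass!"})
;; => [{:in [:confirm-password], :pred passwords-match?}]
```

The trade-off is that s/valid? on the spec alone no longer enforces the equality rule, so callers have to go through the helper.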

Clojure Spec accessing data in hierarchical spec

If you have a set of specs that are used to validate a hierarchical set of data - say a yaml file. From one of the child specs, is it possible to reference data that occurs earlier in the tree?
This is an example of one approach you could take:
(s/def ::tag string?)
(s/def ::inner (s/keys :req-un [::tag]))
(s/def ::outer
(s/and
(s/keys :req-un [::inner ::tag])
#(= (:tag %) ;; this tag must equal inner tag
(:tag (:inner %)))))
(s/conform ::outer {:tag "y" ;; inner doesn't match outer
:inner {:tag "x"}})
;=> :clojure.spec.alpha/invalid
(s/conform ::outer {:tag "x"
:inner {:tag "x"}})
;=> {:tag "x", :inner {:tag "x"}}
Depending on your requirements you might be able to make your assertions like this, from the outside-in rather than inside-out.

clojure spec - validating contents of maps

I want to create a clojure spec for a map that has rules about the presence of particular keys.
The map must have a :type and can have either :default or :value but not both. I tried:
(s/def ::propertyDef
(s/keys :req [::type (s/or ::default ::value) ] :opt [::description ::required]))
but I got
CompilerException java.lang.AssertionError: Assert failed:
spec/or expects k1 p1 k2 p2..., where ks are keywords
(c/and (even? (count key-pred-forms)) (every? keyword? keys)),
compiling:(C:\Users\MartinRoberts\AppData\Local\Temp\form-init4830956164341520551.clj:1:22)
So the s/or gave me an error because it is in the wrong format. I have to admit I don't really understand the documentation for s/or.
First: you are using s/or to specify either a ::default or a ::value in your list of required keys. s/or requires :label spec pairs, and you are giving only the specs themselves, which is the cause of the error.
To solve, simply use or instead:
(s/def ::propertyDef (s/keys :req [::type (or ::default ::value)]
:opt [::description ::required]))
This allows both ::default and ::value to be present in the map, but this is almost always okay. The code which actually uses the map can simply check for the presence of ::value and use that, and if it's not there, then use ::default (or whatever your logic happens to be). This is usually done as such:
(let [myvalue (or (::value mymap) (::default mymap))] ...)
There could be thousands of keys in the map, and it would not affect your ability to extract the keys you need. This is why spec does not provide a built-in way to specify keys that should not be in the map, only ways to specify which keys should be present (namely, :req and :req-un in s/keys). Think of how most http servers work: you can give them nonsensical header keys and values, but they don't refuse to service the request; they just ignore them and return a response.
So, you likely don't need to enforce that only one or the other be present, but if you must, you can define an exclusive or function:
(defn xor
[p q]
(and (or p q)
(not (and p q))))
and then add this as an additional predicate on the spec:
(s/def ::propertyDef (s/and (s/keys :req [::type (or ::default ::value)]
:opt [::description ::required])
#(xor (::default %) (::value %))))
(s/valid? ::propertyDef {::type "type" ::default "default"})
=> true
(s/valid? ::propertyDef {::type "type" ::value "value"})
=> true
(s/valid? ::propertyDef {::type "type" ::default "default" ::value "value"})
=> false
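One thing to watch with the xor predicate above is that it looks at the values, so a key whose value is false or nil counts as absent. A variant that tests key presence with contains? avoids this. The spec name ::property-def is used here only to keep the sketch separate from the definition above.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::property-def
  (s/and (s/keys :req [::type (or ::default ::value)])
         ;; Exactly one of the two keys may be present, whatever its value.
         #(not= (contains? % ::default) (contains? % ::value))))

(s/valid? ::property-def {::type "boolean" ::default false})
;; => true

(s/valid? ::property-def {::type "boolean" ::default false ::value true})
;; => false

(s/valid? ::property-def {::type "boolean"})
;; => false
```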

Specs for conformed specs / ASTs

I have a DSL specification which is a sequence as usual (cat). I want to take advantage of spec's parsing (i.e. conforming) to get the AST of an expression that conforms with my DSL. E.g.
user> (s/def ::person (s/cat :person-sym '#{person} :name string? :age number?))
=> :user/person
user> (s/conform ::person '(person "Henry The Sloth" 55))
=> {:person-sym person, :name "Henry The Sloth", :age 55}
Now that it's parsed and I have my AST, I would want to do interesting things with it, so I would want to test it and whatnot. So now I need to write a spec for that AST, and that's basically duplicating everything. Actually it's worse than that because now I have to s/def specs for predicates that I didn't have to before, because as the docs for keys says: "there is no support for inline value specification, by design." / "It is the (enforced) opinion of spec that the specification of values associated with a namespaced keyword, like :my.ns/k, should be registered under that keyword itself..". So duplicating (with omitting the person-sym part):
user> (s/def ::name string?)
=> :user/name
user> (s/def ::age number?)
=> :user/age
user> (s/def ::person-ast (s/keys :req-un [::name ::age]))
=> :user/person-ast
And now it seems to be compatible:
user> (s/conform ::person-ast (s/conform ::person '(person "Henry The Sloth" 55)))
=> {:person-sym person, :name "Henry The Sloth", :age 55}
In practice, I have more complicated data of course, and I wonder what should I do? AFAIK spec doesn't give me the spec for the AST that it creates (actually personally I would figure that this is something it should do). Any suggestions?
I'd say right now you have two options - one is to do what you're doing and create two sets of specs for the before/after.
The other option is to create a model of your domain in data and generate both specs from it (I've seen many people doing something like this).
I have not heard Rich talk about generating the output spec of conformed results so I don't think that is likely in the current roadmap.
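As a minimal sketch of the second option, a macro can take one list of fields and register both the s/cat spec for the DSL form and the s/keys spec for its conformed map. The macro name def-form-specs and the :head tag are made up for illustration, and this covers only flat forms like the person example; it is not a general solution.

```clojure
(require '[clojure.spec.alpha :as s])

(defmacro def-form-specs
  "Hypothetical helper: from one field list, registers a spec per field, an
  s/cat spec for the DSL form, and an s/keys spec for its conformed map."
  [form-spec ast-spec head-sym & fields]
  (let [pairs (partition 2 fields)
        ;; Qualify each field keyword with the namespace of the call site.
        kws   (map (fn [[k _]] (keyword (str *ns*) (name k))) pairs)]
    `(do
       ~@(map (fn [kw [_ pred]] `(s/def ~kw ~pred)) kws pairs)
       (s/def ~form-spec
         (s/cat :head #{'~head-sym} ~@(interleave (map first pairs) kws)))
       (s/def ~ast-spec (s/keys :req-un [~@kws]))
       ~ast-spec)))

(def-form-specs ::person ::person-ast person
  :name string?
  :age  number?)

(s/conform ::person '(person "Henry The Sloth" 55))
;; => {:head person, :name "Henry The Sloth", :age 55}

(s/valid? ::person-ast (s/conform ::person '(person "Henry The Sloth" 55)))
;; => true
```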

Forbidden keys in clojure.spec

I am following the clojure.spec guide. I understand it is possible to declare required and optional attributes when using clojure.spec/keys.
I don't understand what is meant by optional. To me :opt doesn't do anything.
(s/valid? (s/keys :req [:my/a]) {:my/a 1 :my/b 2}) ;=> true
(s/valid? (s/keys :req [:my/a] :opt []) {:my/a 1 :my/b 2}) ;=> true
The guide promises to explain this to me, "We’ll see later where optional attributes can be useful", but I fail to find the explanation. Can I declare forbidden keys? Or somehow declare the set of valid keys to equal the keys in :req and :opt?
This is a very good question, and the clojure.spec API gives the (granted, short and unsatisfying) answer:
"The :opt keys serve as documentation and may be used by the generator."
I do not think you can invalidate a map if it contains an extra (this is what you mean by "forbidden" I think) key using this method. However, you could use this spec to make sure ::bad-key is not present:
(s/def ::m (s/and (s/keys :req [::a]) #(not (contains? % ::bad-key))))
(s/valid? ::m {::a "required!"}) ; => true
(s/valid? ::m {::a "required!" ::b "optional!"}) ; => true
(s/valid? ::m {::a "required!" ::bad-key "no good!"}) ; => false
You could limit the number of keys to exactly the set you want by using this spec:
(s/def ::r (s/and (s/keys :req [::reqd1 ::reqd2]) #(= (count %) 2)))
(s/valid? ::r {::reqd1 "abc" ::reqd2 "xyz"}) ; => true
(s/valid? ::r {::reqd1 "abc" ::reqd2 "xyz" ::extra 123}) ; => false
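Counting keys stops working as soon as optional keys are involved. One alternative sketch is to check every key against an explicit set of allowed keys; the names allowed-keys, ::closed and ::maybe are invented for this example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Every key the map is allowed to contain, required or optional.
(def allowed-keys #{::reqd1 ::reqd2 ::maybe})

(s/def ::closed
  (s/and (s/keys :req [::reqd1 ::reqd2] :opt [::maybe])
         ;; A set acts as a predicate: truthy only for its members.
         #(every? allowed-keys (keys %))))

(s/valid? ::closed {::reqd1 "abc" ::reqd2 "xyz"})              ;; => true
(s/valid? ::closed {::reqd1 "abc" ::reqd2 "xyz" ::maybe 1})    ;; => true
(s/valid? ::closed {::reqd1 "abc" ::reqd2 "xyz" ::extra 123})  ;; => false
```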
Still, the best way to handle this IMO, would be to simply ignore that there is a key present that you don't care about.
Hopefully as spec matures, these nice things will be added. Or, maybe they are already there (it is changing rapidly) and I simply don't know about it. This is a very new concept in clojure, so most of us have a lot to learn about it.
UPDATE - December 2016
I just wanted to revisit this 6 months since writing it. It looks like my initial comment about ignoring keys you don't care about is the preferred way to go. In fact, at the clojure/conj conference I attended two weeks ago, Rich's keynote specifically addressed the notion of versioning in all levels of software, from the function level up to the application level. He even specifically mentions this notion of disallowing keys in the talk, which can be found on youtube. He says that it was intentionally designed so that only required keys can be spec'd. Disallowing keys really serves no good purpose, and it should be done with caution.
Regarding the :opt keys, I think the original answer still stands up pretty well--it's documentation, and practically, it allows these optionally specified keys to be generated:
(s/def ::name #{"Bob" "Josh" "Mary" "Susan"})
(s/def ::height-inches (s/int-in 48 90))
(s/def ::person (s/keys :req-un [::name] :opt-un [::height-inches]))
(map first (s/exercise ::person))
; some generated data have :height-inches, some do not
({:name "Susan"}
 {:name "Mary", :height-inches 48}
 {:name "Bob", :height-inches 49}
 {:name "Josh"}
 ...)
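The point of optional keys is that, when they do appear in the map, their values are validated against the registered spec. For unqualified keys, listing the key in :opt-un is what ties it to that spec.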
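A small check of that behaviour, reusing the ::person example with ::name relaxed to string? so arbitrary names are accepted:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)
(s/def ::height-inches (s/int-in 48 90))
(s/def ::person (s/keys :req-un [::name] :opt-un [::height-inches]))

;; The optional key may be absent.
(s/valid? ::person {:name "Bob"})
;; => true

;; When present, its value must satisfy ::height-inches.
(s/valid? ::person {:name "Bob" :height-inches 60})
;; => true
(s/valid? ::person {:name "Bob" :height-inches "tall"})
;; => false

;; Without :opt-un, the unqualified key is not tied to any spec.
(s/valid? (s/keys :req-un [::name]) {:name "Bob" :height-inches "tall"})
;; => true
```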