datomic / datascript beginner - can we have multiple schemas - clojure

very basic question to get me started. Suppose I have a database of sales by country:
[{:sales/country "CN" :sales/amount 1000 :sales/account "XYZ"} ...]
I would like to also have a list of facts about each country, something like:
[{:country/short-name "CN" :country/long-name "China" ...}]
And then do queries of the type "list all sales that happened in China (using the long-name)".
Is that one database? How do I make it clear there are two distinct schemas? Do I transact the first schema then the sales data, and later the country schema and data?
EDIT: sorry my question wasn't clear. Here is live example:
(d/transact! conn
[{:country/code "BR" :country/name "Brazil"}
{:country/code "CN" :country/name "China"}])
(d/transact! conn
 [{:sales/country "CN" :sales/amount 1000 :sales/account "XYZ"}
  {:sales/country "CN" :sales/amount 1000 :sales/account "AAA"}
  {:sales/country "BR" :sales/amount 1000 :sales/account "BBB"}])
I was able to run a query to join the tables and get the results I wanted. What I don't understand is the best practice for defining my schema. Is it just one schema, or two of them, one for each table? Can I do this:
(def schema {:country/code       {:db/valueType :db.type/string :db/unique :db.unique/identity}
             :country/name       {:db/valueType :db.type/string}
             :sales/account      {:db/valueType :db.type/string :db/unique :db.unique/identity}
             :sales/country-code {:db/valueType :db.type/string}
             :sales/amount       {:db/valueType :db.type/long}})
And is there a better way to define in the schema that country/code and sales/country-code are the same "key"?
Thanks,

You can only have one schema, but that should be enough. Use namespaces in keywords to distinguish between different “domains” (:country/* for info about countries, :sales/* for info about sales).
Datomic is more of a column store, so any entity can have any attribute, and you can even mix different “tables” on a single attribute (I don’t recommend it, but it’s possible).
Your use of :country/code is what Datomic calls “an external id”. This is also good practice; however, there’s no way to specify that :sales/country-code is a reference to :country/code.
What I suggest you do instead is make :sales/country a Datomic reference:
:sales/country {:db/valueType :db.type/ref}
and then link to it in a transaction using a lookup ref:
(d/transact! conn [{:sales/country [:country/code "CN"] :sales/amount 1000 :sales/account "XYZ"}])
This will make sure a country with that code exists at transaction time. You’ll also get benefits like easily retrieving country info from a sale using entities/pull.
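Putting the pieces together, here is a minimal DataScript sketch of the whole setup (attribute names follow the question; note that in DataScript only :db.type/ref attributes need a :db/valueType, so the string/long declarations are optional):

```clojure
(require '[datascript.core :as d])

;; One schema map covers both "domains"; :sales/country is a ref.
(def schema
  {:country/code  {:db/unique :db.unique/identity}
   :sales/account {:db/unique :db.unique/identity}
   :sales/country {:db/valueType :db.type/ref}})

(def conn (d/create-conn schema))

(d/transact! conn
  [{:country/code "CN" :country/name "China"}
   {:country/code "BR" :country/name "Brazil"}])

;; Lookup refs resolve :country/code to the country entity at transaction time.
(d/transact! conn
  [{:sales/country [:country/code "CN"] :sales/amount 1000 :sales/account "XYZ"}
   {:sales/country [:country/code "CN"] :sales/amount 1000 :sales/account "AAA"}
   {:sales/country [:country/code "BR"] :sales/amount 1000 :sales/account "BBB"}])

;; "List all sales that happened in China", joining through the ref:
(d/q '[:find ?account ?amount
       :in $ ?country-name
       :where
       [?c :country/name ?country-name]
       [?s :sales/country ?c]
       [?s :sales/account ?account]
       [?s :sales/amount ?amount]]
     @conn "China")
;; => #{["XYZ" 1000] ["AAA" 1000]}
```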


Time Series forecasting with DeepAR for multiple independent products

I want to forecast some data (say, temperatures for several countries). Is there any way to add multiple countries' temperatures at once to DeepAR (the algorithm available in the AWS SageMaker marketplace) and have DeepAR forecast them independently? Is it possible to remove one country's data and add another's after a few days?
I am new to forecasting and wanted to try DeepAR. If anyone has already worked with it, please give me some guidelines on how to do this with DeepAR.
Link - https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html
This is a late reply, but it could be helpful to others in the future. The answer to your first question is yes.
The page you linked to references the cat field, which allows you to encode a vector representing different record groups. In your case the cat field can be a single value, but it can also encode more complex relationships with more dimensions in the vector.
Say you have 3 countries you want to make predictions on. You have some time-series temperature training data for each country, then you would enter them as rows in the train JSON file like this:
Country 1:
{"start": "02/09/2019 00:00:00", "target": [T1,T2,T3,T4,...], "cat": [0]}
Country 2:
{"start": "02/09/2019 00:00:00", "target": [T1,T2,T3,T4,...], "cat": [1]}
Country 3:
{"start": "02/09/2019 00:00:00", "target": [T1,T2,T3,T4,...], "cat": [2]}
The category field indicates to DeepAR that these are independent data categories, in other words, different countries.
The frequency (time between temperature measurements) has to be the same for all data, however, the start time and the number of training points does not.
Once you've trained the model and opened the endpoint, to make a prediction for a particular country you pass that country's context along with the same cat value as one of the categories above.
This allows you to make a single model that will allow you to make predictions from many independent groups of data.
I'm not sure exactly what you mean by the second question. If you mean to add more training data for another country later on, this would require you to create a different training dataset with an additional category for that country, then re-train the model.

Alter Datomic schema (unique -> not unique)

Good day everyone,
I would like to know whether it's possible to "alter" a Datomic schema, in particular to make an attribute not unique after it was declared unique in the first place.
Schema:
{:db/id #db/id[:db.part/db]
 :db/ident :vcs/reference
 :db/doc "Our VCS reference number for a transaction"
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}

{:db/id :vcs/reference
 :db/unique :db.unique/value
 :db.alter/_attribute :db.part/db}
The possibility of doing so doesn't sound right to me, because:
It's changing the history.
If applied the other way around (not unique -> unique), it would create data conflicts.
Could anyone please clarify this?
UPDATE
Found the Datomic Schema Alteration doc, in case anyone else is searching.
UPDATE 2
To solve my particular problem I've done the following:
(d/transact-async conn [[:db/retract :vcs/reference :db/unique :db.unique/value]
                        [:db/add :db.part/db :db.alter/attribute :vcs/reference]])
If you are adding a uniqueness constraint instead, the data will have to be updated manually first to avoid conflicts.
A related point is that, unlike columns in a SQL table, attributes can never be "deleted". All you can do is stop using them and "forget" that they ever existed.
If your db attributes need to evolve (and when don't they?), you may wish to version them just like the routes in a REST API. This is easily done by adding a component like ".v2" to the namespace:
:db/ident :person.v1/age
:db/valueType :db.type/string
could be replaced by:
:db/ident :person.v2/age
:db/valueType :db.type/long
and then:
:db/ident :person.v3/age
:db/valueType :db.type/double
Note that different groups of attributes may be versioned differently (like each table in a SQL db):
:db/ident :vehicle.v42/horsepower
:db/valueType :db.type/double
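A migration from one attribute version to the next can then be an ordinary pair of transactions. A sketch under the assumptions above (the :person.v1/age and :person.v2/age attributes from the example, the Clojure peer API, and little enough data to fit one transaction):

```clojure
(require '[datomic.api :as d])

;; 1. Install the new attribute alongside the old one.
@(d/transact conn
   [{:db/id                 (d/tempid :db.part/db)
     :db/ident              :person.v2/age
     :db/valueType          :db.type/long
     :db/cardinality        :db.cardinality/one
     :db.install/_attribute :db.part/db}])

;; 2. Copy the data over, parsing the old string values into longs.
(let [db    (d/db conn)
      pairs (d/q '[:find ?e ?age
                   :where [?e :person.v1/age ?age]]
                 db)]
  @(d/transact conn
     (for [[e age] pairs]
       [:db/add e :person.v2/age (Long/parseLong age)])))
```

The old :person.v1/age values stay in the database; consumers simply migrate to reading the new attribute.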
Of course, if your DB changes enough, it may eventually be worthwhile to copy the data (ETL) into a completely new DB with a different organization & structure.

How to represent an entity with an existing UUID in Datomic?

Background
We have multiple services that communicate via events. Many events refer to some entity using a (globally) unique surrogate id that is present in the event. For example, a "CustomerRegisteredEvent" (E) may contain the id of the registered customer. With other databases I could typically persist a "Customer" entity with an id corresponding to the id (and other values) present in (E).
In Datomic I typically see tempid used to generate an id for a new entity, but I'm not clear on whether I should use this approach when a UUID is known beforehand.
Questions
1. Is there a way to generate a Datomic id based on the id in the event?
2. If not, does one typically just create a new attribute for the "original" (event) id? Something like:
{:db/id #db/id[:db.part/db]
 :db/ident :customer/uuid
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/doc "The original UUID of the customer"
 :db.install/_attribute :db.part/db}
Just ignore the Datomic :db/id as an internal detail (just like you ignore the Git hash of a commit as an internal detail). Use solution (2) from your question, except that you probably want to use the built-in type :db.type/uuid instead of a string.
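A sketch of that approach, with the UUID attribute marked :db.unique/identity so transactions upsert on the external id (the :customer/name attribute here is invented for illustration):

```clojure
(require '[datomic.api :as d])

;; Schema: store the event's UUID as an externally-supplied identity.
@(d/transact conn
   [{:db/id                 (d/tempid :db.part/db)
     :db/ident              :customer/uuid
     :db/valueType          :db.type/uuid
     :db/unique             :db.unique/identity
     :db/cardinality        :db.cardinality/one
     :db/doc                "The original UUID of the customer"
     :db.install/_attribute :db.part/db}])

;; When handling a CustomerRegisteredEvent, transact against that UUID;
;; :db.unique/identity means repeats upsert instead of creating duplicates.
@(d/transact conn
   [{:db/id         (d/tempid :db.part/user)
     :customer/uuid (java.util.UUID/fromString
                      "123e4567-e89b-12d3-a456-426614174000")
     :customer/name "Alice"}])

;; Later, look the customer up by the external id (a lookup ref):
(d/pull (d/db conn) '[*]
        [:customer/uuid #uuid "123e4567-e89b-12d3-a456-426614174000"])
```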
You may also be interested in looking at the Tupelo-Datomic library, which contains a number of helper and convenience functions for interacting with Datomic.
Enjoy!
P.S. Don't overlook the Datomic function d/squuid for generating semi-sequential UUIDs, which are a more efficient way of generating new UUIDs in Datomic.
Update: Adding data to Datomic is somewhat confusing and is more complicated than it needs to be. That is why you can use Tupelo-Datomic to simplify the whole operation:
(td/transact *conn*
(td/new-entity { :person/name "James Bond" :location "London" :weapon/type #{ :weapon/gun :weapon/wit } } )
(td/new-entity { :person/name "M" :location "London" :weapon/type #{ :weapon/gun :weapon/guile } } )
(td/new-entity { :person/name "Dr No" :location "Caribbean" :weapon/type :weapon/gun } ))
One of the things Tupelo-Datomic does for you is to silently add the boilerplate for: {:db/id (d/tempid -partition) }

Recommended way to declare Datomic schema in Clojure application

I'm starting to develop a Datomic-backed Clojure app, and I'm wondering what's the best way to declare the schema, in order to address the following concerns:
Having a concise, readable representation for the schema
Ensuring the schema is installed and up-to-date prior to running a new version of my app.
Intuitively, my approach would be the following:
Declaring some helper functions to make schema declarations less verbose than with the raw maps
Automatically installing the schema as part of the initialization of the app (I'm not yet knowledgeable enough to know if that always works).
Is this the best way to go? How do people usually do it?
I use Conformity for this (see the Conformity repository). There is also a very useful blog post from Yeller here which will guide you through using Conformity.
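For reference, a minimal Conformity sketch (the norm name and :user/email attribute are invented for illustration); ensure-conforms is idempotent, so it is safe to run at every application start:

```clojure
(require '[io.rkn.conformity :as c]
         '[datomic.api :as d])

(def norms
  ;; Each norm is transacted once and recorded in the db,
  ;; so re-running ensure-conforms is a no-op.
  {:my-app/add-user-schema
   {:txes [[{:db/id                 (d/tempid :db.part/db)
             :db/ident              :user/email
             :db/valueType          :db.type/string
             :db/unique             :db.unique/identity
             :db/cardinality        :db.cardinality/one
             :db.install/_attribute :db.part/db}]]}})

(c/ensure-conforms conn norms)
```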
Raw maps are verbose, but have some great advantages over using some high level api:
Schema is defined in transaction form, what you specify is transactable (assuming the word exists)
Your schema is not tied to a particular library or spec version, it will always work.
Your schema is serializable (edn) without calling a spec API.
So you can store and deploy your schema more easily in a distributed environment since it's in data-form and not in code-form.
For those reasons I use raw maps.
Automatically installing schema.
This I don't do either.
Usually when you make a change to your schema, any of several things may be happening:
Add a new attribute
Change an existing attribute's type
Create a fulltext index for an attribute
Create a new attribute from other values
Others
Any of these may require you to change your existing data in some non-obvious, non-generic way, in a process that may take some time.
I do use some automatization for applying a list of schemas and schema changes, but always in a controlled "deployment" stage when more things regarding data updating may occur.
Assuming you have users.schema.edn and roles.schema.edn files:
(require '[datomic-manage.core :as manager])
(manager/create uri)
(manager/migrate uri [:users.schema
                      :roles.schema])
For #1, datomic-schema might be of help. I haven't used it, but the example looks promising.
My preference (and I'm biased, as the author of the library) lies with datomic-schema - it focuses only on the transformation to normal Datomic schema; from there, you transact the schema as you would normally.
I am looking to use the same data to calculate schema migrations between the live Datomic instance and the definitions, so that the enums, types and cardinality get changed to conform to your definition.
The important part (for me) of datomic-schema is that the exit path is very clean - If you find it doesn't support something (that I can't implement for whatever reason) down the line, you can dump your schema as plain edn, save it off and remove the dependency.
Conformity will be useful beyond that if you want to do some kind of data migration, or more specific migrations (cleaning up the data, or renaming to something else first).
Proposal: use transaction functions to make declaring schema attributes less verbose in EDN, thus preserving the benefits of declaring your schema in EDN as demonstrated in Guillermo Winkler's answer.
Example:
;; defining the helper function
[{:db/id #db/id[:db.part/user]
  :db/doc "Helper function for defining entity fields schema attributes in a concise way."
  :db/ident :utils/field
  :db/fn #db/fn {:lang "clojure"
                 :requires [[datomic.api :as d]]
                 :params [_ ident type doc opts]
                 :code [(cond-> {:db/cardinality :db.cardinality/one
                                 :db/fulltext true
                                 :db/index true
                                 :db.install/_attribute :db.part/db
                                 :db/id (d/tempid :db.part/db)
                                 :db/ident ident
                                 :db/valueType (condp get type
                                                 #{:db.type/string :string} :db.type/string
                                                 #{:db.type/boolean :boolean} :db.type/boolean
                                                 #{:db.type/long :long} :db.type/long
                                                 #{:db.type/bigint :bigint} :db.type/bigint
                                                 #{:db.type/float :float} :db.type/float
                                                 #{:db.type/double :double} :db.type/double
                                                 #{:db.type/bigdec :bigdec} :db.type/bigdec
                                                 #{:db.type/ref :ref} :db.type/ref
                                                 #{:db.type/instant :instant} :db.type/instant
                                                 #{:db.type/uuid :uuid} :db.type/uuid
                                                 #{:db.type/uri :uri} :db.type/uri
                                                 #{:db.type/bytes :bytes} :db.type/bytes
                                                 type)}
                          doc (assoc :db/doc doc)
                          opts (merge opts))]}}]
;; ... then (in a later transaction) using it to define application model attributes
[[:utils/field :person/name :string "A person's name" {:db/index true}]
 [:utils/field :person/age :long "A person's age" nil]]
I would suggest using Tupelo Datomic to get started. I wrote this library to simplify Datomic schema creation and ease understanding, much as you allude to in your question.
As an example, suppose we’re trying to keep track of information for the world’s premiere spy agency. Let’s create a few attributes that will apply to our heroes & villains (see the executable code in the unit test).
(:require [tupelo.datomic :as td]
[tupelo.schema :as ts])
; Create some new attributes. Required args are the attribute name (an optionally namespaced
; keyword) and the attribute type (full listing at http://docs.datomic.com/schema.html). We wrap
; the new attribute definitions in a transaction and immediately commit them into the DB.
(td/transact *conn* ; required required zero-or-more
; <attr name> <attr value type> <optional specs ...>
(td/new-attribute :person/name :db.type/string :db.unique/value) ; each name is unique
(td/new-attribute :person/secret-id :db.type/long :db.unique/value) ; each secret-id is unique
(td/new-attribute :weapon/type :db.type/ref :db.cardinality/many) ; one may have many weapons
(td/new-attribute :location :db.type/string) ; all default values
(td/new-attribute :favorite-weapon :db.type/keyword )) ; all default values
For the :weapon/type attribute, we want to use an enumerated type since there are only a limited number of choices available to our antagonists:
; Create some "enum" values. These are degenerate entities that serve the same purpose as an
; enumerated value in Java (these entities will never have any attributes). Again, we
; wrap our new enum values in a transaction and commit them into the DB.
(td/transact *conn*
(td/new-enum :weapon/gun)
(td/new-enum :weapon/knife)
(td/new-enum :weapon/guile)
(td/new-enum :weapon/wit))
Let’s create a few antagonists and load them into the DB. Note that we are just using plain Clojure values and literals here, and we don’t have to worry about any Datomic specific conversions.
; Create some antagonists and load them into the db. We can specify some of the attribute-value
; pairs at the time of creation, and add others later. Note that whenever we are adding multiple
; values for an attribute in a single step (e.g. :weapon/type), we must wrap all of the values
; in a set. Note that the set implies there can never be duplicate weapons for any one person.
; As before, we immediately commit the new entities into the DB.
(td/transact *conn*
(td/new-entity { :person/name "James Bond" :location "London" :weapon/type #{ :weapon/gun :weapon/wit } } )
(td/new-entity { :person/name "M" :location "London" :weapon/type #{ :weapon/gun :weapon/guile } } )
(td/new-entity { :person/name "Dr No" :location "Caribbean" :weapon/type :weapon/gun } ))
Enjoy!
Alan

enumerated types not overwriting even though cardinality/one

In writing a rating system, I want people to be able to rate posts, but I only want there to be one rating per user.
So in my schema I have something like
{:db/id #db/id[:db.part/db -1]
 :db/ident :rating/value
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/one   ;; <- thinking this serves a purpose
 :db/doc "rating applied to this particular post"
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/user -2]
 :db/ident :rating.value/verypositive}
{:db/id #db/id[:db.part/user -3]
 :db/ident :rating.value/positive}
{:db/id #db/id[:db.part/user -4]
 :db/ident :rating.value/needswork}
I only want one rating per email to be accessible at any time, but I am a little stumped.
When I submit several ratings to a post
>(add-rating-to-post 1759 "so@gm.co" "verypositive")
>(add-rating-to-post 1759 "so@gm.co" "needswork")
>(add-rating-to-post 1759 "so@gm.co" "positive")
>(add-rating-to-post 1759 "so@gm.co" "verypositive")
The transaction works fine, but when I query for the ratings attached to a particular post-eid I get something like
({:bid 1759,
  :rating :rating.value/verypositive,
  :email "sova@web"}
 {:bid 1759,
  :rating :rating.value/positive,
  :email "sova@web"}
 {:bid 1759,
  :rating :rating.value/needswork,
  :email "sova@web"})
Really, all I want is the latest one, so a returned list of all the ratings a user submitted where I can take (last x) would be great.
...but it will populate until there is one of each of the enumerated types, and then disregard additions.
Any suggestions on how I can achieve the behavior I'm striving for?
Many thanks in advance
What your schema is essentially saying at the moment is that each rating entity is allowed exactly one :rating/value.
I would suggest that 'single rating per user' should not be a schema constraint but rather a domain-level concern. The appropriate way to implement it would be to allow multiple ratings per post per user, and then write a transactor function that checks whether a user has rated a post before and either denies rating again or retracts the old rating (depending on what behaviour you want).
You also would want to treat the rating itself as an entity, if you're not doing that already. So that you have :rating/post :ref, :rating/value :ref and :rating/email attributes and create a new entity for every rating.
Your schema is correct. Your use of idents for enumeration is correct.
Instead of creating a new rating entity every time, use a query to check whether there is already one with the rated :bid and the rating user's :email. If so, transact the new :rating/value with a :db/add assertion (don't worry about retractions; they are created implicitly for :db.cardinality/one attributes). If not, create one as you are already doing.
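A peer-side sketch of that check-then-assert flow, using the attribute names :rating/post, :rating/email and :rating/value assumed from the discussion above (note this version is not race-safe; for atomicity you would move the same logic into a database function):

```clojure
(require '[datomic.api :as d])

;; Hypothetical helper: one rating per user per post, latest value wins.
(defn add-rating-to-post! [conn post-eid email value]
  (let [db       (d/db conn)
        existing (d/q '[:find ?r .
                        :in $ ?post ?email
                        :where
                        [?r :rating/post ?post]
                        [?r :rating/email ?email]]
                      db post-eid email)]
    @(d/transact conn
       (if existing
         ;; :db.cardinality/one implicitly retracts the previous value.
         [[:db/add existing :rating/value value]]
         [{:db/id        (d/tempid :db.part/user)
           :rating/post  post-eid
           :rating/email email
           :rating/value value}]))))
```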
If you need to do it atomically, e.g. to avoid two rating entities being created by the same user on one article in a race condition, then you need all of that to happen inside the transactor, by writing and using a database function.
If you need to see all the ratings a user has given over time, use the history database to query how the :rating/value of an entity has changed.
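A sketch of such a history query, assuming the :rating/value attribute from above and a rating-eid you already hold; the ?added flag distinguishes assertions from the implicit retractions:

```clojure
(require '[datomic.api :as d])

;; Every value :rating/value has held for one rating entity, over all time.
(d/q '[:find ?v ?tx ?added
       :in $ ?r
       :where [?r :rating/value ?v ?tx ?added]]
     (d/history (d/db conn))
     rating-eid)
;; Sort the results by ?tx to recover the sequence of ratings the user gave.
```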