Are nested lookup refs in composite tuples supported in Datomic?

I'm using Datomic Ions to develop an application. In my schema I use composite tuples to guarantee uniqueness: shelves have books and the shelf+book combination must be unique. This is my schema:
{:db/ident :shelf/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/unique :db.unique/value
:db/doc "A shelf is a grouping of books"}
;; Books
{:db/ident :book/shelf
:db/valueType :db.type/ref
:db/isComponent true
:db/cardinality :db.cardinality/one
:db/doc "Shelf this book belongs to"}
{:db/ident :book/id
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/doc "The book identifier"}
;; Shelf + Book combination must be unique
{:db/ident :book/shelf+book
:db/valueType :db.type/tuple
:db/tupleAttrs [:book/shelf :book/id]
:db/cardinality :db.cardinality/one
:db/unique :db.unique/identity}
With the schema above I can do the following pull/queries:
(d/pull db '[*] [:shelf/name "my-shelf"])
Returns: {:db/id 74766790688854, :shelf/name "my-shelf"}
And:
(d/q '[:find ?b ?id
:in $ ?shelf+book
:where [?b :book/shelf+book ?shelf+book]
[?b :book/id ?id]]
db [74766790688854 "book-1"])
Returns: [[101155069755527 "book-1"]].
However, I would like to resolve the shelf with a lookup ref inside the same query, so that I don't need a separate query just to get the shelf's entity ID. Something like:
(d/q '[:find ?b ?id
:in $ ?shelf+book
:where
[?b :book/shelf+book ?shelf+book]
[?b :book/id ?id]]
db [[:shelf/name "my-shelf"] "book-1"])
But the above returns []. Is it possible to nest lookup refs like the above example?

I never tried this with composite tuples, but this is how it works in principle:
You can use the pull API inside normal queries like so:
(d/q '[:find ?id (pull ?b [{:book/shelf [:shelf/name]}])
:in $ ?shelf+book
:where
[?b :book/shelf+book ?shelf+book]
[?b :book/id ?id]]
db [[:shelf/name "my-shelf"] "book-1"])
I use this a lot for querying graphs inside datomic. Here's an example from my application:
(d/q '[:find
?id
(pull ?t [:db/ident])
(pull ?target-node [:my.graph.node/id
:my.graph.node/label])
:where
[?e :my.graph.edge/id ?id]
[?e :my.graph.edge/type ?t]
[?e :my.graph.edge/target ?target-node]]
db)
Where my.graph.edge/target is a :ref and my.graph.edge/type is the value of an enum.
An example output would be
[[#uuid"8e5fb3a4-1cac-40e2-ab8f-33352b6cabb3"
#:db{:ident :my.graph.edge.type/relationship}
#:my.graph.node{:label "Some node"}]
...]
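Coming back to the composite tuple in the question: since the query with the nested lookup ref returns [], one option is to resolve the shelf inside the query instead of inside the tuple value. A minimal sketch, assuming the schema from the question. The names book-by-shelf-q and book-by-tuple-q are made up here, and the second variant assumes your Datomic version provides the tuple query function:

```clojure
;; Variant 1: join through the shelf entity.
;; Relies only on the attributes defined in the question's schema.
(def book-by-shelf-q
  '[:find ?b ?id
    :in $ ?shelf-name ?id
    :where
    [?s :shelf/name ?shelf-name]
    [?b :book/shelf ?s]
    [?b :book/id ?id]])

;; Variant 2: build the composite value inside the query and match it
;; against :book/shelf+book. Assumes the `tuple` query function is available.
(def book-by-tuple-q
  '[:find ?b ?id
    :in $ ?shelf-name ?id
    :where
    [?s :shelf/name ?shelf-name]
    [(tuple ?s ?id) ?shelf+book]
    [?b :book/shelf+book ?shelf+book]])

;; Usage, given a db value and the `d` alias from the question:
;; (d/q book-by-shelf-q db "my-shelf" "book-1")
;; (d/q book-by-tuple-q db "my-shelf" "book-1")
```

Both variants take the shelf name and the book id as separate inputs, so the caller never has to know the shelf's entity ID.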

Related

Datomic not returning the correct "min" result when retrieving entity ID in result tuple

I've got this simple schema and data:
(def product-offer-schema
[{:db/ident :product-offer/product
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one}
{:db/ident :product-offer/vendor
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one}
{:db/ident :product-offer/price
:db/valueType :db.type/long
:db/cardinality :db.cardinality/one}
{:db/ident :product-offer/stock-quantity
:db/valueType :db.type/long
:db/cardinality :db.cardinality/one}
])
(d/transact conn product-offer-schema)
(d/transact conn
[{:db/ident :vendor/Alice}
{:db/ident :vendor/Bob}
{:db/ident :product/BunnyBoots}
{:db/ident :product/Gum}
])
(d/transact conn
[{:product-offer/vendor :vendor/Alice
:product-offer/product :product/BunnyBoots
:product-offer/price 9981 ;; $99.81
:product-offer/stock-quantity 78
}
{:product-offer/vendor :vendor/Alice
:product-offer/product :product/Gum
:product-offer/price 200 ;; $2.00
:product-offer/stock-quantity 500
}
{:product-offer/vendor :vendor/Bob
:product-offer/product :product/BunnyBoots
:product-offer/price 9000 ;; $90.00
:product-offer/stock-quantity 15
}
])
When I retrieve the cheapest bunny boots, only retrieving the price, I get the expected result (9000):
(def cheapest-boots-q '[:find (min ?p) .
:where
[?e :product-offer/product :product/BunnyBoots]
[?e :product-offer/price ?p]
])
(d/q cheapest-boots-q db)
;; => 9000
However, when I want to get the entity ID along with the price, it gives me the higher-priced boots:
(def db (d/db conn))
(def cheapest-boots-q '[:find [?e (min ?p)]
:where
[?e :product-offer/product :product/BunnyBoots]
[?e :product-offer/price ?p]
])
(d/q cheapest-boots-q db)
;; => [17592186045423 9981]
I tried adding :with but that gives me an error:
(def cheapest-boots-q '[:find [?e (min ?p)]
:with ?e
:where
[?e :product-offer/product :product/BunnyBoots]
[?e :product-offer/price ?p]
])
(d/q cheapest-boots-q db)
;; => Execution error (ArrayIndexOutOfBoundsException) at datomic.datalog/fn$project (datalog.clj:503).
What am I doing wrong?
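As a commenter pointed out, ?e isn't tied to the (min ?p) expression. Because ?e appears in :find next to the aggregate, the results are grouped by ?e, so min is computed per offer rather than across all offers, and the [...] find spec then returns just one of those tuples. Which product offer entity id you get is therefore not defined.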
What you actually want to do is unify those values somehow as part of the query, and not perform aggregation on the results, for example:
(d/q '[:find [?e ?p]
:where
[?e :product-offer/product :product/BunnyBoots]
[?e :product-offer/price ?p]
[(min ?p)]]
db)
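Here the min clause is part of the :where section instead of being an aggregate in :find, with the intent that it takes part in unification. Be aware, though, that an expression clause in :where is evaluated once per candidate row, not across the whole result set, so check against your own data that this really narrows the result down to the cheapest offer.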
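If the query alone doesn't give you the cheapest offer, a fallback that doesn't depend on query semantics is to fetch every [?e ?p] pair and pick the minimum in plain Clojure. A minimal sketch; cheapest-offer is a made-up helper name, and the second entity id in the sample data is invented for illustration:

```clojure
;; Stand-in for the result of a query that returns all pairs, e.g.
;; (d/q '[:find ?e ?p
;;        :where
;;        [?e :product-offer/product :product/BunnyBoots]
;;        [?e :product-offer/price ?p]]
;;      db)
;; The second entity id is hypothetical.
(def boot-offers
  #{[17592186045423 9981]
    [17592186045425 9000]})

(defn cheapest-offer
  "Returns the [entity-id price] tuple with the lowest price, or nil when
  there are no offers."
  [offers]
  (when (seq offers)
    (apply min-key second offers)))

(cheapest-offer boot-offers)
;; => [17592186045425 9000]
```

This keeps the entity id and the price together, at the cost of pulling all matching offers to the caller first.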

Does anyone use Datomic to get the structure and entities separately?

So I use queries to filter data and then use pull to get the information out from the Datomic database.
(def rules
[[[search ?txt ?id] [(fulltext $ :artist/name ?txt) [[?id]]]]
[[search ?txt ?id] [(fulltext $ :track/name ?txt) [[?id]]]]])
(d/q
'[:find [(pull ?id [* {:track/artists [:db/id :track/name] :track/_artists [:db/id :artist/name] }]) ...]
:in $ % ?query
:where [search ?query ?id]]
db rules "John Lennon")
And sometimes these queries can get recursive, so for example I can change the pull to:
(d/q
'[:find [(pull ?id [* {:track/artists [:db/id :track/name] :track/_artists [* {:track/artists [:db/id :track/name]}]}]) ...]
:in $ % ?query
:where [search ?query ?id]]
db rules "John Lennon")
Now what I'd like to do is return each entity only once, together with a separate structure that refers to entities by :db/id, so that I avoid returning duplicate data wherever possible.
For example: (results elided with ...)
{:entities [{:db/id 1 :track/name "..." ...} {:db/id 2 :track/name "..." ...} {:db/id 3 :artist/name "..." ...}]
:structure [{:db/id 1 :track/artists [{:db/id 3}]} {:db/id 2 :track/artists [{:db/id 3}]}]}
Can this be done at the query level, or do I need to walk the structure after the query returns and modify it? I'm happy to walk the structure for now; I'm just wondering whether anyone has worked out a better approach.
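If you do end up walking the structure, the walk can stay fairly small. A minimal sketch in plain Clojure, assuming every pulled entity map carries a :db/id (as the pull patterns above request). split-pulled and its helpers are names made up for this example, and the sample data is invented:

```clojure
(defn- entity-map? [x]
  (and (map? x) (contains? x :db/id)))

(defn- ref-value?
  "True for a nested entity map or a non-empty collection of entity maps."
  [v]
  (or (entity-map? v)
      (and (sequential? v) (seq v) (every? entity-map? v))))

(defn- stub
  "Replaces nested entity maps with {:db/id n} placeholders."
  [v]
  (if (entity-map? v)
    (select-keys v [:db/id])
    (mapv #(select-keys % [:db/id]) v)))

(defn split-pulled
  "Splits nested pull results into unique flat entities and a structure
  that only holds :db/id references. Both vectors are sorted by :db/id."
  [pulled]
  (let [nodes (filter entity-map? (tree-seq coll? seq pulled))
        acc   (reduce
                (fn [acc m]
                  (let [id    (:db/id m)
                        refs  (into {} (filter (comp ref-value? val)) (dissoc m :db/id))
                        attrs (into {} (remove (comp ref-value? val)) m)
                        stubs (into {} (map (fn [[k v]] [k (stub v)])) refs)]
                    (cond-> (update-in acc [:entities id] merge attrs)
                      (seq refs) (update-in [:structure id] merge {:db/id id} stubs))))
                {:entities {} :structure {}}
                nodes)]
    {:entities  (vec (sort-by :db/id (vals (:entities acc))))
     :structure (vec (sort-by :db/id (vals (:structure acc))))}))

(split-pulled
  [{:db/id 1 :track/name "t1" :track/artists [{:db/id 3 :artist/name "a"}]}
   {:db/id 2 :track/name "t2" :track/artists [{:db/id 3 :artist/name "a"}]}])
;; => {:entities  [{:db/id 1 :track/name "t1"}
;;                 {:db/id 2 :track/name "t2"}
;;                 {:db/id 3 :artist/name "a"}]
;;     :structure [{:db/id 1 :track/artists [{:db/id 3}]}
;;                 {:db/id 2 :track/artists [{:db/id 3}]}]}
```

Entities seen more than once are merged by :db/id, so an artist pulled with different attribute sets in different places ends up as one map.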

datomic query over history

What is the correct way to query all the properties of a datomic db entity over its history?
For example, with the pull API or pull expressions within a query one can use wildcards to print all the properties of a given entity. However, the same approach does not work for the special history db.
(d/q '[:find [(pull ?e [*]) ...] :where [?e :test/firstName "Bob"]] db-test)
; outputs list of Bob's properties
(d/q '[:find [(pull ?e [*]) ...] :where [?e :test/firstName "Bob"]] (d/history db-test))
; IllegalStateException Can't pull from history
You can use query to return all datoms for a single entity for all of history:
(d/q '[:find ?e ?a ?v ?t ?op
:in $ ?e
:where [?e ?a ?v ?t ?op]]
(d/history (d/db conn)) <Your Entity ID>)
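The rows from that query come back unordered, so it can help to sort them by transaction before reading them. A minimal sketch in plain Clojure; timeline is a made-up helper name, and the entity, attribute and transaction ids in the sample rows are invented:

```clojure
(defn timeline
  "Takes [e a v t op] rows from a history query and returns them as maps,
  ordered by transaction. Within one transaction, retractions (op false)
  sort before assertions (op true)."
  [rows]
  (->> rows
       (sort-by (fn [[_ _ _ t op]] [t op]))
       (mapv (fn [[e a v t op]] {:e e :a a :v v :t t :added? op}))))

;; Made-up rows standing in for the result of the history query above.
(timeline #{[17 63 "Bob" 1002 true]
            [17 63 "Rob" 1002 false]
            [17 63 "Rob" 1001 true]})
;; => [{:e 17 :a 63 :v "Rob" :t 1001 :added? true}
;;     {:e 17 :a 63 :v "Rob" :t 1002 :added? false}
;;     {:e 17 :a 63 :v "Bob" :t 1002 :added? true}]
```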

Finding the date of the oldest and newest entity with a certain attribute in Datomic?

Let's say I have a Datomic schema like this:
{:db/id #db/id[:db.part/db]
:db/ident :app/createdAt
:db/doc "The date and time when the entity was created (not necessarily the same as tx time)"
:db/valueType :db.type/instant
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :app/type
:db/doc "The type of the entity"
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
Many such entities are created over the lifetime of the application. I'm interested in finding the :app/createdAt instant/date of the oldest and newest entity of a certain type (:app/type), say "type1". What would such a query look like in Datomic?
An easy way is to use a Datalog query:
[:find (min ?c) (max ?c) :in $ ?type :where
[?e :app/type ?type]
[?e :app/createdAt ?c]]
Performance considerations
As of Datomic 0.9.5385, the Datalog engine will perform a full scan of the entities matching the [?e :app/type ?type] clause; if there are many such entities, this can result in many network roundtrips to storage, high resource consumption on the Peer, and significant latency.
Fortunately, you can use Datomic's Optimization of Range Predicates to restrict the number of datoms scanned by the query. For instance, to compute the maximum creation date, if you know that at least one such entity was created after August 2016, you can call:
(d/q '[:find (max ?c) . :in $ ?type ?lower-bound :where
[?e :app/createdAt ?c]
[(>= ?c ?lower-bound)]
[?e :app/type ?type]]
db "type1" #inst "2016-08")
Note that the order of Datalog clauses matters.
Disclaimer: I am not knowledgeable about the source code of Datomic, and am only inferring the above assertions from personal experiments.

Retrieve Most Recent Entity from Datomic

I'm interested in entities and their timestamps. Essentially, I want a time-sorted list of entities.
To that end, I've composed the following functions:
(defn return-posts
"grabs all posts from Datomic"
[]
(d/q '[:find ?title ?body ?slug
:where
[?e :post/title ?title]
[?e :post/slug ?slug]
[?e :post/body ?body]] (d/db connection)))
(defn get-postid-from-slug
[slug]
(d/q '[:find ?e
:in $ ?slug
:where [?e :post/slug ?slug]] (d/db connection) slug))
(defn get-post-timestamp
"given an entid, returns the most recent timestamp"
[entid]
(->
(d/q '[:find ?ts
:in $ ?e
:where
[?e _ _ _]
[?e :db/txInstant ?ts]] (d/db connection) entid)
(sort)
(reverse)
(first)))
Which I feel must be a hack rooted in ignorance.
Would someone more well-versed in idiomatic Datomic usage chime in and upgrade my paradigms?
I was bothered by the idea of adding additional timestamps to a database that nominally understands time as a first-class principle and so (after a night of mulling on the approaches outlined by Ulrik Sandberg) evolved the following function:
(defn return-posts
"grabs all posts from Datomic"
[uri]
(d/q '[:find ?title ?body ?slug ?ts
:where
[?e :post/title ?title ?tx]
[?e :post/slug ?slug]
[?e :post/body ?body]
[?tx :db/txInstant ?ts]] (d/db (d/connect uri))))
It's idiomatic in Datalog to omit the binding to the transaction ID itself as we typically don't care. In this situation, we very definitely care and in the words of August Lileaas, wish to "traverse the transaction" (there are situations in which we'd want the post creation time, but for this application the transaction time will suffice for ordering entities).
A notable downside to this approach is that recently edited entries will be bumped up in the list. To that end, I'll have to do something later on in order to get their "first appearance" in Datomic for blog-standard post history.
To summarize:
I have bound the transaction entity ID per "post" entity ID, and then looked up the transaction timestamp with this function for later sorting.
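For the "first appearance" problem mentioned above, one possible direction is to take the earliest transaction that asserted the slug, using the history database. This is only a sketch under assumptions: first-seen-q and newest-first are made-up names, the query is meant to be run against (d/history db), and the sample rows are invented:

```clojure
;; Earliest transaction instant at which each post's slug was asserted.
;; Intended to run against the history database, where the fifth datom
;; position (true) restricts the match to assertions.
(def first-seen-q
  '[:find ?e (min ?ts)
    :where
    [?e :post/slug _ ?tx true]
    [?tx :db/txInstant ?ts]])

(defn newest-first
  "Sorts [entity-id instant] rows so the most recent instant comes first."
  [rows]
  (sort-by second #(compare %2 %1) rows))

;; Made-up rows standing in for (d/q first-seen-q (d/history db))
(newest-first [[101 #inst "2015-01-01"]
               [102 #inst "2015-03-01"]])
;; => ([102 #inst "2015-03-01T00:00:00.000-00:00"]
;;     [101 #inst "2015-01-01T00:00:00.000-00:00"])
```

Later edits to a post add newer transactions but leave the minimum untouched, so edited posts would no longer be bumped up the list.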
There isn't really a more elegant way to do this than traversing the transactions. This is why I prefer to have a separate domain specific attribute for timestamps, instead of relying on the transaction timestamps from Datomic. One example where this is necessary is merging: let's say you have a wiki, and you want to merge two wiki pages. In that case, you probably want to control the timestamp yourself, and not use the timestamp from the transaction.
I like to have the attributes :created-at and :changed-at. When I transact new entities:
[[:db/add tempid :post/slug "..."]
[:db/add tempid :post/title "A title"]
[:db/add tempid :created-at (java.util.Date.)]
[:db/add tempid :changed-at (java.util.Date.)]]
Then for updates:
[[:db/add post-eid :post/title "An updated title"]
[:db/add post-eid :changed-at (java.util.Date.)]]
That way all I have to do is to read out the :created-at attribute of the entity, which will be ready and waiting in the index.
(defmacro find-one-entity
"Returns entity when query matches, otherwise nil"
[q db & args]
`(when-let [eid# (ffirst (d/q ~q ~db ~@args))]
(d/entity ~db eid#)))
(defn find-post-by-slug
[db slug]
(find-one-entity
'[:find ?e
:in $ ?slug
:where
[?e :post/slug ?slug]]
db
slug))
;; Get timestamp
(:created-at (find-post-by-slug db "my-post-slug"))