Datomic query performance improvements

I have a schema that looks similar to this in a Datomic database:
; --- tenant
{:db/id #db/id[:db.part/db]
:db/ident :tenant/guid
:db/unique :db.unique/identity
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :tenant/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :tenant/tasks
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/many
:db.install/_attribute :db.part/db}
; --- task
{:db/id #db/id[:db.part/db]
:db/ident :task/guid
:db/unique :db.unique/identity
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :task/createdAt
:db/valueType :db.type/instant
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :task/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :task/subtasks
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/many
:db.install/_attribute :db.part/db}
; --- subtask
{:db/id #db/id[:db.part/db]
:db/ident :subtask/guid
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/unique :db.unique/identity
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :subtask/type
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :subtask/startedAt
:db/valueType :db.type/instant
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :subtask/completedAt
:db/valueType :db.type/instant
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :subtask/participants
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/many
:db.install/_attribute :db.part/db}
; --- participant
{:db/id #db/id[:db.part/db]
:db/ident :participant/guid
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/unique :db.unique/identity
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :participant/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
The tasks are fairly static over time, but subtasks are added and removed on average about once every 5 minutes per task. On average, each task has about 40 subtasks at any given time, each containing (almost always, with a few exceptions) one participant. My sole purpose in using Datomic is to see how tasks have evolved over time, i.e. I'd like to see what a task looked like at a given time. To achieve this, I'm currently doing something similar to this:
(defn find-tasks-by-tenant-at-time
  [conn tenant-guid ^long time-epoch]
  (let [db-conn (-> conn d/db (d/as-of (Date. time-epoch)))
        task-ids (->> (d/q '[:find ?taskIds
                             :in $ ?tenantGuid
                             :where
                             [?tenantId :tenant/guid ?tenantGuid]
                             [?tenantId :tenant/tasks ?taskIds]]
                           db-conn tenant-guid)
                      vec flatten)
        task-entities (map #(d/entity db-conn %) task-ids)
        dtos (map (fn [task]
                    (letfn [(participant-dto [participant]
                              {:id (:participant/guid participant)
                               :name (:participant/name participant)})
                            (subtask-dto [subtask]
                              {:id (:subtask/guid subtask)
                               :type (:subtask/type subtask)
                               :participants (map participant-dto (:subtask/participants subtask))})]
                      {:id (:task/guid task)
                       :name (:task/name task)
                       :subtasks (map subtask-dto (:task/subtasks task))}))
                  task-entities)]
    dtos))
Unfortunately this is extremely slow. It can take almost 60 seconds to return from this function if there are many tasks for a tenant (say 20) each containing roughly 40 subtasks. Am I doing something obviously wrong here? Is it possible to speed this up?
Update:
The entire dataset is roughly 2 GB. The peer has 3.5 GB of memory (decreasing it to 1.5 GB doesn't seem to make any difference) and the transactor has 1 GB. I'm using Datomic Free.

Before you start profiling, you could replace
[:find ?taskIds ...]
with
[:find (pull ?task-entity [*]) ...]
to reduce the number of round-trips to the peer and thus get rid of the map over task-entities. In a second step, replace [*] with the set of attributes you actually need for each entity.
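Taking that suggestion one step further, the whole task → subtask → participant tree can be pulled with a single nested pattern, so the DTOs no longer have to be assembled by walking lazy entities. This is a sketch, not a benchmarked fix: it assumes the peer library (datomic.api), the :tenant/tasks spelling used in the question's query, and a hypothetical task->dto helper that reshapes the pulled maps into the question's DTO format.

```clojure
(ns example.tasks
  (:require [datomic.api :as d])
  (:import (java.util Date)))

;; Sketch: pull each task together with its subtasks and participants
;; in one query against the as-of database.
(defn find-tasks-by-tenant-at-time
  [conn tenant-guid ^long time-epoch]
  (let [db (d/as-of (d/db conn) (Date. time-epoch))]
    (d/q '[:find [(pull ?task [:task/guid
                               :task/name
                               {:task/subtasks
                                [:subtask/guid
                                 :subtask/type
                                 {:subtask/participants
                                  [:participant/guid :participant/name]}]}]) ...]
           :in $ ?tenant-guid
           :where
           [?tenant :tenant/guid ?tenant-guid]
           [?tenant :tenant/tasks ?task]]
         db tenant-guid)))

;; Hypothetical helper: rename the pulled, namespaced keys to the DTO keys
;; used in the question. Missing attributes are simply absent from pull results.
(defn task->dto [task]
  {:id (:task/guid task)
   :name (:task/name task)
   :subtasks (mapv (fn [subtask]
                     {:id (:subtask/guid subtask)
                      :type (:subtask/type subtask)
                      :participants (mapv (fn [p]
                                            {:id (:participant/guid p)
                                             :name (:participant/name p)})
                                          (:subtask/participants subtask))})
                   (:task/subtasks task))})
```

Usage would be (mapv task->dto (find-tasks-by-tenant-at-time conn tenant-guid time-epoch)). Narrowing the pattern to the attributes the DTOs need is the "second step" the answer describes.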

Related

Are nested lookup refs in composite tuples supported in datomic?

I'm using Datomic Ions to develop an application. In my schema I use composite tuples to guarantee uniqueness: shelves have books and the shelf+book combination must be unique. This is my schema:
{:db/ident :shelf/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/unique :db.unique/value
:db/doc "A shelf is a grouping of books"}
;; Books
{:db/ident :book/shelf
:db/valueType :db.type/ref
:db/isComponent true
:db/cardinality :db.cardinality/one
:db/doc "Shelf this book belongs to"}
{:db/ident :book/id
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/doc "The book identifier"}
;; Shelf + Book combination must be unique
{:db/ident :book/shelf+book
:db/valueType :db.type/tuple
:db/tupleAttrs [:book/shelf :book/id]
:db/cardinality :db.cardinality/one
:db/unique :db.unique/identity}
With the schema above I can do the following pull and query:
(d/pull db '[*] [:shelf/name "my-shelf"])
Returns: {:db/id 74766790688854, :shelf/name "my-shelf"}
And:
(d/q '[:find ?b ?id
:in $ ?shelf+book
:where [?b :book/shelf+book ?shelf+book]
[?b :book/id ?id]]
db [74766790688854 "book-1"])
Returns: [[101155069755527 "book-1"]].
However, I would like to use a lookup ref to resolve the shelf reference within the same query, to avoid a separate query just to get the shelf reference, something like:
(d/q '[:find ?b ?id
:in $ ?shelf+book
:where
[?b :book/shelf+book ?shelf+book]
[?b :book/id ?id]]
db [[:shelf/name "my-shelf"] "book-1"])
But the above returns []. Is it possible to nest lookup refs like the above example?

I never tried this with composite tuples, but this is how it works in principle:
You can use the pull API inside normal queries like so:
(d/q '[:find ?id (pull ?b [{:book/shelf [:shelf/name]}])
       :in $ ?shelf+book
       :where
       [?b :book/shelf+book ?shelf+book]
       [?b :book/id ?id]]
     db [[:shelf/name "my-shelf"] "book-1"])
I use this a lot for querying graphs inside Datomic. Here's an example from my application:
(d/q '[:find
?id
(pull ?t [:db/ident])
(pull ?target-node [:my.graph.node/id
:my.graph.node/label])
:where
[?e :my.graph.edge/id ?id]
[?e :my.graph.edge/type ?t]
[?e :my.graph.edge/target ?target-node]]
db)
Here :my.graph.edge/target is a ref attribute and :my.graph.edge/type refers to an enum value.
An example output would be
[[#uuid"8e5fb3a4-1cac-40e2-ab8f-33352b6cabb3"
#:db{:ident :my.graph.edge.type/relationship}
#:my.graph.node{:label "Some node"}]
...]
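A different technique, not taken from the answer above: since the lookup ref nested in the tuple input evidently isn't resolved (the query returns []), the tuple can instead be built inside the query from the resolved shelf entity id. This sketch assumes the Client API used with Ions and Datomic's built-in tuple query function; find-book is a hypothetical name.

```clojure
(ns example.books
  (:require [datomic.client.api :as d]))

;; Sketch: resolve the shelf by its unique name, build the
;; [shelf-eid book-id] value that :book/shelf+book stores, and match on it.
(defn find-book [db shelf-name book-id]
  (d/q '[:find ?b ?id
         :in $ ?shelf-name ?id
         :where
         [?s :shelf/name ?shelf-name]
         [(tuple ?s ?id) ?shelf+book]
         [?b :book/shelf+book ?shelf+book]]
       db shelf-name book-id))
```

A plain join ([?s :shelf/name ?shelf-name] [?b :book/shelf ?s] [?b :book/id ?id]) would also answer the question in one query; the tuple version keeps the lookup on the unique composite attribute.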

"Unable to resolve entity" error when trying to transact a Datomic schema

I am a Datomic super-newbie. I'm trying to add a taxonomy to my database, but am getting an error that I can't follow. The error is:
{:datomic.client-spi/request-id "c587b3e8-8f19-45f5-a563-bdba13e3a0d8",
:cognitect.anomalies/category :cognitect.anomalies/not-found,
:cognitect.anomalies/message
":db.error/not-an-entity Unable to resolve entity: {:idx -1000000, :part :db.part/db} in datom [{:idx -1000000, :part :db.part/db} :db/ident :arb/title]",
:dbs
[{:database-id "datomic:dev://localhost:4334/datemo",
:t 1004,
:next-t 1009,
:history false}]}
Here is the taxonomy that I'm using:
[{:db/id #db/id [:db.part/db]
:db/ident :arb/title
:db/unique :db.unique/identity
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/fulltext true
:db/index true
:db.install/_attribute :db.part/db}
{:db/id #db/id [:db.part/db]
:db/ident :arb/description
:db/unique :db.unique/identity
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id [:db.part/db]
:db/ident :arb/content
:db/unique :db.unique/identity
:db/valueType :db.type/ref
:db/isComponent true
:db/cardinality :db.cardinality/many
:db.install/_attribute :db.part/db}]
And here is the call that I made:
(def arb-tx (-> (io/resource "schemas/arb.edn") ;; the schema above
(read-all)
(first)))
(pprint (<!! (client/transact conn {:tx-data arb-tx})))
It's hard for me to understand from the error message what exactly is not resolvable here. I think my understanding of what is going on underneath the hood is too vague to understand what is wrong here. Can anyone enlighten me?

The Datomic Client library doesn't support explicit partitions or :db.install/_attribute in schema definitions. Those elements were required when using Peers prior to Datomic 0.9.5430.
Try replacing your schema definition with:
[{:db/ident :arb/title
:db/unique :db.unique/identity
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db/fulltext true
:db/index true}
{:db/ident :arb/description
:db/unique :db.unique/identity
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one}
{:db/ident :arb/content
:db/unique :db.unique/identity
:db/valueType :db.type/ref
:db/isComponent true
:db/cardinality :db.cardinality/many}]
-Marshall
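If the schema already lives in an EDN file written in the older peer style, a small transformation can drop the two keys the answer mentions before transacting. This is a sketch under that assumption; client-schema is a hypothetical helper, and it should only be applied to schema maps, since ordinary entity maps may legitimately need :db/id.

```clojure
(ns example.schema)

;; Hypothetical helper: remove the peer-era :db/id partition and
;; :db.install/_attribute keys from each attribute definition.
(defn client-schema [legacy-schema]
  (mapv #(dissoc % :db/id :db.install/_attribute) legacy-schema))
```

With it, the call from the question would become (client/transact conn {:tx-data (client-schema arb-tx)}).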

Finding the date of the oldest and newest entity with a certain attribute in Datomic?

Let's say I have a Datomic schema like this:
{:db/id #db/id[:db.part/db]
:db/ident :app/createdAt
:db/doc "The date and time when the entity was created (not necessarily the same as tx time)"
:db/valueType :db.type/instant
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
:db/ident :app/type
:db/doc "The type of the entity"
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :db.part/db}
Many such entities are created over the lifetime of the application. I'm interested in finding the :app/createdAt instant for the oldest and newest entity of a certain type (:app/type), say "type1". What would such a query look like in Datomic?

An easy way is to use a Datalog query:
[:find (min ?c) (max ?c) :in $ ?type :where
[?e :app/type ?type]
[?e :app/createdAt ?c]]
Performance considerations
As of Datomic 0.9.5385, the Datalog engine will perform a full scan of the entities matching the [?e :app/type ?type] clause; if there are many such entities, this can result in many network roundtrips to storage, high resource consumption on the Peer, and significant latency.
Fortunately, you can use Datomic's Optimization of Range Predicates to restrict the number of datoms scanned by the query. For instance, to compute the maximum creation date, if you know that at least one such entity was created after August 2016, you can call:
(d/q '[:find (max ?c) . :in $ ?type ?lower-bound :where
       [?e :app/createdAt ?c]
       [(>= ?c ?lower-bound)]
       [?e :app/type ?type]]
     db "type1" #inst "2016-08")
Note that the order of Datalog clauses matters.
Disclaimer: I am not knowledgeable about the source code of Datomic, and am only inferring the above assertions from personal experiments.
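The same idea should apply to the oldest entity, using an upper bound instead of a lower bound. This sketch assumes the peer API (datomic.api) and that :app/createdAt is declared with :db/index true: my understanding is that Datomic On-Prem only maintains an AVET index for indexed or unique attributes, and range predicates can only narrow the scan when that index exists. The function names are hypothetical.

```clojure
(ns example.created-at
  (:require [datomic.api :as d]))

;; Oldest :app/createdAt for a type, assuming at least one matching
;; entity was created at or before upper-bound.
(defn oldest-created-at [db app-type upper-bound]
  (d/q '[:find (min ?c) . :in $ ?type ?upper-bound :where
         [?e :app/createdAt ?c]
         [(<= ?c ?upper-bound)]
         [?e :app/type ?type]]
       db app-type upper-bound))

;; Newest :app/createdAt for a type, mirroring the query in the answer.
(defn newest-created-at [db app-type lower-bound]
  (d/q '[:find (max ?c) . :in $ ?type ?lower-bound :where
         [?e :app/createdAt ?c]
         [(>= ?c ?lower-bound)]
         [?e :app/type ?type]]
       db app-type lower-bound))
```

Both return nil when no entity of that type falls within the bound, in which case a wider bound (or the unbounded query) is needed.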

Datomic valueType

When trying to persist a list of node entities with a :threshold attribute defined thus in the schema:
{:db/id #db/id[:db.part/db]
:db/ident :node/threshold
:db/valueType :db.type/long
:db/cardinality :db.cardinality/one
:db/fulltext false
:db/doc "Threshold"
:db.install/_attribute :db.part/db}
I get the following error:
CompilerException java.util.concurrent.ExecutionException:
java.lang.IllegalArgumentException:
:db.error/wrong-type-for-attribute Value 90 is not a valid :int
for attribute :node/threshold
I use the following code:
(defn store-tree [tree]
  @(d/transact dbconn/conn (into [] (vals tree))))
(store-tree parsed-tree-with-refs)
where tree is a map of node names to nodes.
Curiously enough, I took the EDN for the specific entity with :node/threshold 90 from the REPL and transacted it manually, and it worked without any problems. I used this code:
@(d/transact dbconn/conn [{:db/id (d/tempid :db.part/user),
                           :node/threshold 90, :node/location "US"}])
Can someone please help?
Thanks,
Vitaliy.

How do I build a transaction that has references to a variable number of entities?

I'm getting into Datomic and still don't grok it. How do I build a transaction that has references to a variable number of entities?
For example this creates a transaction with a child entity and a family entity with a child attribute that references the new child entity:
(defn insert-child [id child]
  {:db/id id
   :child/first-name (:first-name child)
   :child/middle-name (:middle-name child)
   :child/last-name (:last-name child)
   :child/date-of-birth (:date-of-birth child)})
(defn insert-family [id]
(let [child-id #db/id[:db.part/user]]
(vector
(insert-child child-id
{:first-name "Richard"
:middle-name "M"
:last-name "Stallman"})
{:db/id id
:family/child child-id})))
(insert-family #db/id[:db.part/user])
=> [{:db/id #db/id[:db.part/user -1000012],
:child/first-name "Richard",
:child/middle-name "M",
:child/last-name "Stallman",
:child/date-of-birth nil}
{:db/id #db/id[:db.part/user -1000013],
:family/child #db/id[:db.part/user -1000012]}]
Notice I used let for child-id. I'm not sure how to write this such that I can map over insert-child while having a family entity that references each one.
I thought about using iterate over #db/id[:db.part/user] and the number of children then mapping over both the result of iterate and a vector of children. Seems kind of convoluted and #db/id[:db.part/user] isn't a function to iterate over to begin with.

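Instead of using the #db/id[:db.part/user] reader literal, which is meant for EDN files and data literals, you should use d/tempid. A reader literal is evaluated once, when the code is read, so inside a function it yields the same tempid on every call; d/tempid creates a fresh tempid each time it is called.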
You could do something like this (using simplified child entities):
(ns family-tx
  (:require [datomic.api :refer [q db] :as d]))
(def uri "datomic:mem://testfamily")
(d/delete-database uri)
(d/create-database uri)
(def conn (d/connect uri))
(def schema
  [{:db/id (d/tempid :db.part/db)
    :db/ident :first-name
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db.install/_attribute :db.part/db}
   {:db/id (d/tempid :db.part/db)
    :db/ident :last-name
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db.install/_attribute :db.part/db}
   {:db/id (d/tempid :db.part/db)
    :db/ident :family/child
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/many
    :db.install/_attribute :db.part/db}])
@(d/transact conn schema)
(defn make-family-tx [kids]
  (let [kids-tx (map #(into {:db/id (d/tempid :db.part/user)} %) kids)
        kids-id (map :db/id kids-tx)]
    (conj kids-tx {:db/id (d/tempid :db.part/user)
                   :family/child kids-id})))
(def kids [{:first-name "Billy" :last-name "Bob"}
           {:first-name "Jim" :last-name "Beau"}
           {:first-name "Junior" :last-name "Bacon"}])
@(d/transact conn (make-family-tx kids))
There are a few strategies for this discussed in the Transactions docs as well (see the "Identifying Entities" section).
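A variant that avoids d/tempid altogether, assuming a Datomic version that accepts string tempids (newer peer releases and the Client API do): each kid gets a unique string id, and the family entity refers to the same strings, which Datomic resolves to the new entity ids within the transaction. The "kid-N" and "family" strings are arbitrary labels chosen for this sketch.

```clojure
(ns example.family)

;; Sketch: build transaction data with string tempids. Strings that are
;; equal within one transaction refer to the same new entity.
(defn make-family-tx [kids]
  (let [kid-ids (map #(str "kid-" %) (range (count kids)))
        kids-tx (map #(assoc %1 :db/id %2) kids kid-ids)]
    (conj (vec kids-tx)
          {:db/id "family"
           :family/child (vec kid-ids)})))
```

Because the function is pure data, it can be inspected at the REPL before passing its result to d/transact.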

We Keep Coding
