Datomic rules vs query performance - clojure

I have the following fn that returns a query using a pull pattern. It works.
(defn- site-query
([db site-id pattern cutoff]
(d/q '[:find [(pull ?ci pattern) ...]
:in $ pattern ?site-id ?cutoff
:where
[?ci :clockin/time ?t]
[(> ?t ?cutoff)]
[?ci :clockin/site ?site-id]]
db
pattern
site-id
cutoff))
([db site-id pattern]
(site-query db site-id pattern (u/timestamp-days-ago 7))))
But if I refactor the query out into a rule, the results come back in a different order and the query takes 10 times longer. Once sorted, though, the results are equal.
(defn- site-query
([db site-id pattern cutoff]
(d/q '[:find [(pull ?ci pattern) ...]
:in $ % pattern ?site-id ?cutoff
:where
(testme ?ci ?cutoff ?site-id)]
db
['[(testme ?ci ?cutoff ?site-id)
[?ci :clockin/time ?t]
[(> ?t ?cutoff)]
[?ci :clockin/site ?site-id]]]
pattern
site-id
cutoff))
([db site-id pattern]
(site-query db site-id pattern (u/timestamp-days-ago 7))))
What gives?

Related

How to find only one record from query in datomic?

I'm doing a query on datomic using datomic.api like the following:
(d/q
'[:find [(pull ?a [*]) ...]
:in $ ?title
:where
[?a :movie/title ?title]]
db title)
This query is returning almost the expected value, but as an array, like this:
[ {:db/id 17592186045442, :movie/title "Test", :movie/year 1984, :movie/director #:db{:id 17592186045439 }} ]
I want this query to return only the first match, not all the results. What am I doing wrong?
I found a solution for my specific case. The real issue was that I was not understanding the datomic query correctly.
[:find [(pull ?a [*]) ...]
This part is telling datomic to retrieve more than one result.
I changed the query to the following one:
(d/q
'[:find (pull ?a [*]) .
:in $ ?title
:where
[?a :movie/title ?title]]
db title)
And it worked!
The key thing was to remove the "[" after :find keyword, and switch the "..." for only ".".
If this doesn't work for you, look at the link that @EugenePakhomov posted in the comments: Equivalent of SQL "limit" clause in Datomic
It is documented in the official Datomic documentation:
Find Spec
:find ?a ?b relation (Collection of Lists)
:find [?a ...] collection (Collection)
:find [?a ?b] single tuple (List)
:find ?a . single scalar (Scalar Value)
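As a rough illustration of how these shapes relate to each other, they can be mimicked in plain Clojure on a hand-written relation. This is only a sketch: `relation` and the helper names are made up for illustration, and a vector stands in for the set Datomic would return so that the order is predictable.

```clojure
;; Hypothetical result of `:find ?a ?b`: a collection of tuples.
;; (Datomic returns a set; a vector is used here to keep the order deterministic.)
(def relation [[1 :x] [2 :y]])

(defn as-collection
  "Shape of `:find [?a ...]`: the first variable of every tuple."
  [rel]
  (mapv first rel))

(defn as-tuple
  "Shape of `:find [?a ?b]`: a single tuple."
  [rel]
  (first rel))

(defn as-scalar
  "Shape of `:find ?a .`: a single value."
  [rel]
  (ffirst rel))
```

The scalar form is the one used in the fixed query above: it yields one pulled map instead of a vector of maps.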

Does anyone use Datomic to get the structure and entities separately?

So I use queries to filter data and then use pull to get the information out from the Datomic database.
(def rules
[[[search ?txt ?id] [(fulltext $ :artist/name ?txt) [[?id]]]]
[[search ?txt ?id] [(fulltext $ :track/name ?txt) [[?id]]]]])
(d/q
'[:find [(pull ?id [* {:track/artists [:db/id :track/name] :track/_artists [:db/id :artist/name] }]) ...]
:in $ % ?query
:where [search ?query ?id]]
db rules "John Lennon")
And sometimes these queries can get recursive, so for example I can change the pull to:
(d/q
'[:find [(pull ?id [* {:track/artists [:db/id :track/name] :track/_artists [* {:track/artists [:db/id :track/name]}]}]) ...]
:in $ % ?query
:where [search ?query ?id]]
db rules "John Lennon")
Now what I'd like to do is ensure that each entity is returned only once, alongside a separate structure that refers to entities by :db/id, since I want to avoid returning duplicate data as far as possible.
For example: (results elided with ...)
{:entities [{:db/id 1 :track/name "..." ...} {:db/id 2 :track/name "..." ...} {:db/id 3 :artist/name "..." ...}]
:structure [{:db/id 1 :track/artists [{:db/id 3}]} {:db/id 2 :track/artists [{:db/id 3}]}]}
Can this be done at the query level? Or do I need to walk the structure after the query returns and modify it? I'm happy to walk the structure at present, I'm just wondering if anyone has worked out a better approach?
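If walking the result is acceptable, one possible post-processing step is sketched below in plain Clojure. It assumes every pulled map carries a :db/id, and it treats any value that is an entity map (or a non-empty sequence of entity maps) as a reference. `split-pull-results` and its helpers are hypothetical names, not part of Datomic.

```clojure
(defn- entity-map? [x]
  (and (map? x) (contains? x :db/id)))

(defn- ref-value?
  "True for a nested entity map or a non-empty sequence of entity maps."
  [v]
  (or (entity-map? v)
      (and (sequential? v) (seq v) (every? entity-map? v))))

(defn- to-stub
  "Replaces nested entity maps with bare {:db/id n} references."
  [v]
  (if (entity-map? v)
    (select-keys v [:db/id])
    (mapv #(select-keys % [:db/id]) v)))

(defn split-pull-results
  "Splits nested pull results into unique flat :entities and a :structure of references."
  [pull-results]
  (let [nodes (filter entity-map? (tree-seq coll? seq pull-results))
        flat  (fn [m] (into {} (remove (comp ref-value? val)) m))
        refs  (fn [m] (into {}
                            (comp (filter (comp ref-value? val))
                                  (map (fn [[k v]] [k (to-stub v)])))
                            m))
        merge-by-id (fn [f]
                      (->> nodes
                           (group-by :db/id)
                           (sort-by key)
                           (mapv (fn [[id ms]]
                                   (apply merge {:db/id id} (map f ms))))))]
    {:entities  (merge-by-id flat)
     ;; keep only entities that actually point at something
     :structure (filterv #(> (count %) 1) (merge-by-id refs))}))
```

Entities that appear several times in the pull result are merged by :db/id, so each one shows up once under :entities.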

Find entities whose ref-to-many attribute contains all elements of input

Suppose I have an entity entry with a ref-to-many attribute :entry/groups. How should I build a query to find entities whose :entry/groups attribute contains all of my input foreign ids?
The following pseudocode illustrates my question better:
[2 3] ; having this as input foreign ids
;; and having these entry entities in db
[{:entry/id "A" :entry/groups [2 3 4]}
{:entry/id "B" :entry/groups [2]}
{:entry/id "C" :entry/groups [2 3]}
{:entry/id "D" :entry/groups [1 2 3]}
{:entry/id "E" :entry/groups [2 4]}]
;; only A, C, D should be pulled
Being new to Datomic/Datalog, I have exhausted all the options I know of, so any help is appreciated. Thanks!
TL;DR
You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.
3 strategies here:
Write a dynamic Datalog query which uses 2 negations and 1 disjunction or a recursive rule (see below)
Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e. you don't benefit from query plan caching.
Use the indexes directly (EAVT or AVET).
Dynamic Datalog query
Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e. (For all ?g in ?Gs, p(?e, ?g)) <=> NOT(Exists ?g in ?Gs such that NOT(p(?e, ?g)))
In your case, this could be expressed as:
[:find [?entry ...] :in $ ?groups :where
;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
;; NOTE: this imposes ?groups cannot be empty!
[(first ?groups) ?group0]
[?entry :entry/groups ?group0]
;; here comes the double negation
(not-join [?entry ?groups]
[(identity ?groups) [?group ...]]
(not-join [?entry ?group]
[?entry :entry/groups ?group]))]
Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):
[(matches-all ?e ?a ?vs)
[(first ?vs) ?v0]
[?e ?a ?v0]
(not-join [?e ?a ?vs]
[(seq ?vs) [?v ...]]
(not-join [?e ?a ?v]
[?e ?a ?v]))]
... which means your query can now be expressed as:
[:find [?entry ...] :in % $ ?groups :where
(matches-all ?entry :entry/groups ?groups)]
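To make the double negation concrete, here is the same logic in plain Clojure, outside Datomic, on the entries from the question. The map `entries` and the function names are only illustrative stand-ins for the database and the rule.

```clojure
;; The entries from the question, as entry id -> set of group ids.
(def entries
  {"A" #{2 3 4}
   "B" #{2}
   "C" #{2 3}
   "D" #{1 2 3}
   "E" #{2 4}})

(defn matches-all?
  "NOT (exists g in wanted such that NOT (g is in entry-groups))."
  [entry-groups wanted]
  (not (some #(not (contains? entry-groups %)) wanted)))

(defn find-matching
  "Ids of the entries whose groups contain every wanted group."
  [entries wanted]
  (into (sorted-set)
        (comp (filter (fn [[_ groups]] (matches-all? groups wanted)))
              (map key))
        entries))
```

For the input [2 3] this selects A, C and D, and `matches-all?` agrees with the direct `every?` formulation.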
NOTE: there's an alternate implementation using a recursive rule:
[[(matches-all ?e ?a ?vs)
[(seq ?vs)]
[(first ?vs) ?v]
[?e ?a ?v]
[(rest ?vs) ?vs2]
(matches-all ?e ?a ?vs2)]
[(matches-all ?e ?a ?vs)
[(empty? ?vs)]]]
This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).
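The recursion in that rule can be mirrored by an ordinary Clojure function, which may help in reading it. In this sketch `has-value?` is a hypothetical predicate standing in for the `[?e ?a ?v]` clause.

```clojure
(defn matches-all-rec?
  "Mirrors the recursive rule: true when has-value? holds for every element of vs."
  [has-value? vs]
  (cond
    ;; base case, like the rule body with (empty? ?vs)
    (empty? vs) true
    ;; recursive case: the first value matches, so check the rest
    (has-value? (first vs)) (recur has-value? (rest vs))
    :else false))
```

As with the rule, an empty collection of values is accepted and matches trivially.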
Generating the query code
The advantage of generating the query code is that it's relatively simple in this case, and it can probably make the query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e. depending only on the number of v values).
(defn q-find-having-all-vs
[n-vs]
(let [v-syms (for [i (range n-vs)]
(symbol (str "?v" i)))]
{:find '[[?e ...]]
:in (into '[$ ?a] v-syms)
:where
(for [?v v-syms]
['?e '?a ?v])}))
;; examples
(q-find-having-all-vs 1)
=> {:find [[?e ...]],
:in [$ ?a ?v0],
:where
([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1],
:where
([?e ?a ?v0]
[?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1 ?v2],
:where
([?e ?a ?v0]
[?e ?a ?v1]
[?e ?a ?v2])}
;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
db :entry/groups groups)
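The point about keeping generated queries generic can be checked without a database: a query that depends only on the number of values is the same data structure for every call with that count, whereas a query with the values baked in differs for every input. In the sketch below `q-generic` follows the same idea as q-find-having-all-vs above, and `q-inlined` is a hypothetical counter-example.

```clojure
(defn q-generic
  "The query depends only on n; attribute and values are passed as inputs."
  [n]
  (let [v-syms (mapv #(symbol (str "?v" %)) (range n))]
    {:find  '[[?e ...]]
     :in    (into '[$ ?a] v-syms)
     :where (mapv (fn [v] ['?e '?a v]) v-syms)}))

(defn q-inlined
  "Hypothetical counter-example: attribute and values are baked into the clauses."
  [a vs]
  {:find  '[[?e ...]]
   :in    '[$]
   :where (mapv (fn [v] ['?e a v]) vs)})

;; Same query value for any two inputs of equal size:
(= (q-generic 2) (q-generic 2))
;; A different query value for each distinct input:
(not= (q-inlined :entry/groups [2 3]) (q-inlined :entry/groups [2 4]))
```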
Use the indexes directly
I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.
Here's an example in Clojure using the AVET index:
(defn find-having-all-vs
"Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
[db a vs]
;; DISCLAIMER: a LOT can be done to improve the efficiency of this code!
(apply clojure.set/intersection
(for [v vs]
(into #{}
(map :e)
(d/datoms db :avet a v)))))
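One of the possible improvements hinted at in the disclaimer is to stop as soon as the running intersection becomes empty. The sketch below shows that idea in isolation; `lookup` is a hypothetical function from a value to the entity ids that have it, standing in for `(map :e (d/datoms db :avet a v))`, and `vs` is assumed to be non-empty.

```clojure
(require '[clojure.set :as set])

(defn intersect-lookups
  "Intersects the entity sets returned by `lookup` for each v in `vs`,
  stopping early once the running intersection is empty."
  [lookup vs]
  (reduce (fn [acc v]
            (let [next-acc (set/intersection acc (set (lookup v)))]
              (if (empty? next-acc)
                (reduced next-acc) ; nothing can match any more
                next-acc)))
          (set (lookup (first vs)))
          (rest vs)))
```

With a plain map as the lookup, `(intersect-lookups {2 ["A" "C"] 3 ["C"]} [2 3])` returns `#{"C"}`.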
You can see an example of this in the James Bond example from the Tupelo-Datomic library. You just specify 2 clauses, one for each desired value in the set:
; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set (td/find :let [$ (live-db)]
:find [?name]
:where {:person/name ?name :weapon/type :weapon/guile }
{:person/name ?name :weapon/type :weapon/gun } ) ]
(is (= #{["Dr No"] ["M"]} tuple-set )))
In pure Datomic it will look similar, but using something like the Entity ID:
[?eid :entry/groups 2]
[?eid :entry/groups 3]
and Datomic will perform an implicit AND operation (i.e. both clauses must match; any surplus entries are ignored). This is logically a "join" operation, even though it is the same entity being queried for both values. You can find more info in the Datomic docs.

Mapping a list of datomic ids to entity maps

When I query for a list of Datomic entities, e.g. as in the example below:
'[:find ?e
:where
[?e :category/name]]
Usually, I'd like to create a list of maps that represent the full entities, i.e
#{[1234] [2223]} => [{:category/name "x" :db/id 1234}, {:category/name "y" :db/id 2223}]
Here is my approach at the moment, in the form of a helper function.
(defn- db-ids->entity-maps
"Takes a list of Datomic entity ids, retrieves them, and returns
a list of hydrated entities in the form of a list of maps."
[db-conn db-ids]
(->>
db-ids
seq
flatten
(map #(->>
%
;; id -> lazy entity map
(d/entity (d/db db-conn))
;; realize all values, except for db/id
d/touch
(into {:db/id %})))))
Is there a better way?
With the pull api, this is pretty easy now.
'[:find [(pull ?e [*]) ...]
:in $ [[?e] ...]
:where [?e]]
I used to take this approach to save queries to the DB. The code is probably less reusable, but it depends on what is more critical in your current scenario. I don't have a Datomic instance configured, as I am not working with it right now, so it may contain syntax errors, but I hope you get the idea.
;; Assumes `db` is a database value, e.g. (d/db conn).
(def query-result
(d/q '[:find ?cat-name ?id
:where
[?id :category/name ?cat-name]]
db))
=>
#{["x" 1234] ["x" 2223]}
(defn- describe-values
"Adds proper keys to the given values."
[keys-vec query-result]
(mapv #(zipmap keys-vec %) query-result))
(describe-values [:category/name :db/id] query-result)
=>
[{:db/id 2223, :category/name "x"} {:db/id 1234, :category/name "x"}]

Retrieve Most Recent Entity from Datomic

I'm interested in entities and their timestamps. Essentially, I want a time-sorted list of entities.
To that end, I've composed the following functions:
(defn return-posts
"grabs all posts from Datomic"
[]
(d/q '[:find ?title ?body ?slug
:where
[?e :post/title ?title]
[?e :post/slug ?slug]
[?e :post/body ?body]] (d/db connection)))
(defn get-postid-from-slug
[slug]
(d/q '[:find ?e
:in $ ?slug
:where [?e :post/slug ?slug]] (d/db connection) slug))
(defn get-post-timestamp
"given an entid, returns the most recent timestamp"
[entid]
(->
(d/q '[:find ?ts
:in $ ?e
:where
[?e _ _ _]
[?e :db/txInstant ?ts]] (d/db connection) entid)
(sort)
(reverse)
(first)))
Which I feel must be a hack rooted in ignorance.
Would someone more well-versed in idiomatic Datomic usage chime in and upgrade my paradigms?
I was bothered by the idea of adding additional timestamps to a database that nominally understands time as a first-class principle and so (after a night of mulling on the approaches outlined by Ulrik Sandberg) evolved the following function:
(defn return-posts
"grabs all posts from Datomic"
[uri]
(d/q '[:find ?title ?body ?slug ?ts
:where
[?e :post/title ?title ?tx]
[?e :post/slug ?slug]
[?e :post/body ?body]
[?tx :db/txInstant ?ts]] (d/db (d/connect uri))))
It's idiomatic in Datalog to omit the binding to the transaction ID itself, as we typically don't care about it. In this situation we very definitely care and, in the words of August Lilleaas, wish to "traverse the transaction" (there are situations in which we'd want the post creation time, but for this application the transaction time will suffice for ordering entities).
A notable downside to this approach is that recently edited entries will be bumped up in the list. To that end, I'll have to do something later on in order to get their "first appearance" in Datomic for blog-standard post history.
To summarize:
I have bound the transaction entity ID per "post" entity ID, and then looked up the transaction timestamp with this function for later sorting.
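The later sorting step, and one way to approach the "first appearance" problem, might look like the following plain-Clojure sketch. It assumes rows shaped like the query result above ([title body slug ts]). For `first-seen` it assumes [slug ts] pairs with possibly several timestamps per slug, for instance if a similar query were run against a history view of the database. Both function names are made up.

```clojure
(defn newest-first
  "Sorts [title body slug ts] tuples by ts, most recent first."
  [rows]
  (sort-by #(nth % 3) #(compare %2 %1) rows))

(defn first-seen
  "Given [slug ts] pairs (possibly several per slug), keeps the earliest ts per slug."
  [pairs]
  (reduce (fn [acc [slug ts]]
            (let [old (get acc slug)]
              (if (and old (neg? (compare old ts)))
                acc                     ; the existing timestamp is earlier
                (assoc acc slug ts))))
          {}
          pairs))
```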
There isn't really a more elegant way to do this than traversing the transactions. This is why I prefer to have a separate domain specific attribute for timestamps, instead of relying on the transaction timestamps from Datomic. One example where this is necessary is merging: let's say you have a wiki, and you want to merge two wiki pages. In that case, you probably want to control the timestamp yourself, and not use the timestamp from the transaction.
I like to have the attributes :created-at and :changed-at. When I transact new entities:
[[:db/add tempid :post/slug "..."]
[:db/add tempid :post/title "A title"]
[:db/add tempid :created-at (java.util.Date.)]
[:db/add tempid :changed-at (java.util.Date.)]]
Then for updates:
[[:db/add post-eid :post/title "An updated title"]
[:db/add post-eid :changed-at (java.util.Date.)]]
That way all I have to do is to read out the :created-at attribute of the entity, which will be ready and waiting in the index.
(defmacro find-one-entity
"Returns entity when query matches, otherwise nil"
[q db & args]
`(when-let [eid# (ffirst (d/q ~q ~db ~@args))]
(d/entity ~db eid#)))
(defn find-post-by-slug
[db slug]
(find-one-entity
'[:find ?e
:in $ ?slug
:where
[?e :post/slug ?slug]]
db
slug))
;; Get timestamp
(:created-at (find-post-by-slug db "my-post-slug"))
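The two transaction shapes above could be produced by small helper functions, so that the timestamp bookkeeping lives in one place. This is only a sketch with hypothetical helper names; the timestamp is passed in as `now` so that both attributes get the same value on creation.

```clojure
(defn new-post-tx
  "Transaction data for a new post; :created-at and :changed-at share one timestamp."
  [tempid slug title now]
  [[:db/add tempid :post/slug slug]
   [:db/add tempid :post/title title]
   [:db/add tempid :created-at now]
   [:db/add tempid :changed-at now]])

(defn update-title-tx
  "Transaction data for a title change; only :changed-at is touched."
  [post-eid title now]
  [[:db/add post-eid :post/title title]
   [:db/add post-eid :changed-at now]])
```

The resulting vectors are plain data and would be handed to a transact call, e.g. with `(java.util.Date.)` as `now`.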