Suppose I have entity entry with ref-to-many attribute :entry/groups. How should I build a query to find entities whose :entry/groups attribute contains all of my input foreign ids?
Next pseudocode will illustrate my question better:
[2 3] ; having this as input foreign ids
;; and having these entry entities in db
[{:entry/id "A" :entry/groups [2 3 4]}
{:entry/id "B" :entry/groups [2]}
{:entry/id "C" :entry/groups [2 3]}
{:entry/id "D" :entry/groups [1 2 3]}
{:entry/id "E" :entry/groups [2 4]}]
;; only A, C, D should be pulled
Being new in Datomic/Datalog, I exhausted all options, so any help is appreciated. Thanks!
TL;DR
You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.
3 strategies here:
Write a dynamic Datalog query which uses 2 negations and 1 disjunction or a recursive rule (see below)
Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e you don't benefit from query plan caching.
Use the indexes directly (EAVT or AVET).
Dynamic Datalog query
Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e (For all ?g in ?Gs p(?e,?g)) <=> NOT(Exists ?g in ?Gs, such that NOT(p(?e, ?g)))
In your case, this could be expressed as:
[:find [?entry ...] :in $ ?groups :where
;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
;; NOTE: this imposes ?groups cannot be empty!
[(first ?groups) ?group0]
[?entry :entry/groups ?group0]
;; here comes the double negation
(not-join [?entry ?groups]
[(identity ?groups) [?group ...]]
(not-join [?entry ?group]
[?entry :entry/groups ?group]))]
Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):
[(matches-all ?e ?a ?vs)
[(first ?vs) ?v0]
[?e ?a ?v0]
(not-join [?e ?a ?vs]
[(seq ?vs) [?v ...]]
(not-join [?e ?a ?v]
[?e ?a ?v]))]
... which means your query can now be expressed as:
[:find [?entry ...] :in % $ ?groups :where
(matches-all ?entry :entry/groups ?groups)]
NOTE: there's an alternate implementation using a recursive rule:
[[(matches-all ?e ?a ?vs)
[(seq ?vs)]
[(first ?vs) ?v]
[?e ?a ?v]
[(rest ?vs) ?vs2]
(matches-all ?e ?a ?vs2)]
[(matches-all ?e ?a ?vs)
[(empty? ?vs)]]]
This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).
Generating the query code
The advantage of generating the query code is that it's relatively simple in this case, and it can probably make the query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e depending only on the number of v values)
(defn q-find-having-all-vs
[n-vs]
(let [v-syms (for [i (range n-vs)]
(symbol (str "?v" i)))]
{:find '[[?e ...]]
:in (into '[$ ?a] v-syms)
:where
(for [?v v-syms]
['?e '?a ?v])}))
;; examples
(q-find-having-all-vs 1)
=> {:find [[?e ...]],
:in [$ ?a ?v0],
:where
([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1],
:where
([?e ?a ?v0]
[?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1 ?v2],
:where
([?e ?a ?v0]
[?e ?a ?v1]
[?e ?a ?v2])}
;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
db :entry/group groups)
Use the indexes directly
I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.
Here's an example in Clojure using the AVET index:
(defn find-having-all-vs
"Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
[db a vs]
;; DISCLAIMER: a LOT can be done to improve the efficiency of this code!
(apply clojure.set/intersection
(for [v vs]
(into #{}
(map :e)
(d/datoms db :avet a v)))))
You can see an example of this in the James Bond example from the Tupelo-Datomic library. You just specify 2 clauses, one for each desired value in the set:
; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set (td/find :let [$ (live-db)]
:find [?name]
:where {:person/name ?name :weapon/type :weapon/guile }
{:person/name ?name :weapon/type :weapon/gun } ) ]
(is (= #{["Dr No"] ["M"]} tuple-set )))
In pure Datomic it will look similar, but using something like the Entity ID:
[?eid :entry/groups 2]
[?eid :entry/groups 3]
and Datomic will perform an implicit AND operation (i.e. both clauses must match; any surplus entries are ignored). This is logically a "join" operation, even though it is the same entity being queried for both values. You can find more info in the Datomic docs.
Related
I'm doing some query in Datomic using Clojure, I'm trying to return a Map with keys instead of a Vector, if I don't try to return a Map with the ":keys" keyword in the query it works fine.
I tried to have equal and different names between the :find and :keys.
If I remove the :keys line bellow it works fine.
I'm using [org.clojure/clojure "1.10.0"] with [com.datomic/client-pro "0.8.28"].
(def get-links
'[:find ?e ?url ?description ?createdat ?order ?postedby
:keys e url description createdat order postedby
:in $ ?filter ?skip ?skip-plus-first
:where [?e :link/url ?url]
[?e :link/description ?description]
[?e :link/createdat ?createdat]
[?e :link/postedby ?e2]
[?e :link/order ?order]
[?e2 :user/name ?postedby]
[(.contains ?url ?filter)]
[(> ?order ?skip) ]
[(<= ?order ?skip-plus-first)]])
Here is how I'm calling it:
(d/q get-links db filter skip (+ first skip))
The exact error is:
Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:56).
"Argument :keys in :find is not a variable"
Below is Datomic examples, in their docs.
[:find ?artist-name ?release-name
:keys artist release
:where [?release :release/name ?release-name]
[?release :release/artists ?artist]
[?artist :artist/name ?artist-name]]
I think that you are using an older version of the client that doesn't know the :keys option yet.
I need to all the values of a particular attribute in an entity (for datomic schema). The retract function requires the attribute's value to be passed as argument but they are way too many, and I just require them to be replaced with new set of values. Is it possible to achieve via clojure?
You can either query all values and generate the desired retraction in your peer or, if you wish to ensure an "empty attrib" before new values are written, do the same from within a transaction function.
(map (fn [v] [:db/retract eid attrib v])
(d/q '[:find [?v ...]
:in $ ?e ?a
:where [?e ?a ?v]
db
eid
attrib))
When I query for a list of Datomic entities, e.g like in the example below:
'[:find ?e
:where
[?e :category/name]]
Usually, I'd like to create a list of maps that represent the full entities, i.e
#{[1234] [2223]} => [{:category/name "x" :db/id 1234}, {:category/name "y" :db/id 2223}]
Here is my approach at the moment, in the form of a helper function.
(defn- db-ids->entity-maps
"Takes a list of datomic entity ids retrieves and returns
a list of hydrated entities in the form of a list of maps."
[db-conn db-ids]
(->>
db-ids
seq
flatten
(map #(->>
%
;; id -> lazy entity map
(d/entity (d/db db-conn))
;; realize all values, except for db/id
d/touch
(into {:db/id %})))))
Is there a better way?
With the pull api, this is pretty easy now.
'[:find [(pull ?e [*]) ...]
:in $ [[?e] ...]
:where [?e]]
I used to take this approach to save queries to the DB, the code is probably less reusable but it depends on what is more critical in your current scenario. I haven't a Datomic instance configured as I am not working with it right now so it may contain syntax error but I hope you get the idea.
(def query-result '[:find ?cat-name ?id
:where
[?cat-name :category/name
[?id :db/id]])
=>
#{["x" 1234] ["x" 2223]}
(defn- describe-values
"Adds proper keys to the given values."
[keys-vec query-result]
(vec (map #(zipmap keys-vec %) query-result))
(describe-values [:category/name :db/id] query-result)
=>
[{:db/id 2223, :category/name "x"} {:db/id 1234, :category/name "x"}]
I'm trying to filter through a database, based on the keyword patter match.
To do this, I wrote ->
(defn find-users
[db keyword]
(if (>= (count keyword) 3)
(let [login-pattern (login-pattern keyword)]
(->> (d/datoms db :aevt :user/name)
(filter #(re-matches login-pattern (:v %)))
(map #(d/entity db (:e %)))))
[]))
But, I'm getting a StackOverflow error.
I think it's because of (map #(d/entity db (:e %)))
When I plainly do (map :e), the function works.
I'm a bit confused as to why the Stackoverflow would happen though, the queries I'm performing with map :v are returning only a few entities.
What's happening here?
I'm a little curious as to why not just use query? You can use predicate expression clauses for the same purpose that you've built the filter/map scenario above. See "Expression Clauses" at: http://docs.datomic.com/query.html
(defn find-users
[db keyword]
(if (>= (count keyword) 3)
(map #(d/entity db (first %))
(d/q '[:find ?e
:in $ ?login-pattern
:where
[?e :user/name ?name]
[(re-matches ?login-pattern ?name)]]
db
(login-pattern keyword)))
[]))
The query engine is likely to handle the size of intermediate results better than raw sequence manipulations.
I'm interested in entities and their timestamps. Essentially, I want a time-sorted list of entities.
To that end, I've composed the following functions:
(defn return-posts
"grabs all posts from Datomic"
[]
(d/q '[:find ?title ?body ?slug
:where
[?e :post/title ?title]
[?e :post/slug ?slug]
[?e :post/body ?body]] (d/db connection)))
(defn get-postid-from-slug
[slug]
(d/q '[:find ?e
:in $ ?slug
:where [?e :post/slug ?slug]] (d/db connection) slug))
(defn get-post-timestamp
"given an entid, returns the most recent timestamp"
[entid]
(->
(d/q '[:find ?ts
:in $ ?e
:where
[?e _ _ _]
[?e :db/txInstant ?ts]] (d/db connection) entid)
(sort)
(reverse)
(first)))
Which I feel must be a hack rooted in ignorance.
Would someone more well-versed in idiomatic Datomic usage chime in and upgrade my paradigms?
I was bothered by the idea of adding additional timestamps to a database that nominally understands time as a first-class principle and so (after a night of mulling on the approaches outlined by Ulrik Sandberg) evolved the following function:
(defn return-posts
"grabs all posts from Datomic"
[uri]
(d/q '[:find ?title ?body ?slug ?ts
:where
[?e :post/title ?title ?tx]
[?e :post/slug ?slug]
[?e :post/body ?body]
[?tx :db/txInstant ?ts]] (d/db (d/connect uri))))
It's idiomatic in Datalog to omit the binding to the transaction ID itself as we typically don't care. In this situation, we very definitely care and in the words of August Lileaas, wish to "traverse the transaction" (there are situations in which we'd want the post creation time, but for this application the transaction time will suffice for ordering entities).
A notable downside to this approach is that recently edited entries will be bumped up in the list. To that end, I'll have to do something later on in order to get their "first appearance" in Datomic for blog-standard post history.
To summarize:
I have bound the transaction entity ID per "post" entity ID, and then looked up the transaction timestamp with this function for later sorting.
There isn't really a more elegant way to do this than traversing the transactions. This is why I prefer to have a separate domain specific attribute for timestamps, instead of relying on the transaction timestamps from Datomic. One example where this is necessary is merging: let's say you have a wiki, and you want to merge two wiki pages. In that case, you probably want to control the timestamp yourself, and not use the timestamp from the transaction.
I like to have the attributes :created-at and :changed-at. When I transact new entities:
[[:db/add tempid :post/slug "..."]
[:db/add tempid :post/title "A title"]
[:db/add tempid :created-at (java.util.Date.)]
[:db/add tempid :changed-at (java.util.Date.)]]
Then for updates:
[[:db/add post-eid :post/title "An updated title"]
[:db/add post-eid :changed-at (java.util.Date.)]]
That way all I have to do is to read out the :created-at attribute of the entity, which will be ready and waiting in the index.
(defmacro find-one-entity
"Returns entity when query matches, otherwise nil"
[q db & args]
`(when-let [eid# (ffirst (d/q ~q ~db ~#args))]
(d/entity ~db eid#)))
(defn find-post-by-slug
[db slug]
(find-one-entity
'[:find ?e
:in $ ?slug
:where
[?e :post/slug ?slug]]
db
slug))
;; Get timestamp
(:created-at (find-post-by-slug db "my-post-slug"))