Datomic - how to retract all values of an attribute - clojure

I need to all the values of a particular attribute in an entity (for datomic schema). The retract function requires the attribute's value to be passed as argument but they are way too many, and I just require them to be replaced with new set of values. Is it possible to achieve via clojure?

You can either query all values and generate the desired retraction in your peer or, if you wish to ensure an "empty attrib" before new values are written, do the same from within a transaction function.
(map (fn [v] [:db/retract eid attrib v])
(d/q '[:find [?v ...]
:in $ ?e ?a
:where [?e ?a ?v]
db
eid
attrib))

Related

Find entities whose ref-to-many attribute contains all elements of input

Suppose I have entity entry with ref-to-many attribute :entry/groups. How should I build a query to find entities whose :entry/groups attribute contains all of my input foreign ids?
Next pseudocode will illustrate my question better:
[2 3] ; having this as input foreign ids
;; and having these entry entities in db
[{:entry/id "A" :entry/groups [2 3 4]}
{:entry/id "B" :entry/groups [2]}
{:entry/id "C" :entry/groups [2 3]}
{:entry/id "D" :entry/groups [1 2 3]}
{:entry/id "E" :entry/groups [2 4]}]
;; only A, C, D should be pulled
Being new in Datomic/Datalog, I exhausted all options, so any help is appreciated. Thanks!
TL;DR
You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.
3 strategies here:
Write a dynamic Datalog query which uses 2 negations and 1 disjunction or a recursive rule (see below)
Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e you don't benefit from query plan caching.
Use the indexes directly (EAVT or AVET).
Dynamic Datalog query
Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e (For all ?g in ?Gs p(?e,?g)) <=> NOT(Exists ?g in ?Gs, such that NOT(p(?e, ?g)))
In your case, this could be expressed as:
[:find [?entry ...] :in $ ?groups :where
;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
;; NOTE: this imposes ?groups cannot be empty!
[(first ?groups) ?group0]
[?entry :entry/groups ?group0]
;; here comes the double negation
(not-join [?entry ?groups]
[(identity ?groups) [?group ...]]
(not-join [?entry ?group]
[?entry :entry/groups ?group]))]
Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):
[(matches-all ?e ?a ?vs)
[(first ?vs) ?v0]
[?e ?a ?v0]
(not-join [?e ?a ?vs]
[(seq ?vs) [?v ...]]
(not-join [?e ?a ?v]
[?e ?a ?v]))]
... which means your query can now be expressed as:
[:find [?entry ...] :in % $ ?groups :where
(matches-all ?entry :entry/groups ?groups)]
NOTE: there's an alternate implementation using a recursive rule:
[[(matches-all ?e ?a ?vs)
[(seq ?vs)]
[(first ?vs) ?v]
[?e ?a ?v]
[(rest ?vs) ?vs2]
(matches-all ?e ?a ?vs2)]
[(matches-all ?e ?a ?vs)
[(empty? ?vs)]]]
This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).
Generating the query code
The advantage of generating the query code is that it's relatively simple in this case, and it can probably make the query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e depending only on the number of v values)
(defn q-find-having-all-vs
[n-vs]
(let [v-syms (for [i (range n-vs)]
(symbol (str "?v" i)))]
{:find '[[?e ...]]
:in (into '[$ ?a] v-syms)
:where
(for [?v v-syms]
['?e '?a ?v])}))
;; examples
(q-find-having-all-vs 1)
=> {:find [[?e ...]],
:in [$ ?a ?v0],
:where
([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1],
:where
([?e ?a ?v0]
[?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1 ?v2],
:where
([?e ?a ?v0]
[?e ?a ?v1]
[?e ?a ?v2])}
;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
db :entry/group groups)
Use the indexes directly
I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.
Here's an example in Clojure using the AVET index:
(defn find-having-all-vs
"Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
[db a vs]
;; DISCLAIMER: a LOT can be done to improve the efficiency of this code!
(apply clojure.set/intersection
(for [v vs]
(into #{}
(map :e)
(d/datoms db :avet a v)))))
You can see an example of this in the James Bond example from the Tupelo-Datomic library. You just specify 2 clauses, one for each desired value in the set:
; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set (td/find :let [$ (live-db)]
:find [?name]
:where {:person/name ?name :weapon/type :weapon/guile }
{:person/name ?name :weapon/type :weapon/gun } ) ]
(is (= #{["Dr No"] ["M"]} tuple-set )))
In pure Datomic it will look similar, but using something like the Entity ID:
[?eid :entry/groups 2]
[?eid :entry/groups 3]
and Datomic will perform an implicit AND operation (i.e. both clauses must match; any surplus entries are ignored). This is logically a "join" operation, even though it is the same entity being queried for both values. You can find more info in the Datomic docs.

Find entity id of last transaction in Datomic?

I'd like to know how to I can find the id of the entity that was modified/created/deleted by the latest transaction in Datomic. How can I do this?
For this kind of read pattern (temporal-based), you'll want to use the Log API. Note that:
there may be more than one entity that was affected by the last transaction.
actually, the transaction itself is represented by an entity which is created for that transaction, which you may want to filter out of your result.
Here's a sample implementation:
(defn affected-entities
"Given a Datomic connection, returns the set of entity ids that were affected
by the last transaction (in e position), excluding the entity representing the
transaction itself."
[conn]
(let [db (d/db conn)]
(->>
(d/q '[:find [?e ...] :in ?log ?t1 ?t2 :where
[(tx-ids ?log ?t1 ?t2) [?tx ...]] ;; binds the last tx
[(tx-data ?log ?tx) [[?e]]]]
(d/log conn) (d/basis-t db) (d/next-t db))
;; filtering out the transaction entity
(remove (fn [eid]
(->> eid d/part (d/ident db) (= :db.part/tx))))
set)))

Datomic: How do I query across any number of database inside of a query?

I'm using Datomic and would like to pull entire entities from any number of points in time based on my query. The Datomic docs have some decent examples about how I can perform queries from two different database instances if I know those instances before the query is performed. However, I'd like my query to determine the number of "as-of" type database instances I need and then use those instances when pulling the entities. Here's what I have so far:
(defn pull-entities-at-change-points [entity-id]
(->>
(d/q
'[:find ?tx (pull ?dbs ?client [*])
:in $ [?dbs ...] ?client
:where
[?client ?attr-id ?value ?tx true]
[(datomic.api/ident $ ?attr-id) ?attr]
[(contains? #{:client/attr1 :client/attr2 :client/attr3} ?attr)]
[(datomic.api/tx->t ?tx) ?t]
[?tx :db/txInstant ?inst]]
(d/history (d/db db/conn))
(map #(d/as-of (d/db db/conn) %) [1009 1018])
entity-id)
(sort-by first)))
I'm trying to find all transactions wherein certain attributes on a :client entity changed and then pull the entity as it existed at those points in time. The line: (map #(d/as-of (d/db db/conn) %) [1009 1018]) is my attempt to created a sequence of database instances at two specific transactions where I know the client's attributes changed. Ideally, I'd like to do all of this in one query, but I'm not sure if that's possible.
Hopefully this makes sense, but let me know if you need more details.
I would split out the pull calls to be separate API calls instead of using them in the query. I would keep the query itself limited to getting the transactions of interest. One example solution for approaching this would be:
(defn pull-entities-at-change-points
[db eid]
(let
[hdb (d/history db)
txs (d/q '[:find [?tx ...]
:in $ [?attr ...] ?eid
:where
[?eid ?attr _ ?tx true]]
hdb
[:person/firstName :person/friends]
eid)
as-of-dbs (map #(d/as-of db %) txs)
pull-w-t (fn [as-of-db]
[(d/as-of-t as-of-db)
(d/pull as-of-db '[*] eid)])]
(map pull-w-t as-of-dbs)))
This function against a db I built with a toy schema would return results like:
([1010
{:db/id 17592186045418
:person/firstName "Gerry"
:person/friends [{:db/id 17592186045419} {:db/id 17592186045420}]}]
[1001
{:db/id 17592186045418
:person/firstName "Jerry"
:person/friends [{:db/id 17592186045419} {:db/id 17592186045420}]}])
A few points I'll comment on:
the function above takes a database value instead of getting databases from the ambient/global conn.
we map pull for each of various time t's.
using the pull API as an entry point rather than query is appropriate for cases where we have the entity and other information on hand and just want attributes or to traverse references.
the impetus to get everything done in one big query doesn't really exist in Datomic since the relevant segments will have been realized in the peer's cache. You're not, i.e., saving a round trip in using one query.
the collection binding form is preferred over contains and leverages query caching.

Mapping a list of datomic ids to entity maps

When I query for a list of Datomic entities, e.g like in the example below:
'[:find ?e
:where
[?e :category/name]]
Usually, I'd like to create a list of maps that represent the full entities, i.e
#{[1234] [2223]} => [{:category/name "x" :db/id 1234}, {:category/name "y" :db/id 2223}]
Here is my approach at the moment, in the form of a helper function.
(defn- db-ids->entity-maps
"Takes a list of datomic entity ids retrieves and returns
a list of hydrated entities in the form of a list of maps."
[db-conn db-ids]
(->>
db-ids
seq
flatten
(map #(->>
%
;; id -> lazy entity map
(d/entity (d/db db-conn))
;; realize all values, except for db/id
d/touch
(into {:db/id %})))))
Is there a better way?
With the pull api, this is pretty easy now.
'[:find [(pull ?e [*]) ...]
:in $ [[?e] ...]
:where [?e]]
I used to take this approach to save queries to the DB, the code is probably less reusable but it depends on what is more critical in your current scenario. I haven't a Datomic instance configured as I am not working with it right now so it may contain syntax error but I hope you get the idea.
(def query-result '[:find ?cat-name ?id
:where
[?cat-name :category/name
[?id :db/id]])
=>
#{["x" 1234] ["x" 2223]}
(defn- describe-values
"Adds proper keys to the given values."
[keys-vec query-result]
(vec (map #(zipmap keys-vec %) query-result))
(describe-values [:category/name :db/id] query-result)
=>
[{:db/id 2223, :category/name "x"} {:db/id 1234, :category/name "x"}]

Datomic query: find all entities with some value

With this query:
{:find [?e]
:where [[?e :db/valueType :db.type/string]]}
I can find all entities with property named :db/valueType and value of :db.type/string. In my case with some data in the database it returns ten IDs.
How would I search for all entities with value of :db.type/string, regardless of the property name? For example this query:
{:find [?e]
:where [[?e _ :db.type/string]]}
returns an empty set. As far as I can understand Datomic's Datalog, _ should work as a wildcard, matching anything, so the second query should at least return the same number of results as the first one, and maybe even more than that.
Thanks...
For this example, the logical structure of the query is essentially correct, but the attribute ident keyword is not being resolved to its entity id. Note that this is a special case that occurs when you query with attributes as inputs - a scenario in which the query engine is not guaranteed to perform this conversion. See the Datomic docs on query (http://docs.datomic.com/query.html) for "Attributes as Query Inputs."
A way to restructure this query is as:
(let [db (d/db conn)]
(d/q '[:find ?e
:in $ ?id
:where [?e _ ?id]]
db (d/entid db :db.type/string)))
In this case, we resolve the keyword :db.type/string to its entity id manually in the input to the parameterized query.