Mapping a list of datomic ids to entity maps - clojure

When I query for a list of Datomic entities, e.g. as in the example below:
'[:find ?e
  :where
  [?e :category/name]]
Usually, I'd like to create a list of maps that represent the full entities, i.e.
#{[1234] [2223]} => [{:category/name "x" :db/id 1234}, {:category/name "y" :db/id 2223}]
Here is my approach at the moment, in the form of a helper function.
(defn- db-ids->entity-maps
  "Takes a list of Datomic entity ids, retrieves the corresponding entities,
  and returns them as a list of fully realized maps."
  [db-conn db-ids]
  (->> db-ids
       seq
       flatten
       (map #(->> %
                  ;; id -> lazy entity map
                  (d/entity (d/db db-conn))
                  ;; realize all values, except for :db/id
                  d/touch
                  ;; add :db/id back in
                  (into {:db/id %})))))
Is there a better way?

With the pull api, this is pretty easy now.
'[:find [(pull ?e [*]) ...]
  :in $ [[?e] ...]
  :where [?e]]
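A minimal call sketch, assuming db-conn is a connection and the input set of entity-id tuples comes from the first query (the ids and values are illustrative):
(d/q '[:find [(pull ?e [*]) ...]
       :in $ [[?e] ...]
       :where [?e]]
     (d/db db-conn)
     #{[1234] [2223]})
;; => e.g. [{:db/id 1234, :category/name "x"} {:db/id 2223, :category/name "y"}]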

I used to take this approach to save round trips to the DB. The code is probably less reusable, but it depends on what is more critical in your scenario. I don't have a Datomic instance configured right now (I'm not currently working with it), so the snippets may contain syntax errors, but I hope you get the idea.
(def query-result
  (d/q '[:find ?cat-name ?id
         :where
         [?id :category/name ?cat-name]]
       (d/db db-conn)))
=>
#{["x" 1234] ["y" 2223]}
(defn- describe-values
  "Adds proper keys to the given values."
  [keys-vec query-result]
  (vec (map #(zipmap keys-vec %) query-result)))
(describe-values [:category/name :db/id] query-result)
=>
[{:db/id 2223, :category/name "y"} {:db/id 1234, :category/name "x"}]

Related

Error returning a Map instead of a Vector using Datomic

I'm doing a query in Datomic using Clojure and trying to return a map with keys instead of a vector. If I don't try to return a map with the ":keys" keyword in the query, it works fine.
I tried using both matching and different names between the :find and :keys clauses.
If I remove the :keys line below, it works fine.
I'm using [org.clojure/clojure "1.10.0"] with [com.datomic/client-pro "0.8.28"].
(def get-links
  '[:find ?e ?url ?description ?createdat ?order ?postedby
    :keys e url description createdat order postedby
    :in $ ?filter ?skip ?skip-plus-first
    :where
    [?e :link/url ?url]
    [?e :link/description ?description]
    [?e :link/createdat ?createdat]
    [?e :link/postedby ?e2]
    [?e :link/order ?order]
    [?e2 :user/name ?postedby]
    [(.contains ?url ?filter)]
    [(> ?order ?skip)]
    [(<= ?order ?skip-plus-first)]])
Here is how I'm calling it:
(d/q get-links db filter skip (+ first skip))
The exact error is:
Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:56).
"Argument :keys in :find is not a variable"
Below is an example from the Datomic docs:
[:find ?artist-name ?release-name
 :keys artist release
 :where
 [?release :release/name ?release-name]
 [?release :release/artists ?artist]
 [?artist :artist/name ?artist-name]]
I think you are using an older version of the client that doesn't support the :keys option yet.
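If upgrading the client isn't an option, a hedged client-side fallback is to drop the :keys line from the query and attach the keys yourself. Here, get-links-without-keys is a hypothetical copy of get-links minus its :keys line:
(defn tuples->maps
  "Pairs each result tuple with the given keys, positionally."
  [ks tuples]
  (mapv #(zipmap ks %) tuples))

;; get-links-without-keys is assumed to be the query above without the :keys line
(tuples->maps [:e :url :description :createdat :order :postedby]
              (d/q get-links-without-keys db filter skip (+ first skip)))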

Find entities whose ref-to-many attribute contains all elements of input

Suppose I have entity entry with ref-to-many attribute :entry/groups. How should I build a query to find entities whose :entry/groups attribute contains all of my input foreign ids?
Next pseudocode will illustrate my question better:
[2 3] ; having this as input foreign ids
;; and having these entry entities in db
[{:entry/id "A" :entry/groups [2 3 4]}
{:entry/id "B" :entry/groups [2]}
{:entry/id "C" :entry/groups [2 3]}
{:entry/id "D" :entry/groups [1 2 3]}
{:entry/id "E" :entry/groups [2 4]}]
;; only A, C, D should be pulled
Being new to Datomic/Datalog, I have exhausted my options, so any help is appreciated. Thanks!
TL;DR
You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.
Three strategies here:
1. Write a dynamic Datalog query which uses two negations and one disjunction, or a recursive rule (see below).
2. Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e. you don't benefit from query plan caching.
3. Use the indexes directly (EAVT or AVET).
Dynamic Datalog query
Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e. (for all ?g in ?Gs, p(?e, ?g)) <=> NOT(exists ?g in ?Gs such that NOT(p(?e, ?g))).
In your case, this could be expressed as:
[:find [?entry ...] :in $ ?groups :where
 ;; these 2 clauses restrict the set of considered datoms, which is more efficient
 ;; (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
 ;; NOTE: this imposes that ?groups cannot be empty!
 [(first ?groups) ?group0]
 [?entry :entry/groups ?group0]
 ;; here comes the double negation
 (not-join [?entry ?groups]
   [(identity ?groups) [?group ...]]
   (not-join [?entry ?group]
     [?entry :entry/groups ?group]))]
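For the example data, a hedged call sketch (assuming the query above is bound to a var named query, db is a database value, and 2 and 3 are the group entity ids):
(d/q query db [2 3])
;; => collection of the entity ids of entries A, C and D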
Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):
[(matches-all ?e ?a ?vs)
 [(first ?vs) ?v0]
 [?e ?a ?v0]
 (not-join [?e ?a ?vs]
   [(seq ?vs) [?v ...]]
   (not-join [?e ?a ?v]
     [?e ?a ?v]))]
... which means your query can now be expressed as:
[:find [?entry ...] :in % $ ?groups :where
 (matches-all ?entry :entry/groups ?groups)]
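A hedged usage sketch: the rule has to be wrapped in an outer rule-set vector and passed for the % input (matches-all-rules is an illustrative name; db and the group ids 2 and 3 are assumed as before):
(def matches-all-rules
  '[[(matches-all ?e ?a ?vs)
     [(first ?vs) ?v0]
     [?e ?a ?v0]
     (not-join [?e ?a ?vs]
       [(seq ?vs) [?v ...]]
       (not-join [?e ?a ?v]
         [?e ?a ?v]))]])

(d/q '[:find [?entry ...] :in % $ ?groups :where
       (matches-all ?entry :entry/groups ?groups)]
     matches-all-rules db [2 3])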
NOTE: there's an alternate implementation using a recursive rule:
[[(matches-all ?e ?a ?vs)
  [(seq ?vs)]
  [(first ?vs) ?v]
  [?e ?a ?v]
  [(rest ?vs) ?vs2]
  (matches-all ?e ?a ?vs2)]
 [(matches-all ?e ?a ?vs)
  [(empty? ?vs)]]]
This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).
Generating the query code
The advantage of generating the query code is that it's relatively simple in this case, and it can probably make the query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e. depending only on the number of v values).
(defn q-find-having-all-vs
  [n-vs]
  (let [v-syms (for [i (range n-vs)]
                 (symbol (str "?v" i)))]
    {:find '[[?e ...]]
     :in (into '[$ ?a] v-syms)
     :where
     (for [?v v-syms]
       ['?e '?a ?v])}))
;; examples
(q-find-having-all-vs 1)
=> {:find [[?e ...]],
    :in [$ ?a ?v0],
    :where ([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1],
    :where ([?e ?a ?v0]
            [?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1 ?v2],
    :where ([?e ?a ?v0]
            [?e ?a ?v1]
            [?e ?a ?v2])}
;; executing the query: note that we're passing the attribute and the values!
(apply d/q (q-find-having-all-vs (count groups))
       db :entry/groups groups)
Use the indexes directly
I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.
Here's an example in Clojure using the AVET index:
(defn find-having-all-vs
  "Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
  returns a set of entity identifiers for entities which have all the values in `vs` via `a`."
  [db a vs]
  ;; DISCLAIMER: a LOT can be done to improve the efficiency of this code!
  (apply clojure.set/intersection
         (for [v vs]
           (into #{}
                 (map :e)
                 (d/datoms db :avet a v)))))
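For the example data, a hedged usage sketch (again assuming 2 and 3 are the group entity ids):
(find-having-all-vs db :entry/groups [2 3])
;; => the set of entity ids of entries A, C and D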
You can see this in action in the James Bond example from the Tupelo-Datomic library. You just specify two clauses, one for each desired value in the set:
; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set (td/find :let [$ (live-db)]
                         :find [?name]
                         :where {:person/name ?name :weapon/type :weapon/guile}
                                {:person/name ?name :weapon/type :weapon/gun})]
  (is (= #{["Dr No"] ["M"]} tuple-set)))
In pure Datomic it will look similar, but using something like the Entity ID:
[?eid :entry/groups 2]
[?eid :entry/groups 3]
and Datomic will perform an implicit AND operation (i.e. both clauses must match; any surplus entries are ignored). This is logically a "join" operation, even though it is the same entity being queried for both values. You can find more info in the Datomic docs.
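Putting that together for the example input [2 3], a hedged sketch of the fully static query (assumes 2 and 3 are the entity ids of the groups):
(d/q '[:find [?eid ...]
       :where
       [?eid :entry/groups 2]
       [?eid :entry/groups 3]]
     db)
;; => the entity ids of entries A, C and D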

How to construct a query that matches exactly a vector of refs in DataScript?

Setup: Consider the following DataScript database of films and cast, with data stolen from learndatalogtoday.org. The following code can be executed in a JVM/Clojure REPL or a ClojureScript REPL, as long as project.clj contains [datascript "0.15.0"] as a dependency.
(ns user
(:require [datascript.core :as d]))
(def data
[["First Blood" ["Sylvester Stallone" "Brian Dennehy" "Richard Crenna"]]
["Terminator 2: Judgment Day" ["Linda Hamilton" "Arnold Schwarzenegger" "Edward Furlong" "Robert Patrick"]]
["The Terminator" ["Arnold Schwarzenegger" "Linda Hamilton" "Michael Biehn"]]
["Rambo III" ["Richard Crenna" "Sylvester Stallone" "Marc de Jonge"]]
["Predator 2" ["Gary Busey" "Danny Glover" "Ruben Blades"]]
["Lethal Weapon" ["Gary Busey" "Mel Gibson" "Danny Glover"]]
["Lethal Weapon 2" ["Mel Gibson" "Joe Pesci" "Danny Glover"]]
["Lethal Weapon 3" ["Joe Pesci" "Danny Glover" "Mel Gibson"]]
["Alien" ["Tom Skerritt" "Veronica Cartwright" "Sigourney Weaver"]]
["Aliens" ["Carrie Henn" "Sigourney Weaver" "Michael Biehn"]]
["Die Hard" ["Alan Rickman" "Bruce Willis" "Alexander Godunov"]]
["Rambo: First Blood Part II" ["Richard Crenna" "Sylvester Stallone" "Charles Napier"]]
["Commando" ["Arnold Schwarzenegger" "Alyssa Milano" "Rae Dawn Chong"]]
["Mad Max 2" ["Bruce Spence" "Mel Gibson" "Michael Preston"]]
["Mad Max" ["Joanne Samuel" "Steve Bisley" "Mel Gibson"]]
["RoboCop" ["Nancy Allen" "Peter Weller" "Ronny Cox"]]
["Braveheart" ["Sophie Marceau" "Mel Gibson"]]
["Mad Max Beyond Thunderdome" ["Mel Gibson" "Tina Turner"]]
["Predator" ["Carl Weathers" "Elpidia Carrillo" "Arnold Schwarzenegger"]]
["Terminator 3: Rise of the Machines" ["Nick Stahl" "Arnold Schwarzenegger" "Claire Danes"]]])
(def conn (d/create-conn {:film/cast  {:db/valueType :db.type/ref
                                       :db/cardinality :db.cardinality/many}
                          :film/name  {:db/unique :db.unique/identity
                                       :db/cardinality :db.cardinality/one}
                          :actor/name {:db/unique :db.unique/identity
                                       :db/cardinality :db.cardinality/one}}))
(def all-datoms (mapcat (fn [[film actors]]
                          (into [{:film/name film}]
                                (map #(hash-map :actor/name %) actors)))
                        data))
(def all-relations (mapv (fn [[film actors]]
                           {:db/id [:film/name film]
                            :film/cast (mapv #(vector :actor/name %) actors)})
                         data))
(d/transact! conn all-datoms)
(d/transact! conn all-relations)
Description: In a nutshell, there are two kinds of entities in this database, films and actors (word intended to be ungendered), and three kinds of datoms:
film entity: :film/name (a unique string)
film entity: :film/cast (multiple refs)
actor entity: :actor/name (unique string)
Question: I would like to construct a query which asks: in which films have these N actors, and these N actors alone, appeared (for N >= 2)?
E.g., RoboCop starred Nancy Allen, Peter Weller, and Ronny Cox, but no film starred solely the first two of these, Allen and Weller. Therefore, I would expect the following query to produce the empty set:
(d/q '[:find ?film-name
       :where
       [?film :film/name ?film-name]
       [?film :film/cast ?actor-1]
       [?film :film/cast ?actor-2]
       [?actor-1 :actor/name "Nancy Allen"]
       [?actor-2 :actor/name "Peter Weller"]]
     @conn)
; => #{["RoboCop"]}
However, the query is flawed because I don't know how to express that any match should exclude actors who are not Allen or Weller. Again, I want to find the movies in which only Allen and Weller have appeared, without any other actors, so I want the above query to produce the empty set. How can I adjust it to enforce this requirement?
Because DataScript doesn't have negation (as of May 2016), I don't believe that's possible with one static query in 'pure' Datalog.
My approach would be to:
1. Build the query programmatically to add the N clauses stating that the cast must contain the N actors.
2. Add a predicate function which, given the database, a movie entity id, and the set of actor ids, uses the EAVT index to check whether the movie has an actor that is not in the set.
Here's a basic implementation
(defn only-those-actors? [db movie actors]
(->> (datoms db :eavt movie :film/cast) seq
(every? (fn [[_ _ actor]]
(contains? actors actor)))
))
(defn find-movies-with-exact-cast [db actors-names]
  (let [actors (set (d/q '[:find [?actor ...]
                           :in $ [?name ...]
                           :where [?actor :actor/name ?name]]
                         db actors-names))
        query {:find '[[?movie ...]]
               :in '[$ ?actors ?only-those-actors]
               :where
               (concat
                 (for [actor actors]
                   ['?movie :film/cast actor])
                 [['(?only-those-actors $ ?movie ?actors)]])}]
    (d/q query db actors only-those-actors?)))
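A hedged usage sketch against the database built in the setup; the expected match is "Mad Max Beyond Thunderdome", whose full cast is exactly these two actors:
(find-movies-with-exact-cast @conn ["Mel Gibson" "Tina Turner"])
;; => collection containing the entity id of "Mad Max Beyond Thunderdome"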
You can use a predicate function and d/entity together for filtering datoms by the :film/cast field of an entity. This approach looks much more straightforward, at least as long as DataScript doesn't support negation (the not operator and so on).
Look at the line (= a (:age (d/entity db e))) in this test case from DataScript:
[{:db/id 1 :name "Ivan" :age 10}
{:db/id 2 :name "Ivan" :age 20}
{:db/id 3 :name "Oleg" :age 10}
{:db/id 4 :name "Oleg" :age 20}]
...
(let [pred (fn [db e a]
             (= a (:age (d/entity db e))))]
  (is (= (q/q '[:find ?e
                :in $ ?pred
                :where [?e :age ?a]
                [(?pred $ ?e 10)]]
              db pred)
         #{[1] [3]})))
In your case, the predicate body could look something like this:
(clojure.set/subset? actors (:film/cast (d/entity db e)))
Regarding performance, the d/entity call is fast because it is a lookup by index.

Datomic: How do I query across any number of database inside of a query?

I'm using Datomic and would like to pull entire entities from any number of points in time, based on my query. The Datomic docs have some decent examples of how I can perform queries against two different database instances if I know those instances before the query is performed. However, I'd like my query to determine the number of "as-of" type database instances I need and then use those instances when pulling the entities. Here's what I have so far:
(defn pull-entities-at-change-points [entity-id]
  (->>
    (d/q
      '[:find ?tx (pull ?dbs ?client [*])
        :in $ [?dbs ...] ?client
        :where
        [?client ?attr-id ?value ?tx true]
        [(datomic.api/ident $ ?attr-id) ?attr]
        [(contains? #{:client/attr1 :client/attr2 :client/attr3} ?attr)]
        [(datomic.api/tx->t ?tx) ?t]
        [?tx :db/txInstant ?inst]]
      (d/history (d/db db/conn))
      (map #(d/as-of (d/db db/conn) %) [1009 1018])
      entity-id)
    (sort-by first)))
I'm trying to find all transactions wherein certain attributes on a :client entity changed and then pull the entity as it existed at those points in time. The line (map #(d/as-of (d/db db/conn) %) [1009 1018]) is my attempt to create a sequence of database instances at two specific transactions where I know the client's attributes changed. Ideally, I'd like to do all of this in one query, but I'm not sure if that's possible.
Hopefully this makes sense, but let me know if you need more details.
I would split out the pull calls to be separate API calls instead of using them in the query. I would keep the query itself limited to getting the transactions of interest. One example solution for approaching this would be:
(defn pull-entities-at-change-points
  [db eid]
  (let [hdb (d/history db)
        txs (d/q '[:find [?tx ...]
                   :in $ [?attr ...] ?eid
                   :where
                   [?eid ?attr _ ?tx true]]
                 hdb
                 [:person/firstName :person/friends]
                 eid)
        as-of-dbs (map #(d/as-of db %) txs)
        pull-w-t (fn [as-of-db]
                   [(d/as-of-t as-of-db)
                    (d/pull as-of-db '[*] eid)])]
    (map pull-w-t as-of-dbs)))
This function against a db I built with a toy schema would return results like:
([1010
  {:db/id 17592186045418
   :person/firstName "Gerry"
   :person/friends [{:db/id 17592186045419} {:db/id 17592186045420}]}]
 [1001
  {:db/id 17592186045418
   :person/firstName "Jerry"
   :person/friends [{:db/id 17592186045419} {:db/id 17592186045420}]}])
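For reference, a hedged sketch of the call site (the connection name and entity id follow the toy example above):
(pull-entities-at-change-points (d/db conn) 17592186045418)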
A few points I'll comment on:
the function above takes a database value instead of getting databases from the ambient/global conn.
we map pull over the as-of database for each of the various time t's.
using the pull API as an entry point rather than query is appropriate for cases where we have the entity and other information on hand and just want attributes or to traverse references.
the impetus to get everything done in one big query doesn't really exist in Datomic, since the relevant segments will have been realized in the peer's cache; that is, you're not saving a round trip by using one query.
the collection binding form ([?attr ...]) is preferred over a contains? predicate and leverages query caching.

Retrieve Most Recent Entity from Datomic

I'm interested in entities and their timestamps. Essentially, I want a time-sorted list of entities.
To that end, I've composed the following functions:
(defn return-posts
  "grabs all posts from Datomic"
  []
  (d/q '[:find ?title ?body ?slug
         :where
         [?e :post/title ?title]
         [?e :post/slug ?slug]
         [?e :post/body ?body]]
       (d/db connection)))
(defn get-postid-from-slug
  [slug]
  (d/q '[:find ?e
         :in $ ?slug
         :where [?e :post/slug ?slug]]
       (d/db connection) slug))
(defn get-post-timestamp
  "given an entid, returns the most recent timestamp"
  [entid]
  (-> (d/q '[:find ?ts
             :in $ ?e
             :where
             [?e _ _ ?tx]
             [?tx :db/txInstant ?ts]]
           (d/db connection) entid)
      (sort)
      (reverse)
      (first)))
Which I feel must be a hack rooted in ignorance.
Would someone more well-versed in idiomatic Datomic usage chime in and upgrade my paradigms?
I was bothered by the idea of adding additional timestamps to a database that nominally understands time as a first-class principle, and so (after a night of mulling over the approaches outlined by Ulrik Sandberg) evolved the following function:
(defn return-posts
  "grabs all posts from Datomic"
  [uri]
  (d/q '[:find ?title ?body ?slug ?ts
         :where
         [?e :post/title ?title ?tx]
         [?e :post/slug ?slug]
         [?e :post/body ?body]
         [?tx :db/txInstant ?ts]]
       (d/db (d/connect uri))))
It's idiomatic in Datalog to omit the binding to the transaction ID itself as we typically don't care. In this situation, we very definitely care and in the words of August Lileaas, wish to "traverse the transaction" (there are situations in which we'd want the post creation time, but for this application the transaction time will suffice for ordering entities).
A notable downside to this approach is that recently edited entries will be bumped up in the list. To address that, I'll have to do something later on to get each entry's "first appearance" in Datomic for blog-standard post history.
To summarize:
I have bound the transaction entity ID per "post" entity ID, and then looked up the transaction timestamp with this function for later sorting.
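For the actual ordering, a minimal sketch, assuming the four-element tuples returned by return-posts above (timestamp last):
(->> (return-posts uri)
     (sort-by last)
     reverse)
;; => post tuples ordered by transaction time, newest first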
There isn't really a more elegant way to do this than traversing the transactions. This is why I prefer to have a separate domain specific attribute for timestamps, instead of relying on the transaction timestamps from Datomic. One example where this is necessary is merging: let's say you have a wiki, and you want to merge two wiki pages. In that case, you probably want to control the timestamp yourself, and not use the timestamp from the transaction.
I like to have the attributes :created-at and :changed-at. When I transact new entities:
[[:db/add tempid :post/slug "..."]
 [:db/add tempid :post/title "A title"]
 [:db/add tempid :created-at (java.util.Date.)]
 [:db/add tempid :changed-at (java.util.Date.)]]
Then for updates:
[[:db/add post-eid :post/title "An updated title"]
 [:db/add post-eid :changed-at (java.util.Date.)]]
That way all I have to do is to read out the :created-at attribute of the entity, which will be ready and waiting in the index.
(defmacro find-one-entity
  "Returns entity when query matches, otherwise nil"
  [q db & args]
  `(when-let [eid# (ffirst (d/q ~q ~db ~@args))]
     (d/entity ~db eid#)))
(defn find-post-by-slug
  [db slug]
  (find-one-entity
    '[:find ?e
      :in $ ?slug
      :where
      [?e :post/slug ?slug]]
    db
    slug))
;; Get timestamp
(:created-at (find-post-by-slug db "my-post-slug"))
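Finally, a hedged sketch of the time-sorted post list the original question asked for, built on the :created-at attribute (attribute names follow the snippets above; db is a database value as in find-post-by-slug, and the sorting happens outside the query):
(->> (d/q '[:find ?title ?slug ?created
            :where
            [?e :post/title ?title]
            [?e :post/slug ?slug]
            [?e :created-at ?created]]
          db)
     (sort-by #(nth % 2))
     reverse)
;; => post tuples ordered by creation time, newest first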