In Datomic, querying field history with no retractions? - clojure

I'd like to get the history of values for a particular field in Datomic.
My intuition is to use (d/history) like
(d/q '[:find ?entity ?field-val ?date ?tx
       :in $
       :where
       [?entity :namespace/field ?field-val ?tx]
       [?tx :db/txInstant ?date]]
     (d/history (db/get-db)))
However, this query will duplicate most values because it lists every retraction as well as every value update (every db/add and db/retract).
I thought maybe I could query the datoms with the transaction, then check the operations. But I can't find a way to query the datoms.
(d/pull db '[*] tx-id) doesn't include datoms.
Search engine results were not helpful for keywords like "query datomic transaction datoms".
Searching for a Datomic transaction schema was not fruitful.
I can use tx-range, but that seems unwieldy.
Any better approaches?

I was looking in the wrong place. History queries offer an extra hidden positional value described in the history docs.
So any where clause can include ?entity ?attribute ?value ?transaction ?operation.
?operation is true for :db/add and false for :db/retract
So, the query I want looks like
(d/q '[:find ?entity ?field-val ?date ?tx
       :in $
       :where
       [?entity :namespace/field ?field-val ?tx true] ;; ADDED TRUE
       [?tx :db/txInstant ?date]]
     (d/history (db/get-db)))
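To make the effect of the fifth position concrete, here is a minimal sketch in plain Clojure that models history datoms as [entity attribute value tx added?] vectors. The sample data and the names history-datoms, additions-only and value-history are made up for illustration. This is not the Datomic API, only a model of what the true in the clause filters out.

```clojure
;; Hypothetical history of one attribute on entity 1.
;; Each vector is [entity attribute value tx added?].
(def history-datoms
  [[1 :namespace/field "a" 1001 true]
   [1 :namespace/field "a" 1002 false]   ; retraction of the old value
   [1 :namespace/field "b" 1002 true]    ; assertion of the new value
   [1 :namespace/field "b" 1003 false]
   [1 :namespace/field "c" 1003 true]])

(defn additions-only
  "Keeps only the datoms whose fifth element (added?) is true."
  [datoms]
  (filter (fn [[_e _a _v _tx added?]] added?) datoms))

;; One [value tx] row per value the field has held, ordered by transaction.
(def value-history
  (->> (additions-only history-datoms)
       (sort-by (fn [[_ _ _ tx _]] tx))
       (mapv (fn [[_ _ v tx _]] [v tx]))))
```

Without the filter, every update except the first contributes two rows (one retraction and one assertion), which is the duplication described in the question.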

Related

How to find only one record from query in datomic?

I'm doing a query on datomic using datomic.api like the following:
(d/q
  '[:find [(pull ?a [*]) ...]
    :in $ ?title
    :where
    [?a :movie/title ?title]]
  db title)
This query is returning almost the expected value, but as an array, like this:
[ {:db/id 17592186045442, :movie/title "Test", :movie/year 1984, :movie/director #:db{:id 17592186045439 }} ]
I want this query to return only the first match, not all the results. What am I doing wrong?
I found a solution for my specific case. The real issue was that I was not understanding the datomic query correctly.
[:find [(pull ?a [*]) ...]
This part is telling datomic to retrieve more than one result.
I changed the query to the following one:
(d/q
  '[:find (pull ?a [*]) .
    :in $ ?title
    :where
    [?a :movie/title ?title]]
  db title)
And it worked!
The key was to remove the square brackets around the find expression after the :find keyword and to replace "..." with a single ".".
If this doesn't work for you, see the link that @EugenePakhomov posted in the comments: Equivalent of SQL "limit" clause in Datomic
It is documented in the official Datomic documentation:
Find Spec
:find ?a ?b      relation       (Collection of Lists)
:find [?a ...]   collection     (Collection)
:find [?a ?b]    single tuple   (List)
:find ?a .       single scalar  (Scalar Value)
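As a rough illustration of the four shapes, the sketch below starts from a made-up relation result and derives the other three forms with plain Clojure. The data and the var names are hypothetical, and the code only models the shape each find spec returns; it does not call Datomic.

```clojure
;; Hypothetical result of a relation find spec (:find ?a ?b): a set of tuples.
(def relation #{[17592186045442 "Test"]})

;; Shapes the other find specs would hand back for the same single match.
(def collection   (mapv first relation)) ; like :find [?a ...]
(def single-tuple (first relation))      ; like :find [?a ?b]
(def scalar       (ffirst relation))     ; like :find ?a .
```

When more than one entity matches, the single tuple and single scalar forms still return just one result, so they are best used when at most one match is expected.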

Find oldest entity with a certain attribute that may have been retracted in Datomic?

I'd like to find the oldest entity that has an attribute called :app/type. The oldest entity might (or might not) have been retracted. How can I construct a query to find this?
You can use the d/history function to obtain a database in which you can query all additions and retractions across time.
I'm not entirely sure what you want to achieve, but this query returns, for each entity and each operation (addition or retraction), the earliest transaction involving :app/type. The row with the smallest transaction id then identifies the oldest entity.
(d/q '[:find ?e (min ?tx) ?added
       :where
       [?e :app/type ?v ?tx ?added]]
     (d/history db))
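Because (min ?tx) is computed per combination of ?e and ?added, the result can contain several rows. A minimal sketch of picking the oldest one in plain Clojure follows. The result set and the name oldest are made up for illustration, and the sketch assumes that a smaller transaction id means an earlier transaction.

```clojure
;; Hypothetical query result: [entity min-tx added?] tuples.
(def results
  #{[101 13194139534320 true]
    [101 13194139534350 false]
    [102 13194139534315 true]})

(defn oldest
  "Returns the tuple with the smallest transaction id, or nil when empty."
  [tuples]
  (when (seq tuples)
    (apply min-key second tuples)))
```

Here (oldest results) picks the row for entity 102, and its third element says whether that earliest datom was an addition or a retraction.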

Datomic: How do I query across any number of database inside of a query?

I'm using Datomic and would like to pull entire entities from any number of points in time based on my query. The Datomic docs have some decent examples about how I can perform queries from two different database instances if I know those instances before the query is performed. However, I'd like my query to determine the number of "as-of" type database instances I need and then use those instances when pulling the entities. Here's what I have so far:
(defn pull-entities-at-change-points [entity-id]
  (->>
    (d/q
      '[:find ?tx (pull ?dbs ?client [*])
        :in $ [?dbs ...] ?client
        :where
        [?client ?attr-id ?value ?tx true]
        [(datomic.api/ident $ ?attr-id) ?attr]
        [(contains? #{:client/attr1 :client/attr2 :client/attr3} ?attr)]
        [(datomic.api/tx->t ?tx) ?t]
        [?tx :db/txInstant ?inst]]
      (d/history (d/db db/conn))
      (map #(d/as-of (d/db db/conn) %) [1009 1018])
      entity-id)
    (sort-by first)))
I'm trying to find all transactions in which certain attributes on a :client entity changed and then pull the entity as it existed at those points in time. The line (map #(d/as-of (d/db db/conn) %) [1009 1018]) is my attempt to create a sequence of database instances at two specific transactions where I know the client's attributes changed. Ideally, I'd like to do all of this in one query, but I'm not sure if that's possible.
Hopefully this makes sense, but let me know if you need more details.
I would split out the pull calls to be separate API calls instead of using them in the query. I would keep the query itself limited to getting the transactions of interest. One example solution for approaching this would be:
(defn pull-entities-at-change-points
  [db eid]
  (let [hdb (d/history db)
        txs (d/q '[:find [?tx ...]
                   :in $ [?attr ...] ?eid
                   :where
                   [?eid ?attr _ ?tx true]]
                 hdb
                 [:person/firstName :person/friends]
                 eid)
        as-of-dbs (map #(d/as-of db %) txs)
        pull-w-t (fn [as-of-db]
                   [(d/as-of-t as-of-db)
                    (d/pull as-of-db '[*] eid)])]
    (map pull-w-t as-of-dbs)))
This function against a db I built with a toy schema would return results like:
([1010
  {:db/id 17592186045418
   :person/firstName "Gerry"
   :person/friends [{:db/id 17592186045419} {:db/id 17592186045420}]}]
 [1001
  {:db/id 17592186045418
   :person/firstName "Jerry"
   :person/friends [{:db/id 17592186045419} {:db/id 17592186045420}]}])
A few points I'll comment on:
The function above takes a database value instead of getting databases from the ambient/global conn.
We map pull over the as-of database for each time t.
Using the pull API as an entry point rather than query is appropriate for cases where we have the entity and other information on hand and just want attributes or to traverse references.
The impetus to get everything done in one big query doesn't really exist in Datomic, since the relevant segments will have been realized in the peer's cache. That is, you're not saving a round trip by using one query.
The collection binding form is preferred over contains? and leverages query caching.

Efficient Datomic query to perform filtering on paginated sets

Given that Datomic does not support pagination I'm wondering how to efficiently support a query such as:
Take the first 30 entities on :history/body, find entities whose
:history/body matches some regex.
Here's how I'd do regex matching alone:
{:find [?e]
 :where [[?e :history/body ?body]
         [(re-find #"foo.*bar$" ?body)]]}
Observations:
I could then (take ...) from those, but that is not the same as matching against the first 30 entities.
I could get all entities, take 30 then manually filter with re-find, but if I have 30M entities, getting all of them just to take 30 seems wildly inefficient. Additionally: what if I wanted to take 20M out of my 30M entities and filter them via re-find?
Datomic docs talk about how queries are executed locally, but I've tried doing in-memory transformations on a set of 52913 entities (granted, they're fully touched) and it takes ~5 seconds. Imagine how bad it'd be in the millions or 10s of millions.
(Just brainstorming, here)
First of all, if you're ever using regexp, you may want to consider a fulltext index on :history/body so that you can do:
[(fulltext $ :history/body "foo*bar") [[?e]]]
(Note: You can't change :db/fulltext true/false on an existing entity schema)
Sorting is something you have to do outside the query. But depending on your data, you may be able to constrain your query to a single "page" and then apply your predicate to just those entities.
For example, if we were only paginating :history entities by an auto-incrementing :history/id, then we'd know beforehand that "Page 3" is :history/id 61 to 90.
[:find ?e
 :in $ ?min-id ?max-id
 :where
 [?e :history/id ?id]
 [(<= ?min-id ?id ?max-id)]
 [(fulltext $ :history/body "foo*bar") [[?e]]]]
Maybe something like this:
(defn get-filtered-history-page [page-n match]
  (let [per-page 30
        ;; Both bounds are inclusive: page 3 covers :history/id 61 to 90.
        min-id (inc (* (dec page-n) per-page))
        max-id (* page-n per-page)]
    (d/q '[:find ?e
           :in $ ?min-id ?max-id ?match
           :where
           [?e :history/id ?id]
           [(<= ?min-id ?id ?max-id)]
           [(fulltext $ :history/body ?match) [[?e]]]]
         (get-db) min-id max-id match)))
But, of course, the problem is that constraining the paginated set is usually based on an ordering you don't know in advance, so this isn't very helpful.
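The page arithmetic itself is easy to get off by one, so here is a small sketch of it in isolation. The name page-bounds is hypothetical, and the sketch assumes 1-based page numbers and ids that start at 1 with no gaps.

```clojure
(defn page-bounds
  "Returns [min-id max-id], both inclusive, for the 1-based page `page-n`."
  [page-n per-page]
  (let [min-id (inc (* (dec page-n) per-page))
        max-id (* page-n per-page)]
    [min-id max-id]))

;; Page 3 at 30 per page is ids 61 to 90, matching the example in the text.
(def page-3 (page-bounds 3 30))
```

Since both bounds are used with an inclusive <= comparison, the upper bound has to be the last id on the page rather than the first id of the next one.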

Datomic query: find all entities with some value

With this query:
{:find [?e]
 :where [[?e :db/valueType :db.type/string]]}
I can find all entities with property named :db/valueType and value of :db.type/string. In my case with some data in the database it returns ten IDs.
How would I search for all entities with value of :db.type/string, regardless of the property name? For example this query:
{:find [?e]
 :where [[?e _ :db.type/string]]}
returns an empty set. As far as I can understand Datomic's Datalog, _ should work as a wildcard, matching anything, so the second query should at least return the same number of results as the first one, and maybe even more than that.
Thanks...
For this example, the logical structure of the query is essentially correct, but the attribute ident keyword is not being resolved to its entity id. Note that this is a special case that occurs when you query with attributes as inputs - a scenario in which the query engine is not guaranteed to perform this conversion. See the Datomic docs on query (http://docs.datomic.com/query.html) for "Attributes as Query Inputs."
A way to restructure this query is as:
(let [db (d/db conn)]
  (d/q '[:find ?e
         :in $ ?id
         :where [?e _ ?id]]
       db (d/entid db :db.type/string)))
In this case, we resolve the keyword :db.type/string to its entity id manually in the input to the parameterized query.