retracting :db/ident from an entity - clojure

I have a bunch of entities that are special, but they are not part of the db schema. Because these entities are special, I gave them :db/ident attributes so I can access them easily in my programs.
Let's say I call one of these accounts :base-account. The problem is that when I use the entity API to access an entity that references one of these special entities, I get this:
;; access some entity that references one of the special entities
> (d/touch (d/entity db 12345678))
==>
{:transaction/amount 22334455,
:transaction/from {:db/id 0987654}, ;; normal reference to an entity
:transaction/to :base-account} ;; this is a reference to a special account with a :db/ident attribute
This breaks some of the code I wrote earlier, because it does not give me the details of the :transaction/to account.
So to solve this problem I removed the :db/ident attributes from these entities:
> (d/transact connection [[:db/retract id-of-the-special-account
:db/ident :base-account]])
This successfully removes the :db/ident from the entity:
> (:db/ident (d/entity db id-of-the-special-account))
==> nil
But for some reason (maybe a bug), the entity API call still resolves the old ident:
> (d/entity db :base-account) ;; should not work
==> {:db/id id-of-the-special-account}
So how can I remove the identity from these entities without removing them from the database altogether? Or is there a sane way to change how the (d/entity ...) call behaves?
EDIT: I'm using datomic-pro-5544

From the Datomic Docs:
Idents should be used for two purposes: to name schema entities and to implement enumerated tags. Both of these uses are demonstrated in the introductory tutorial. To support these usages, idents have two special characteristics:
- Idents are designed to be extremely fast and always available. All idents associated with a database are stored in memory in every Datomic transactor and peer.
- When you navigate the entity API to a reference that has an ident, the lookup will return the ident, not another entity.
That last bullet point might be what is affecting you.
Next paragraph:
These characteristics also imply situations where idents should not be used:
- Idents should not be used as unique names or ids on ordinary domain entities. Such entity names should be implemented with a domain-specific attribute that is a unique identity.
- Idents should not be used as names for test data. (Your real data will not have such names, and you don't want test data to behave differently than the real data it simulates.)
From this it seems like you might want to re-design the DB, rather than trying to un-do your use of :db/ident.
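Following that advice, here is a minimal sketch of the redesign. It assumes an in-memory Datomic peer database (datomic.api with an arbitrary datomic:mem:// URI) and a hypothetical :account/name attribute marked :db.unique/identity, used in place of the :db/ident. Because the special account no longer has an ident, navigating :transaction/to yields an entity instead of a keyword, while a lookup ref keeps the easy access by name.

```clojure
(require '[datomic.api :as d])

;; Throwaway in-memory database; the URI is arbitrary.
(def uri "datomic:mem://ident-redesign-example")
(d/create-database uri)
(def conn (d/connect uri))

;; Hypothetical schema: a unique-identity name replaces the :db/ident.
@(d/transact conn [{:db/ident       :account/name
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db/unique      :db.unique/identity}
                   {:db/ident       :transaction/to
                    :db/valueType   :db.type/ref
                    :db/cardinality :db.cardinality/one}])

;; The special account is now an ordinary entity with a unique name.
@(d/transact conn [{:account/name "base-account"}])

;; Refer to it with a lookup ref instead of an ident.
(def tx-result
  @(d/transact conn [{:db/id          "t1"
                      :transaction/to [:account/name "base-account"]}]))
(def db (:db-after tx-result))
(def t1 (get (:tempids tx-result) "t1"))

;; Navigating the reference yields an entity, so its attributes are reachable.
(def to-account (:transaction/to (d/entity db t1)))
(:account/name to-account)
;; => "base-account"

;; Direct access by name, comparable to (d/entity db :base-account):
(d/entity db [:account/name "base-account"])
```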

Related

When to use Datomic Upsert?

I'm referring to :db/unique :db.unique/identity (as opposed to :db.unique/value).
Upsert sounds a bit scary to me in my naivety, because if I try to insert a new record that has the same value as an existing record for a field declared unique, my feeling is that I want that to fail.
If a field is declared unique, what I'd do when creating or updating a record is check whether that value is taken and, if it is, tell the user that it's taken.
What am I missing about upsert, and when/why is it useful?
A search revealed the following (not specific to Datomic):
"upsert helps avoid the creation of duplicate records and can save you time (if inserting a group) because you don't have to determine which records exist first".
What I don't understand is that the Datomic docs sometimes suggest using this, but I don't see why it's better in any way. Saving time at the cost of allowing collisions?
For example, if I have a user registration system, I definitely do not want to "upsert" on the email the user submits. I can't think of a case where it would be useful, other than the one quoted above: you have a large collection of things and you "don't have time" to check whether they exist first.
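The Datomic schema attribute :db/unique can be set to :db.unique/identity or :db.unique/value:
- With :db.unique/identity, transacting a tempid together with a value that already exists upserts: the tempid resolves to the existing entity, and the other attributes in the transaction are asserted on it.
- With :db.unique/value, asserting a value that another entity already has is a conflict, and the transaction fails.
So for a registration system where a duplicate email should be rejected, :db.unique/value gives the failing behavior you expect.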
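A small sketch of both behaviors, assuming an in-memory Datomic peer database and hypothetical :user/email (identity), :user/username (value) and :user/name attributes. Transacting an existing email under a new tempid resolves that tempid to the existing entity (an upsert), while asserting an existing :db.unique/value value for a new entity makes the transaction fail.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://upsert-example")
(d/create-database uri)
(def conn (d/connect uri))

;; Hypothetical schema for illustration.
@(d/transact conn [{:db/ident       :user/email
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db/unique      :db.unique/identity}
                   {:db/ident       :user/username
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db/unique      :db.unique/value}
                   {:db/ident       :user/name
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}])

;; :db.unique/identity: the second transaction upserts into the first entity.
(def r1 @(d/transact conn [{:db/id "u" :user/email "ann@example.com" :user/name "Ann"}]))
(def r2 @(d/transact conn [{:db/id "u" :user/email "ann@example.com" :user/name "Annie"}]))
(def ann-id (get (:tempids r1) "u"))
(= ann-id (get (:tempids r2) "u"))          ;; => true, same entity
(:user/name (d/entity (d/db conn) ann-id))  ;; => "Annie"

;; :db.unique/value: a second entity with the same value is rejected.
@(d/transact conn [{:user/username "ann"}])
(def second-attempt
  (try
    @(d/transact conn [{:user/username "ann"}])
    :accepted
    ;; deref of the failed transaction throws; any Exception counts as rejected here
    (catch Exception _ :rejected)))
second-attempt                              ;; => :rejected
```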

How to use Datomic partitions?

I want to use Datomic partitions to improve the scalability of my app.
First, I created a partition in a transaction:
{:db/id "communities"
:db/ident :communities}
[:db/add :db.part/db :db.install/partition "communities"]
Second, I created the database schema in another transaction:
{:db/ident :person/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one
:db.install/_attribute :communities}
{:db/ident :person/age
:db/valueType :db.type/int
:db/cardinality :db.cardinality/one
:db.install/_attribute :communities}
{:db/ident :person/sibblings
:db/valueType :db.type/int
:db/cardinality :db.cardinality/one
:db.install/_attribute :communities}
And here is an example of a simple query:
(d/q '[:find ?name ?age
:where
[?p :person/name ?name]
[?p :person/age ?age]]
db)
When I issue this query, I get the following error:
Unhandled datomic.impl.Exceptions$IllegalArgumentExceptionInfo :db.error/not-an-entity Unable to resolve entity: :person/name {:db/error :db.error/not-an-entity}
When I replace :db.install/_attribute :communities by :db.install/_attribute :db.part/db for every attribute, the query works fine. I have the same problem for all my other queries.
Am I missing something?
User-defined partitions are not alternative "linkage points" for user-defined attributes. Attributes are always linked to the :db.part/db entity via the :db.install/attribute attribute (note 1), and they always live in the :db.part/db partition.
Partitions allow you to ensure that datoms related to certain entities will occur close together in Datomic's indices by ensuring that their entity ids are allocated in a certain range. If your app is likely to access entities residing in a certain partition together, this may improve performance of your queries. It is also possible to walk datoms related to entities residing in a certain partition without encountering unrelated datoms (using seek-datoms + EAVT + entid-at).
The way to use a partition once it's defined is to pass its :db/ident to datomic.api/tempid when creating new entities (or you can use it with #db/id tagged literals in edn files):
@(d/transact connection
  [{:db/id (d/tempid :some-partition)
    … …}])
If you subsequently issue a query that happens to fetch some EAVT blocks related to this entity, those blocks will likely include information about other entities in the same partition (unless the entity has a huge enough number of attributes asserted on it to fill a block, I suppose… but even then you would have fetched some relevant nodes in the index tree), so your peer will be able to retrieve information about them from cache.
If you expect to see major benefits from this type of locality, then partitions may be worth looking into. If you're not sure your app would benefit, that's totally fine. Apparently this feature isn't used much in the wild; indeed, the new(-ish) client API currently doesn't expose it at all.
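A minimal end-to-end sketch of the points above, assuming an in-memory Datomic peer database. The partition is installed as in the question, the attributes are ordinary schema in :db.part/db, and only the new entity's id is allocated in :communities via d/tempid. The last part walks the partition's datoms with seek-datoms, EAVT and entid-at; the entity data is made up for illustration.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://partition-example")
(d/create-database uri)
(def conn (d/connect uri))

;; 1. Install the partition (same shape as in the question).
@(d/transact conn [{:db/id "communities" :db/ident :communities}
                   [:db/add :db.part/db :db.install/partition "communities"]])

;; 2. Attributes are ordinary schema and stay in :db.part/db.
;;    Datomic has no :db.type/int, so :db.type/long is used for age.
@(d/transact conn [{:db/ident       :person/name
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}
                   {:db/ident       :person/age
                    :db/valueType   :db.type/long
                    :db/cardinality :db.cardinality/one}])

;; 3. Allocate the new entity's id inside :communities.
(def tid (d/tempid :communities))
(def result @(d/transact conn [{:db/id tid :person/name "Ann" :person/age 42}]))
(def db (:db-after result))
(def ann (d/resolve-tempid db (:tempids result) tid))

(d/ident db (d/part ann))  ;; => :communities

;; The query from the question now works.
(d/q '[:find ?name ?age
       :where
       [?p :person/name ?name]
       [?p :person/age ?age]]
     db)                    ;; => #{["Ann" 42]}

;; Walk only the datoms of entities in :communities, stopping at the
;; first datom whose entity belongs to a different partition.
(def communities-id (d/entid db :communities))
(def partition-datoms
  (->> (d/seek-datoms db :eavt (d/entid-at db :communities 0))
       (take-while #(= communities-id (d/part (:e %))))))
```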
1 Although note that the explicit :db.install/_attribute :db.part/db line has been unnecessary since 0.9.5530: Datomic now deduces that a newly introduced entity is to be an attribute from the presence of the "attribute attributes" :db/ident, :db/valueType and :db/cardinality – this is possible because the latter two are exclusively used by attributes – and adds the :db.install/attribute linkage for you.

How can I use datomic's pull method to grab an entity by its entity id?

How can I retrieve an entity using the pull method by its entity id? I've used transact to add some datoms/facts (right phrasing?) to my db. I can see the entity ids if I do a simple query like:
[:find ?e
:where
[?e :arb/value]
]
The result being:
#{[17592186045418] [17592186045420] [17592186045423]}
Now I'd like to retrieve one of these using pull. The examples in the docs for pull, however, use an entity that is identified by an id attribute.
Specifically, the docs refer to an example from the musicbrainz sample data set, and the sample they suggest is:
(pull db '[*] led-zeppelin)
where led-zeppelin has been defined like this (the docs don't show this part):
(def led-zeppelin [:artist/gid #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3"])
The docs say that the pull command takes three things: a db, a selector pattern that (I think) determines which attributes are pulled for each entity, and the "eid" of the entity. So the led-zeppelin var above is somehow the eid.
I don't fully follow what's going on there. :artist/gid seems to be an id attribute defined in the musicbrainz schema, and the third item looks like the specific id. I'm not sure what #uuid is.
But, in my case, I have defined no id attribute for my entities. I was hoping to be able to use the unique entity id that I think is assigned by default to each entity. Is this possible? If so, how would this be done?
The solution here is simple. Just drop in the entity id number directly:
(d/pull db '[*] 17592186045418)
The mistake I'd made was to use the eid as a string, i.e. by double-quoting it.
Pull's third argument is a reference to an entity. You can either use one of the ids your query returned, or a lookup ref, as in the led-zeppelin example, where you refer to an entity by a unique attribute value.
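Both forms side by side, as a sketch assuming an in-memory Datomic peer database. :arb/value comes from the question; :arb/key is a hypothetical unique attribute added only so a lookup ref has something to refer to. The entity id is passed as a number, not a string.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://pull-example")
(d/create-database uri)
(def conn (d/connect uri))

;; :arb/value is from the question; :arb/key is hypothetical.
@(d/transact conn [{:db/ident       :arb/value
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}
                   {:db/ident       :arb/key
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db/unique      :db.unique/identity}])

(def result @(d/transact conn [{:db/id "x" :arb/key "k1" :arb/value "hello"}]))
(def db (:db-after result))
(def eid (get (:tempids result) "x"))  ;; a long, like 17592186045418

;; 1. The entity id itself, as a number.
(d/pull db '[*] eid)
;; => {:db/id <eid>, :arb/key "k1", :arb/value "hello"}

;; 2. A lookup ref [unique-attribute value], as in the led-zeppelin example.
(d/pull db '[:arb/value] [:arb/key "k1"])
;; => {:arb/value "hello"}

;; If the id arrives as text (e.g. from a URL), parse it into a long first.
(d/pull db '[:arb/value] (Long/parseLong (str eid)))
```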
The purpose of a query is to find the EID of something given one or more of its properties. If you already know the EID, you don't need a query; you just want to retrieve the attr/val pairs for that entity. So use the entity function:
(let [eid 12345
result (into {} (d/entity db eid)) ]
(println result))
Note that the result of (d/entity ...) is lazy and you need to force it into a clojure map to see all of the items.
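A self-contained sketch of that laziness, assuming an in-memory Datomic peer database and the :arb/value attribute from the question. Keyword lookup fetches one attribute on demand, into {} copies the attributes into an ordinary Clojure map, and d/touch realizes all attributes on the entity itself.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://entity-example")
(d/create-database uri)
(def conn (d/connect uri))

@(d/transact conn [{:db/ident       :arb/value
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}])
(def result @(d/transact conn [{:db/id "x" :arb/value "hello"}]))
(def db (:db-after result))
(def eid (get (:tempids result) "x"))

(def e (d/entity db eid))
(:arb/value e)  ;; => "hello", fetched when asked for
(into {} e)     ;; an ordinary map holding the entity's attributes
(d/touch e)     ;; loads all attributes into the entity
```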
Besides Datomic's own documentation, you can find more examples and unit tests in the Tupelo Datomic library. In addition to many convenience functions, it includes a James Bond example that helps clarify some Datomic concepts.

Datomic queries and laziness

I'm surprised to find that query results in datomic are not lazy, when entities are.
Is there an obvious rationale for this choice that I am missing? It seems reasonable that someone might want to (map some-fn (take 100 query-result-containing-millions)), but this would force the evaluation of the entire set of entity-ids, no?
Is there a way to get a lazy seq (of entity-ids) directly back from the query, or do they always have to be loaded into memory first, with laziness only available through the entity?
You can use the datomic.api/datoms fn to get access to entities in a lazy way.
Note that you have to specify the index type when calling datoms, and the indexes available to you depend on the attribute you're interested in. E.g., the :avet index is only available if your attribute has :db/index set in the schema (or is :db/unique), and the :vaet index is only available if your attribute is of type :db.type/ref.
We use something like this at work (note: the attribute, ref-attr, must be of :db.type/ref for this to work):
(defn datoms-by-ref-value
"Returns a lazy seq of all the datoms in the database matching the
given reference attribute value."
[db ref-attr value]
(d/datoms db :vaet value ref-attr))
The datoms documentation is a bit sparse, but with some trial and error you can probably work out what you need. There's a post by August Lilleaas about using the :avet index (which requires an index on the attribute in the Datomic schema) that I found somewhat helpful.
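For the (take 100 ...) case in the question, a sketch assuming an in-memory Datomic peer database and a hypothetical :item/n attribute: walk the :aevt index for one attribute and map each datom to its entity id. The :aevt index covers every attribute, so no schema flag is needed, and the datoms are consumed on demand rather than materialized as a full query result set.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://lazy-datoms-example")
(d/create-database uri)
(def conn (d/connect uri))

;; Hypothetical attribute, asserted on 1000 entities.
@(d/transact conn [{:db/ident       :item/n
                    :db/valueType   :db.type/long
                    :db/cardinality :db.cardinality/one}])
@(d/transact conn (mapv (fn [n] {:item/n n}) (range 1000)))
(def db (d/db conn))

(defn entity-ids-with-attr
  "Lazily maps the :aevt datoms of attr to their entity ids."
  [db attr]
  (map :e (d/datoms db :aevt attr)))

;; Only as many datoms as needed (in chunks) are read to produce 100 ids.
(def first-100 (take 100 (entity-ids-with-attr db :item/n)))
```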

Datomic - requires explicit manual coding of unique IDs?

My question is: does Datomic require the explicit manual creation of unique sequence numbers by the end user? Or is it just the example provided?
I'm reading through the Datomic tutorial.
When I look at the data that gets loaded in seattle-data0.dtm I see on the first two lines:
[
{:district/region :region/e, :db/id #db/id[:db.part/user -1000001], :district/name "East"}
{:db/id #db/id[:db.part/user -1000002], :neighborhood/name "Capitol Hill", :neighborhood/district #db/id[:db.part/user -1000001]}
Notice in particular the values
:db/id #db/id[:db.part/user -1000001],
:db/id #db/id[:db.part/user -1000002]
#db/id[:db.part/user -1000001]
Perhaps you can help me understand - this appears to explicitly require a manually generated unique ID sequence number when preparing data for insert.
Surely in a modern database we can rely on the database to generate sequence numbers for us?
When I go to do my own example schema and data insert - I find that I am required to insert manual ID numbers as well. What am I missing?
To answer your question: no, Datomic doesn't require the end user to generate identifiers. What you see in the seattle example are temporary ids.
Every time you want to add some facts about new entities to Datomic, you have to give every new entity a temporary id. This id will be replaced with a real unique id by Datomic.
Now you may ask yourself why you have to use these temporary ids in the first place. Temporary ids are needed to express relationships between all new entities in one single transaction. In your example, you have the ids:
:db/id #db/id[:db.part/user -1000001],
:db/id #db/id[:db.part/user -1000002]
#db/id[:db.part/user -1000001]
two of them are the same (I'll explain the negative numbers in a moment). That means that the new entity marked with the temporary id #db/id[:db.part/user -1000001] is the same in both assertions.
Now I have to explain the data literal #db/id[:db.part/user -1000001]. #db/id is the tag for a Datomic temporary id. The tag is followed by a vector of two components, :db.part/user and -1000001. The first part is the database partition and is mandatory. The second part is optional. If you write just #db/id[:db.part/user], you get a fresh (different) temporary id every time this literal occurs. If you write #db/id[:db.part/user -1000001], you get the same temporary id every time you use the negative index -1000001. So #db/id[:db.part/user -1000001] is different from #db/id[:db.part/user -1000002].
I don't know exactly why the examples use indices below -1000000. The JavaDoc of tempid, which the #db/id data literal resolves to, says that the numbers from -1 (inclusive) to -1000000 (exclusive) are reserved for user-created temp ids. Maybe someone can shed some light on this.
To sum this up: #db/id[...] are temporary ids to express same entities in one transaction and are replaced by real unique ids by Datomic at the end of the transaction. If you don't have to refer to the same entity in a transaction twice, you are fine with just #db/id[:db.part/user] for every temporary id.
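The same mechanics without the reader literal, as a sketch assuming an in-memory Datomic peer database and a trimmed-down version of the seattle schema (only the three attributes used here). d/tempid with a number behaves like #db/id[:db.part/user -1000001], and d/tempid without a number behaves like #db/id[:db.part/user]; after the transaction, d/resolve-tempid gives the real id Datomic assigned.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://tempid-example")
(d/create-database uri)
(def conn (d/connect uri))

;; Trimmed-down seattle-style schema, for illustration only.
@(d/transact conn [{:db/ident       :district/name
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}
                   {:db/ident       :neighborhood/name
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}
                   {:db/ident       :neighborhood/district
                    :db/valueType   :db.type/ref
                    :db/cardinality :db.cardinality/one}])

;; Like #db/id[:db.part/user -1000001]: reused below to mean the same entity.
(def east (d/tempid :db.part/user -1000001))

(def result
  @(d/transact conn [{:db/id east :district/name "East"}
                     ;; Like #db/id[:db.part/user]: a fresh tempid.
                     {:db/id                 (d/tempid :db.part/user)
                      :neighborhood/name     "Capitol Hill"
                      :neighborhood/district east}]))
(def db (:db-after result))

;; The real (positive) entity id Datomic assigned to the tempid.
(def east-id (d/resolve-tempid db (:tempids result) east))

;; The neighborhood's district reference points at that same entity.
(d/q '[:find ?d .
       :where
       [?n :neighborhood/name "Capitol Hill"]
       [?n :neighborhood/district ?d]]
     db)
```

In Datomic 0.9.5530 and later you can also use plain strings (e.g. "east") as tempids in transaction data; the same string refers to the same new entity within one transaction.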