Datomic - requires explicit manual coding of unique IDs? - clojure

My question is: does Datomic require the end user to explicitly code unique sequence numbers by hand? Or is that just an artifact of the example?
I'm reading through the Datomic tutorial.
When I look at the data that gets loaded in seattle-data0.dtm I see on the first two lines:
[
{:district/region :region/e, :db/id #db/id[:db.part/user -1000001], :district/name "East"}
{:db/id #db/id[:db.part/user -1000002], :neighborhood/name "Capitol Hill", :neighborhood/district #db/id[:db.part/user -1000001]}
Notice in particular the values
:db/id #db/id[:db.part/user -1000001],
:db/id #db/id[:db.part/user -1000002]
#db/id[:db.part/user -1000001]
Perhaps you can help me understand - this appears to explicitly require a manually generated unique ID sequence number when preparing data for insert.
Surely in a modern database we can rely on the database to generate sequence numbers for us?
When I go to do my own example schema and data insert - I find that I am required to insert manual ID numbers as well. What am I missing?

To answer your question: no, Datomic doesn't require the end user to generate identifiers. What you see in the seattle example are temporary ids.
Every time you want to add some facts about new entities to Datomic, you have to give every new entity a temporary id. This id will be replaced with a real unique id by Datomic.
Now you may ask yourself why you have to use these temporary ids in the first place. Temporary ids are needed to express relationships between new entities within one single transaction. In your example, you have the ids:
:db/id #db/id[:db.part/user -1000001],
:db/id #db/id[:db.part/user -1000002]
#db/id[:db.part/user -1000001]
Two of them are the same (I'll explain the negative numbers in a moment). That means the new entity marked with the temporary id #db/id[:db.part/user -1000001] is the same in both assertions.
Now I have to explain the data literal #db/id[:db.part/user -1000001]. #db/id is the tag for a Datomic temporary id. The tag is followed by a vector of two components, :db.part/user and -1000001. The first part is the database partition and is mandatory. The second part is optional. If you write just #db/id[:db.part/user], you get a fresh (different) temporary id every time this literal occurs. If you write #db/id[:db.part/user -1000001], you get the same temporary id every time you use the negative index -1000001. So #db/id[:db.part/user -1000001] is different from #db/id[:db.part/user -1000002].
I don't know exactly why the examples use indices below -1000000. The JavaDoc of tempid, to which the #db/id data literal resolves, says that the numbers from -1 (inclusive) to -1000000 (exclusive) are reserved for user-created temp ids. So maybe someone can shed some light on this.
To sum this up: #db/id[...] literals are temporary ids used to refer to the same entity more than once within one transaction, and Datomic replaces them with real unique ids at the end of the transaction. If you don't need to refer to the same entity twice in a transaction, you are fine with just #db/id[:db.part/user] for every temporary id.
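As a sketch (assuming a live connection conn and the seattle schema from the tutorial), a transaction asserting two related new entities might look like this:

```clojure
;; Sketch, assuming a Datomic connection `conn` and the seattle schema.
;; The repeated temp id -1 ties the neighborhood to its district within
;; this one transaction; Datomic replaces both with real entity ids.
(require '[datomic.api :as d])

@(d/transact conn
   [{:db/id #db/id[:db.part/user -1]
     :district/name "East"
     :district/region :region/e}
    {:db/id #db/id[:db.part/user]   ; fresh temp id, referenced nowhere else
     :neighborhood/name "Capitol Hill"
     :neighborhood/district #db/id[:db.part/user -1]}])
```

The second entity needs no explicit index because nothing else in the transaction refers back to it.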

Related

When to use Datomic Upsert?

I'm referring to :db/unique :db.unique/identity
(as opposed to :db.unique/value)
Upsert in my naivety sounds a bit scary, because if I try to insert a new record with the same value (for a field declared unique) as an existing record, my feeling is that I want that to fail.
What I'd do if a field is declared unique is check, when creating or updating another record, whether that value is already taken, and if it is, give the user feedback to that effect.
What am I missing about upsert, and when/why is it useful?
A search revealed the following (not in relation to datomic)
"upsert helps avoid the creation of duplicate records and can save you time (if inserting a group) because you don't have to determine which records exist first".
What I don't understand is that the Datomic docs sometimes suggest using this, but I don't see why it's better in any way. Saving time at the cost of allowing collisions?
e.g., if I have a user registration system, I definitely do not want to "upsert" on the email the user submits. I can't think of a case when it would be useful - other than the one quoted above, if you had a large collection of things and you "didn't have time" to check whether they existed first.
The Datomic property :db/unique can be used with :db.unique/identity or :db.unique/value.
If it's used with :db.unique/identity, it will upsert.
If it's used with :db.unique/value, it will conflict.
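A small sketch of the difference, assuming a connection conn and a made-up :user/email attribute:

```clojure
;; Sketch; the :user/email and :user/name attribute names are hypothetical.
;; Assume :user/email is declared with :db/unique :db.unique/identity.
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :user/email "alice@example.com"
                    :user/name  "Alice"}])

;; Re-transacting the same email upserts: it resolves to the existing
;; entity and updates its :user/name, rather than creating a duplicate.
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :user/email "alice@example.com"
                    :user/name  "Alicia"}])

;; Had :user/email been declared :db.unique/value instead, the second
;; transaction would throw a unique-conflict exception -- the failing
;; behavior wanted for the registration case described above.
```

So the choice between the two unique settings is exactly the choice between "merge into the existing record" and "reject the duplicate".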

retracting :db/ident from an entity

I have a bunch of entities that are special, but they are not part of the db schema. Since these entities are special, I set :db/ident attributes on them to have easy access to them in my programs.
Let's say I call one of these accounts :base-account. Now the problem is that when I use the entity api to access these entities, I have this problem:
;; access some entity that references one of the special entities
> (d/touch (d/entity db 12345678))
==>
{:transaction/amount 22334455,
:transaction/from {:db/id 0987654}, ;; normal reference to an entity
:transaction/to :base-account} ;; this is a reference to a special account with a :db/ident attribute
This causes me problems in some of the code I have written before, because this will not give me the details of the :transaction/to account.
So to solve this problem I removed the :db/ident attributes from these entities:
> (d/transact connection [[:db/retract id-of-the-special-account
:db/ident :base-account]])
Which successfully removes the :db/ident from the entity:
> (:db/ident (d/entity db id-of-the-special-account))
==> nil
But for some reason (maybe a bug), the entity api call still refers to it with its old identity:
> (d/entity db :base-account) ;; should not work
==> {:db/id id-of-the-special-account}
So how can I remove the identity from these entities without having to remove them from the database altogether? Or maybe a way to fix the way the (d/entity ....) call works, in a sane way?
EDIT: I'm using datomic-pro-5544
From the Datomic Docs:
Idents should be used for two purposes: to name schema entities and to
implement enumerated tags. Both of these uses are demonstrated in the
introductory tutorial. To support these usages, idents have two
special characteristics:
- Idents are designed to be extremely fast and always available. All idents associated with a database are stored in memory in every Datomic transactor and peer.
- When you navigate the entity API to a reference that has an ident, the lookup will return the ident, not another entity.
That last bullet point might be what is affecting you.
Next paragraph:
These characteristics also imply situations where idents should not be
used:
- Idents should not be used as unique names or ids on ordinary domain entities. Such entity names should be implemented with a domain-specific attribute that is a unique identity.
- Idents should not be used as names for test data. (Your real data will not have such names, and you don't want test data to behave differently than the real data it simulates.)
From this it seems like you might want to re-design the DB, rather than trying to un-do your use of :db/ident.
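A sketch of that redesign: give the special accounts a domain-specific unique-identity attribute and address them with a lookup ref instead of an ident. The attribute name :account/key is made up here.

```clojure
;; Sketch; :account/key is a hypothetical attribute name.
;; 1. Install a unique-identity attribute for naming domain entities:
@(d/transact conn
   [{:db/id (d/tempid :db.part/db)
     :db/ident :account/key
     :db/valueType :db.type/keyword
     :db/cardinality :db.cardinality/one
     :db/unique :db.unique/identity
     :db.install/_attribute :db.part/db}])

;; 2. Tag the special entity with a domain key rather than :db/ident:
@(d/transact conn [{:db/id id-of-the-special-account
                    :account/key :base-account}])

;; 3. Address it with a lookup ref; entity-API navigation to it now
;; returns an ordinary entity rather than a bare keyword.
(d/entity db [:account/key :base-account])
```

With the ident gone from the picture, (d/touch ...) on a referencing transaction should show :transaction/to as a normal entity reference.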

How can I use datomic's pull method to grab an entity by its entity id?

How can I retrieve an entity using the pull method by its entity id? I've used transact to add some datoms/facts (right phrasing?) to my db. I can see the entity ids if I do a simple query like:
[:find ?e
:where
[?e :arb/value]
]
The result being:
#{[17592186045418] [17592186045420] [17592186045423]}
Now I'd like to retrieve one of these using pull. The examples in the docs for pull, however, use examples where the entity in question is associated with an id.
Specifically, the docs refer to an example from the musicbrainz sample data set, and the sample they suggest is:
(pull db '[*] led-zeppelin)
where (although the docs don't show this) led-zeppelin has been defined in the musicbrainz sample like so:
(def led-zeppelin [:artist/gid #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3"])
The docs say that the pull command takes three things: a db, a selector pattern determining I think what attributes are pulled for each entity, and the "eid" of the entity. So the above led-zeppelin var is somehow the eid.
I don't totally follow what's going on there. The :artist/gid is an id attribute defined in the schema for musicbrainz, it seems, and the third item looks like the specific id. I'm not sure what #uuid is.
But, in my case, I have defined no id attribute for my entities. I was hoping to be able to use the unique entity id that I think is assigned by default to each entity. Is this possible? If so, how would this be done?
The solution here is simple. Just drop in the entity id number directly:
(d/pull db '[*] 17592186045418)
The mistake I'd made was to use the eid as a string, i.e. by double-quoting it.
Pull's third argument is a reference to an entity. You can either use one of the ids that your query returned, or a lookup ref, as in the led-zeppelin example, where you refer to an entity using a unique attribute value.
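Both forms of the third argument can be sketched side by side (assuming a db value db):

```clojure
;; By entity id, as returned by the query above:
(d/pull db '[*] 17592186045418)

;; By lookup ref, i.e. a [unique-attribute value] pair, as in the
;; musicbrainz led-zeppelin example:
(d/pull db '[*] [:artist/gid #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3"])
```

Either way the argument must be a number or a vector, not a string, which is why the double-quoted eid failed.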
The purpose of a query is to find the eid of something given one or more of its properties. If you already know the eid, you don't need a query; you just want to retrieve the attr/val pairs for that entity. So use the entity function:
(let [eid    12345
      result (into {} (d/entity db eid))]
  (println result))
Note that the result of (d/entity ...) is lazy and you need to force it into a clojure map to see all of the items.
Besides Datomic's own documentation, you can find more examples and unit tests in the Tupelo Datomic library. In addition to many convenience functions, its James Bond example helps to clarify some of the Datomic concepts.

Datomic queries and laziness

I'm surprised to find that query results in datomic are not lazy, when entities are.
Is there an obvious rationale for this choice that I am missing? It seems reasonable that someone might want to (map some-fn (take 100 query-result-containing-millions)), but this would force the evaluation of the entire set of entity-ids, no?
Is there a way to get a lazy seq (of entity-ids) directly back from the query, or do they always have to be loaded into memory first, with laziness only available through the entity?
You can use the datomic.api/datoms fn to get access to entities in a lazy way.
Note that you have to specify the index type when calling datoms, and the types of indexes available to you depend on the type of the attribute that you're interested in. E.g. the :avet index is only available if your attribute has :db/index set in the schema, and the :vaet index is only available if your attribute is of type :db.type/ref.
We use something like this at work (note: the attribute, ref-attr, must be of :db.type/ref for this to work):
(defn datoms-by-ref-value
  "Returns a lazy seq of all the datoms in the database matching the
  given reference attribute value."
  [db ref-attr value]
  (d/datoms db :vaet value ref-attr))
The datoms documentation is a bit sparse, but with some trial and error you can probably work out what you need. There's a post by August Lilleaas about using the :avet index (which requires an index on the attribute in the datomic schema) that I found somewhat helpful.
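A hypothetical use of the helper above, showing the lazy behavior asked about: only what you take is realized.

```clojure
;; Sketch; :transaction/to and account-eid are hypothetical, and
;; :transaction/to is assumed to be of type :db.type/ref.
;; Only the first 100 datoms are realized from the :vaet index.
(->> (datoms-by-ref-value db :transaction/to account-eid)
     (take 100)
     (map :e))   ; entity ids of the matching transactions
```

This is the datoms-based counterpart of the (map some-fn (take 100 ...)) pattern from the question, without forcing the full result set.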

Field order on query MySQL from Clojure JDBC

When I query MySQL from Clojure (jdbc), the fields are not returned in the order specified in the SELECT clause (I'm calling a stored procedure that does the select). Initially it seemed that the fields were returned in reverse order, but this happens only if there are 1 to 9 fields. Adding a tenth field makes the result set come back in no particular order, although it is always the same order for a particular number of fields in the result set.
Anyone has observed it?
You can ask java.jdbc to return individual rows as vectors rather than maps (or arrays, in spite of the option name) by passing :as-arrays? true to query; field order will then be preserved:
;; checked with java.jdbc 0.3.0-alpha4
(query db [sql params...] :as-arrays? true)
Note that in this mode of operation an extra vector containing the keys corresponding to the column names (which would otherwise be used in the constructed maps) will be prepended to the seq of actual result vectors.
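As a sketch (table and column names made up), the shape of the :as-arrays? result looks like this:

```clojure
;; Sketch; the fruit table and its columns are hypothetical.
(query db ["SELECT name, appearance, cost FROM fruit"] :as-arrays? true)
;; e.g. => [[:name :appearance :cost]
;;          ["Apple" "rosy" 24]
;;          ["Orange" "round" 49]]
```

The first vector carries the column keys in SELECT order; each subsequent vector is one row with its values in the same order.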
By default, java.jdbc returns rows as maps, as per Arthur's answer. These will be array maps up to 9 entries (maintaining insertion order) and hash maps beyond this threshold (with no useful ordering).
It is most likely that the fields are being returned in order and then subsequently being re-ordered by the data structure they are packed into by the Clojure jdbc library. Assuming you are using clojure.java.jdbc, the results are returned in a list of maps like this:
{:name "Apple" :appearance "rosy" :cost 24}
{:name "Orange" :appearance "round" :cost 49}
where each map is one row. The order of the rows will be preserved because they are presented in a list, though the order of the fields is not, because they are presented in maps (which do not guarantee order). You could sort them afterwards if you need a particular order.