I want to use Datomic partitions to improve the scalability of my app.
First, I created a partition in a transaction:
{:db/id "communities"
:db/ident :communities}
[:db/add :db.part/db :db.install/partition "communities"]
Second, I created the database schema in another transaction:
{:db/ident :person/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :communities}
{:db/ident :person/age
 :db/valueType :db.type/int
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :communities}
{:db/ident :person/sibblings
 :db/valueType :db.type/int
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :communities}
And here is an example of a simple query:
(d/q '[:find ?name ?age
       :where
       [?p :person/name ?name]
       [?p :person/age ?age]]
     db)
When I issue this query, I get the following error:
Unhandled datomic.impl.Exceptions$IllegalArgumentExceptionInfo :db.error/not-an-entity Unable to resolve entity: :person/name {:db/error :db.error/not-an-entity}
When I replace :db.install/_attribute :communities with :db.install/_attribute :db.part/db for every attribute, the query works fine. I have the same problem with all my other queries.
Am I missing something?
User-defined partitions are not alternative "linkage points" for user-defined attributes. Attributes are always linked to the :db.part/db entity via the :db.install/attribute attribute [1], and they always live in the :db.part/db partition.
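As a rough sketch of what that means for the schema in the question, the two transactions might look like the data below. The partition install is taken from the question unchanged; the attribute maps simply drop the :db.install/_attribute line (see the note on 0.9.5530 at the end of this answer), and :db.type/long stands in for :db.type/int on the assumption that Datomic has no int value type. Only plain data is defined here; the d/transact calls are shown as comments and assume a connection named conn.

```clojure
;; Transaction 1: install the partition (as in the question).
(def partition-tx
  [{:db/id "communities"
    :db/ident :communities}
   [:db/add :db.part/db :db.install/partition "communities"]])

;; Transaction 2: the attributes. They are not linked to :communities;
;; attributes live in :db.part/db.
(def schema-tx
  [{:db/ident :person/name
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident :person/age
    :db/valueType :db.type/long ; assumption: long replaces the question's int
    :db/cardinality :db.cardinality/one}])

;; With a connection `conn` (assumed), these would be submitted as:
;; @(d/transact conn partition-tx)
;; @(d/transact conn schema-tx)
```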
Partitions allow you to ensure that datoms related to certain entities will occur close together in Datomic's indices by ensuring that their entity ids are allocated in a certain range. If your app is likely to access entities residing in a certain partition together, this may improve performance of your queries. It is also possible to walk datoms related to entities residing in a certain partition without encountering unrelated datoms (using seek-datoms + EAVT + entid-at).
The way to use a partition once it's defined is to pass its :db/ident to datomic.api/tempid when creating new entities (or you can use it with #db/id tagged literals in edn files):
@(d/transact connection
             [{:db/id (d/tempid :some-partition)
               … …}])
If you subsequently issue a query that happens to fetch some EAVT blocks related to this entity, those blocks will likely include information about other entities in the same partition (unless the entity has a huge enough number of attributes asserted on it to fill a block, I suppose… but even then you would have fetched some relevant nodes in the index tree), so your peer will be able to retrieve information about them from cache.
If you expect to see major benefits from this type of locality, then partitions may be worth looking into. If you're not sure your app would benefit, that's totally fine. Apparently this feature isn't used that much in the wild; indeed, the new(-ish) client API currently doesn't expose it at all.
[1] Note, though, that the explicit :db.install/_attribute :db.part/db line has been unnecessary since 0.9.5530: Datomic now deduces that a newly introduced entity is to be an attribute from the presence of the "attribute attributes" :db/ident, :db/valueType and :db/cardinality (this is possible because the latter two are used exclusively by attributes) and adds the :db.install/attribute linkage for you.
I have a bunch of entities that are special, but they are not part of the db schema. Since these entities are special I set some :db/ident attributes to them to have easy access to them in my programs.
Let's say I call one of these accounts :base-account. Now the problem is that when I use the entity API to access these entities, I get this:
;; access some entity that references one of the special entities
> (d/touch (d/entity db 12345678))
==>
{:transaction/amount 22334455,
:transaction/from {:db/id 0987654}, ;; normal reference to an entity
:transaction/to :base-account} ;; this is a reference to a special account with a :db/ident attribute
This causes me problems in some of the code I have written before, because this will not give me the details of the :transaction/to account.
So to solve this problem I removed the :db/ident attributes from these entities:
> (d/transact connection [[:db/retract id-of-the-special-account
:db/ident :base-account]])
Which successfully removes the :db/ident from the entity:
> (:db/ident (d/entity db id-of-the-special-account))
==> nil
But for some reason (maybe a bug), the entity api call still refers to it with its old identity:
> (d/entity db :base-account) ;; should not work
==> {:db/id id-of-the-special-account}
So how can I remove the identity from these entities without having to remove them from the database altogether? Or maybe a way to fix the way the (d/entity ....) call works, in a sane way?
EDIT: I'm using datomic-pro-5544
From the Datomic Docs:
Idents should be used for two purposes: to name schema entities and to implement enumerated tags. Both of these uses are demonstrated in the introductory tutorial. To support these usages, idents have two special characteristics:
- Idents are designed to be extremely fast and always available. All idents associated with a database are stored in memory in every Datomic transactor and peer.
- When you navigate the entity API to a reference that has an ident, the lookup will return the ident, not another entity.
That last bullet point might be what is affecting you.
Next paragraph:
These characteristics also imply situations where idents should not be used:
- Idents should not be used as unique names or ids on ordinary domain entities. Such entity names should be implemented with a domain-specific attribute that is a unique identity.
- Idents should not be used as names for test data. (Your real data will not have such names, and you don't want test data to behave differently than the real data it simulates.)
From this it seems like you might want to re-design the DB, rather than trying to un-do your use of :db/ident.
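A minimal sketch of that redesign, assuming a hypothetical :account/code attribute (the name is not from the question): the special accounts get a unique-identity attribute instead of :db/ident, and are then addressed with a lookup ref. Only data is defined here; the commented calls assume a db value and the datomic.api namespace aliased as d.

```clojure
;; Hypothetical attribute that names special accounts without using :db/ident.
(def account-code-attr
  {:db/ident       :account/code
   :db/valueType   :db.type/keyword
   :db/cardinality :db.cardinality/one
   :db/unique      :db.unique/identity})

;; A lookup ref has the shape [unique-attribute value].
(def base-account-ref [:account/code :base-account])

;; Assumed usage, given a db value:
;; (d/entity db base-account-ref)            ; fetch the account by its code
;; (:transaction/to (d/entity db 12345678))  ; should navigate to an entity,
;;                                           ; since the target has no ident
```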
I have a function that takes a list of entries and saves them to MongoDB using Monger.
What is strange is that only one record is updated and the rest are ignored unless I specify multi: true.
I don't understand why the multi flag is necessary for Monger to persist all the updates to MongoDB.
(defn update-entries
  [entries]
  (let [conn (mg/connect)
        db (mg/get-db conn "database")]
    (for [e entries] (mc/update db "posts" {"id" (:id e)} {$set {:data (:data e)}} {:multi true}))))
The multi flag is necessary for multi updates, since that's what MongoDB itself uses. Take a look at the documentation for update. Granted, that's the mongo shell, but most drivers try to follow it when it comes to operation semantics.
Note that if "id" is unique, then you're updating one record at a time so having :multi set to true shouldn't matter.
There is, however, another issue with your code.
You use a for comprehension, which in turn iterates a collection lazily, i.e. calls to mc/update won't be made until you force the realization of the collection returned by for.
Since mc/update is called for its side effects (updating a record in the db), using doseq would be more appropriate, unless you need the results.
If that's the case, wrap for in doall to force realization:
(doall
 (for [e entries]
   (mc/update db "posts" {"id" (:id e)} {$set {:data (:data e)}} {:multi true})))
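The difference between for, doseq and doall can be seen without a database. In the sketch below, fake-update! is a made-up stand-in for mc/update that just records which ids it was called with:

```clojure
;; Stand-in for mc/update: records the id of every entry it is called with.
(def calls (atom []))

(defn fake-update! [e]
  (swap! calls conj (:id e))
  :ok)

(def entries [{:id 1} {:id 2} {:id 3}])

;; `for` only builds a lazy seq; no update has run at this point.
(def pending (for [e entries] (fake-update! e)))
(def calls-after-for @calls)

;; `doseq` runs the body eagerly for each entry and returns nil.
(doseq [e entries] (fake-update! e))
(def calls-after-doseq @calls)

;; `doall` forces the lazy seq from `for` and keeps the results.
(def results (doall pending))
```

Here calls-after-for is still empty, while calls-after-doseq holds all three ids.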
I'm writing a Clojure programme to help me perform a security risk assessment (finally gotten fed-up with Excel).
I have a question on Clojure idiom and style.
To create a new record about an asset in a risk assessment I pass in the risk-assessment I'm currently working with (a map) and a bunch of information about the asset and my make-asset function creates the asset, adds it to the R-A and returns the new R-A.
(defn make-asset
  "Makes a new asset, adds it to the given risk assessment
  and returns the new risk assessment."
  [risk-assessment name description owner categories
   & {:keys [author notes confidentiality integrity availability]
      :or {author "" notes "" confidentiality 3 integrity 3 availability 3}}]
  (let [ia-ref (inc (risk-assessment :current-ia-ref))]
    (assoc risk-assessment
           :current-ia-ref ia-ref
           :assets (conj (risk-assessment :assets)
                         {:ia-ref ia-ref
                          :name name
                          :desc description
                          :owner owner
                          :categories categories
                          :author author
                          :notes notes
                          :confidentiality confidentiality
                          :integrity integrity
                          :availability availability
                          :vulns []}))))
Does this look like a sensible way of going about it?
Could I make it more idiomatic, shorter, simpler?
Particular things I am thinking about are:
- Should make-asset add the asset to the risk-assessment? (An asset is meaningless outside of a risk assessment.)
- Is there a simpler way of creating the asset and adding it to the risk-assessment?
Thank you
A few suggestions, which may or may not apply.
1. The Clojure idiom for no value is nil. Use it.
2. Present the asset as a flat map. The mixture of positional and keyword arguments is confusing and vulnerable to changes in what makes an asset valid.
3. As @Symfrog suggests, separate the validation of the asset from its association with a risk assessment.
4. Don't bother to keep :current-ia-ref as an entry in a risk assessment. It is just an asset count.
5. Pull out the default entries for an asset into a map in plain sight. You can change your assumed defaults as you wish.
This gives us something like the following (untested):
(def asset-defaults {:confidentiality 3, :integrity 3, :availability 3})
(defn asset-valid? [asset] (every? asset [:name :description :owner]))
(defn add-asset [risk-assessment asset]
  (if (asset-valid? asset)
    (update-in
     risk-assessment
     [:assets]
     conj (assoc
           ;; defaults come first so that values supplied in asset win
           (merge asset-defaults asset)
           :ia-ref (inc (count (:assets risk-assessment)))
           :vulns []))))
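One detail worth checking at the REPL when merging defaults: merge lets later maps win, so the defaults map has to come first if values supplied with the asset should override it. A small sketch with made-up asset data:

```clojure
(def asset-defaults {:confidentiality 3, :integrity 3, :availability 3})

;; Made-up asset that overrides one of the defaults.
(def asset {:name "Payroll DB"
            :description "HR payroll database"
            :owner "Finance"
            :confidentiality 5})

;; Defaults first: the asset's own :confidentiality survives.
(def defaults-first (merge asset-defaults asset))

;; Defaults last: the default overwrites the asset's value.
(def defaults-last (merge asset asset-defaults))
```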
Responses to Comments
:current-ia-ref isn't a count. If an asset is deleted it shouldn't reduce :current-ia-ref.
Then (4) does not apply.
I'm not sure of the pertinence of your statement that the Clojure idiom for no value is nil. Could you explain further in this context please?
Quoting Differences with other Lisps: In Clojure nil means 'nothing'. It signifies the absence of a value, of any type, and is not specific to lists or sequences.
In this case, we needn't give :author or :notes empty string values.
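For instance (a small sketch; describe-author is a made-up helper), an asset that simply omits :author yields nil on lookup, and nil is falsey, so callers can branch on it directly without comparing against an empty string:

```clojure
(defn describe-author
  "Returns a display string for the asset's author, if one was recorded."
  [asset]
  (if-let [author (:author asset)]
    (str "Author: " author)
    "No author recorded"))

(def with-author (describe-author {:name "Payroll DB" :author "Alice"}))
(def without-author (describe-author {:name "Payroll DB"}))
```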
'Flat map': are you talking about the arguments into the function, if so then I agree.
Yes.
I'm not sure why you define an asset-valid? function. That seems to exceed the original need somewhat: and personally I prefer ensuring only valid assets can be created rather than checking after the fact.
Your make-asset function uses the structure of its argument list to make sure that risk-assessment, name, description, owner, and categories are present (I forgot to check for categories). If you move to presenting the data as a map, whether as a single argument or by destructuring, you lose this constraint. So you have to check the data explicitly (whether to do so in a separate function is moot). But there are benefits:
You can check for more than the presence of certain arguments.
You don't have to remember what order the arguments are in.
Wouldn't your version mean that if I decided to make asset a record in future I'd have to change all the code that called add-asset?
No. A record behaves as a map - it implements IPersistentMap. You'd have to change make-asset, obviously.
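A short sketch of what "behaves as a map" covers, using a made-up Asset record. It also shows one caveat I am aware of: a record is not callable as a function, so a check written as (every? asset ks) would need a small change if assets became records.

```clojure
(defrecord Asset [name description owner])

(def asset (->Asset "Payroll DB" "HR payroll database" "Finance"))

(def looked-up (:name asset))           ; keyword lookup works as on a map
(def extended (assoc asset :ia-ref 1))  ; assoc works and still returns an Asset
(def is-map (map? asset))

;; Caveat: unlike a plain map, a record does not implement IFn, so
;; (asset :name) would throw. A key check that works for both maps
;; and records (has-keys? is a hypothetical helper):
(defn has-keys? [m ks]
  (every? #(some? (get m %)) ks))
```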
... whereas with my approach the details of what an asset is are hidden?
In what sense are the contents of an asset hidden? An asset is a map required to have particular keys, and likely to have several other particular keys. Whether the asset is 'really' a record doesn't matter.
A core principle of Clojure (and any other Lisp dialect) is to create small composable functions.
It is not a problem if an asset is created outside of a risk assessment as long as the asset is not exposed to code that is expecting a fully formed asset before it has been added to a risk assessment.
So I would suggest the following (untested):
(defn add-asset-ra
  [{:keys [current-ia-ref] :as risk-assessment} asset]
  (let [ia-ref (if current-ia-ref
                 (inc current-ia-ref)
                 1)]
    (-> risk-assessment
        (assoc :current-ia-ref ia-ref)
        (update-in [:assets] #(conj % (assoc asset :ia-ref ia-ref))))))
(defn make-asset
  [name description owner categories
   & {:keys [author notes confidentiality integrity availability]
      :or {author "" notes "" confidentiality 3 integrity 3 availability 3}}]
  {:name name
   :desc description
   :owner owner
   :categories categories
   :author author
   :notes notes
   :confidentiality confidentiality
   :integrity integrity
   :availability availability
   :vulns []})
You may also find the Schema library useful to validate the shape of function arguments.
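To illustrate how the two pieces compose (a sketch: add-asset-ra is repeated from above so the snippet runs on its own, and the assets are bare maps with made-up names rather than the output of make-asset), several assets can be threaded into an empty risk assessment with reduce:

```clojure
(defn add-asset-ra
  [{:keys [current-ia-ref] :as risk-assessment} asset]
  (let [ia-ref (if current-ia-ref
                 (inc current-ia-ref)
                 1)]
    (-> risk-assessment
        (assoc :current-ia-ref ia-ref)
        (update-in [:assets] #(conj % (assoc asset :ia-ref ia-ref))))))

;; Start from an assessment with an empty vector of assets and add two.
(def risk-assessment
  (reduce add-asset-ra
          {:assets []}
          [{:name "Laptop"} {:name "Server"}]))
```

Each asset receives the next :ia-ref, and :current-ia-ref ends up at the last one handed out.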
I'm surprised to find that query results in Datomic are not lazy, when entities are.
Is there an obvious rationale for this choice that I am missing? It seems reasonable that someone might want to (map some-fn (take 100 query-result-containing-millions)), but this would force the evaluation of the entire set of entity-ids, no?
Is there a way to get a lazy seq (of entity-ids) directly back from the query, or do they always have to be loaded into memory first, with laziness only available through the entity?
You can use the datomic.api/datoms fn to get access to entities in a lazy way.
Note that you have to specify the index type when calling datoms, and the types of index available to you depend on the attribute that you're interested in. E.g. the :avet index is only available if your attribute has :db/index (or :db/unique) set in the schema, and the :vaet index is only available if your attribute is of type :db.type/ref.
We use something like this at work (note: the attribute, ref-attr, must be of :db.type/ref for this to work):
(defn datoms-by-ref-value
  "Returns a lazy seq of all the datoms in the database matching the
  given reference attribute value."
  [db ref-attr value]
  (d/datoms db :vaet value ref-attr))
The datoms documentation is a bit sparse, but with some trial and error you can probably work out what you need. There's a post by August Lilleaas about using the :avet index (which requires an index on the attribute in the Datomic schema) that I found somewhat helpful.
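As a sketch of how this addresses the (take 100 ...) case from the question: the helper below pulls entity ids lazily from an index. The helper name and the choice of the :aevt index are mine; the datoms function is passed in as an argument so that the snippet runs without a database. With Datomic on the classpath you would pass datomic.api/datoms and a real db value; here a stand-in that returns an endless seq of datom-like maps takes its place.

```clojure
(defn first-entity-ids
  "Returns a lazy seq of the first n entity ids found in the :aevt index
  for attr. datoms-fn is expected to behave like datomic.api/datoms."
  [datoms-fn db attr n]
  (->> (datoms-fn db :aevt attr)
       (map :e)
       (take n)))

;; Stand-in for d/datoms: an endless lazy seq of datom-like maps.
;; Only the first n are ever realized by first-entity-ids.
(defn fake-datoms [_db _index attr]
  (map (fn [i] {:e i, :a attr}) (range)))

(def ids (first-entity-ids fake-datoms nil :person/name 100))

;; Assumed real usage:
;; (first-entity-ids d/datoms db :person/name 100)
```

For a cardinality-many attribute the same entity id can appear more than once, so a dedupe step may be needed.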
My question is: does Datomic require the explicit manual creation of unique sequence numbers by the end user? Or is it just the example provided?
I'm reading through the Datomic tutorial.
When I look at the data that gets loaded in seattle-data0.dtm I see on the first two lines:
[
{:district/region :region/e, :db/id #db/id[:db.part/user -1000001], :district/name "East"}
{:db/id #db/id[:db.part/user -1000002], :neighborhood/name "Capitol Hill", :neighborhood/district #db/id[:db.part/user -1000001]}
Notice in particular the values
:db/id #db/id[:db.part/user -1000001],
:db/id #db/id[:db.part/user -1000002]
#db/id[:db.part/user -1000001]
Perhaps you can help me understand - this appears to explicitly require a manually generated unique ID sequence number when preparing data for insert.
Surely in a modern database we can rely on the database to generate sequence numbers for us?
When I go to do my own example schema and data insert - I find that I am required to insert manual ID numbers as well. What am I missing?
To answer your question: no, Datomic doesn't require the end user to generate identifiers. What you see in the Seattle example are temporary ids.
Every time you want to add some facts about new entities to Datomic, you have to give every new entity a temporary id. This id will be replaced with a real unique id by Datomic.
Now you may ask yourself why you have to use these temporary ids in the first place. Temporary ids are needed to express relationships between new entities within a single transaction. In your example, you have the ids:
:db/id #db/id[:db.part/user -1000001],
:db/id #db/id[:db.part/user -1000002]
#db/id[:db.part/user -1000001]
two of them are the same (I'll explain the negative numbers in a moment). That means that the new entity marked with the temporary id #db/id[:db.part/user -1000001] is the same in both assertions.
Now I have to explain the data literal #db/id[:db.part/user -1000001]. #db/id is the tag for a Datomic temporary id. The tag is followed by a vector of two components, :db.part/user and -1000001. The first part is the database partition and is mandatory. The second part is optional. If you write just #db/id[:db.part/user], you get a fresh (different) temporary id every time this literal occurs. If you write #db/id[:db.part/user -1000001], you get the same temporary id every time you use the negative index -1000001. So #db/id[:db.part/user -1000001] is different from #db/id[:db.part/user -1000002].
I don't exactly know why the examples use indices below -1000000. The JavaDoc of tempid, which the #db/id data literal resolves to, says that the numbers from -1 (inclusive) to -1000000 (exclusive) are reserved for user-created temp ids. So maybe someone can shed some light on this.
To sum this up: #db/id[...] literals are temporary ids used to refer to the same entity within one transaction, and Datomic replaces them with real unique ids at the end of the transaction. If you don't have to refer to the same entity twice in a transaction, you are fine with just #db/id[:db.part/user] for every temporary id.
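The same idea as plain transaction data, in a sketch that assumes a Datomic version accepting plain strings as temporary ids (the attribute names are taken from the Seattle sample; conn is an assumed connection):

```clojure
;; "east" appears twice: once as the :db/id of the district and once as the
;; value of :neighborhood/district. Within one transaction both occurrences
;; resolve to the same newly created entity.
(def tx-data
  [{:db/id "east"
    :district/name "East"
    :district/region :region/e}
   {:db/id "capitol-hill"
    :neighborhood/name "Capitol Hill"
    :neighborhood/district "east"}])

;; Assumed usage:
;; @(d/transact conn tx-data)
```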