I'm surprised to find that query results in Datomic are not lazy, when entities are.
Is there an obvious rationale for this choice that I am missing? It seems reasonable that someone might want to (map some-fn (take 100 query-result-containing-millions)), but this would force the evaluation of the entire set of entity ids, no?
Is there a way to get a lazy seq (of entity-ids) directly back from the query, or do they always have to be loaded into memory first, with laziness only available through the entity?
You can use the datomic.api/datoms fn to get access to entities in a lazy way.
Note that you have to specify the index type when calling datoms, and the types of indexes available to you depend on the type of the attribute you're interested in. E.g., the :avet index is only available if your attribute has :db/index set in the schema, and the :vaet index is only available if your attribute is of type :db.type/ref.
We use something like this at work (note: the attribute, ref-attr, must be of :db.type/ref for this to work):
(defn datoms-by-ref-value
  "Returns a lazy seq of all the datoms in the database matching the
  given reference attribute value."
  [db ref-attr value]
  (d/datoms db :vaet value ref-attr))
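For instance, given a database value db, here is a sketch of taking just a handful of entity ids lazily (the :order/customer attribute and the entity id below are hypothetical):

(->> (datoms-by-ref-value db :order/customer 17592186045418)
     (take 100) ;; only realizes the portion of the index it consumes
     (map :e))  ;; each datom supports keyword access to :e, :a, :v, :tx and :added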
The datoms documentation is a bit sparse, but with some trial and error you can probably work out what you need. There's a post by August Lilleaas about using the :avet index (which requires an index on the attribute in the Datomic schema) that I found somewhat helpful.
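For example, a sketch of an :avet lookup, assuming a hypothetical :person/email attribute with :db/index set in the schema (:avet is sorted by attribute, then value, so this seeks straight to the matching datoms):

(require '[datomic.api :as d])

(d/datoms db :avet :person/email "jane@example.com")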
I'm referring to :db/unique :db.unique/identity
(as opposed to :db.unique/value)
Upsert to me in my naivety sounds a bit scary, because if I try to insert a new record which has the same value (for a field declared unique) as an existing record, my feeling is that I want that to fail.
What I'd do if a field is declared unique is, when creating or updating another record, check whether that value is taken, and if it is give the user feedback that it's taken.
What am I missing about upsert, and when/why is it useful?
A search revealed the following (not in relation to Datomic):
"upsert helps avoid the creation of duplicate records and can save you time (if inserting a group) because you don't have to determine which records exist first".
What I don't understand is that the Datomic docs sometimes suggest using this, but I don't see why it's better in any way. Saving time at the cost of allowing collisions?
e.g., if I have a user registration system, I definitely do not want to "upsert" on the email the user submits. I can't think of a case when it would be useful - other than the one quoted above, if you had a large collection of things and you "didn't have time" to check whether they existed first.
The Datomic property :db/unique can be used with :db.unique/identity or :db.unique/value.
If it's used with :db.unique/identity, a transaction asserting an existing value will upsert, resolving to the existing entity.
If used with :db.unique/value, it will conflict and the transaction will fail.
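A minimal sketch of the difference, assuming a hypothetical :user/email attribute and an open connection conn:

;; :user/email declared with :db/unique :db.unique/identity.
;; Both maps use fresh tempids but the same email, so they resolve to
;; the same entity: the second transaction merges :user/name onto it
;; instead of failing.
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :user/email "a@example.com"}])
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :user/email "a@example.com"
                    :user/name  "Alice"}])
;; Had :user/email been declared :db.unique/value instead, the second
;; transaction would throw a unique-conflict exception.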
How can I retrieve an entity using the pull method by its entity id? I've used transact to add some datoms/facts (right phrasing?) to my db. I can see the entity ids if I do a simple query like:
[:find ?e
 :where [?e :arb/value]]
The result being:
#{[17592186045418] [17592186045420] [17592186045423]}
Now I'd like to retrieve one of these using pull. The examples in the docs for pull, however, all involve an entity that is associated with an id attribute.
Specifically, the docs refer to an example from the musicbrainz sample data set, and the sample they suggest is:
(pull db '[*] led-zeppelin)
where (although the docs don't show this) led-zeppelin has been defined like so (as can be seen here):
(def led-zeppelin [:artist/gid #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3"])
The docs say that the pull command takes three things: a db, a selector pattern determining (I think) what attributes are pulled for each entity, and the "eid" of the entity. So the above led-zeppelin var is somehow the eid.
I don't really follow what's going on there. The :artist/gid is an id attribute defined in the schema for musicbrainz, it seems, and the third item looks like the specific id. I'm not sure what #uuid is.
But, in my case, I have defined no id attribute for my entities. I was hoping to be able to use the unique entity id that I think is assigned by default to each entity. Is this possible? If so, how would this be done?
The solution here is simple. Just drop in the entity id number directly:
(d/pull db '[*] 17592186045418)
The mistake I'd made was to use the eid as a string, i.e. by double-quoting it.
Pull's third argument is a reference to an entity. You can either use one of the ids that your query returned, or a lookup ref, as in the led-zeppelin example, where you refer to an entity by a unique attribute value.
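(#uuid "..." is just Clojure's tagged reader literal for a java.util.UUID value.) Both forms side by side:

(d/pull db '[*] 17592186045418)                 ;; by entity id
(d/pull db '[*] [:artist/gid
                 #uuid "678d88b2-87b0-403b-b63d-5da7465aecc3"]) ;; by lookup ref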
The purpose of a query is to find the EID of something given one or more of its properties. If you already know the EID, you don't need a query; you just want to retrieve the attr/val pairs for that entity. So use the entity function:
(let [eid    12345
      result (into {} (d/entity db eid))]
  (println result))
Note that the result of (d/entity ...) is lazy, and you need to force it into a Clojure map to see all of the items.
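Alternatively, d/touch realizes all the attributes of an entity in place:

(d/touch (d/entity db 12345)) ;; returns the entity with every attribute realized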
Besides Datomic's own documentation, you can find more examples and unit tests in the Tupelo Datomic library. In addition to many convenience functions, it contains a James Bond example that helps to clarify some of the Datomic concepts.
I am using Monger and fetching a batch from my MongoDB database using find-maps. It returns an array that I plan to use as a datastore argument (reference) downstream in my chain of function calls. Within those future function calls, I will have access to a corresponding id. I hope to use this id as a lookup to fetch from my datastore so I don't have to make another Monger call. A datastore in the form of an array doesn't seem like the fastest way to get access to an object by id, but I am not certain.
If I needed to derive an object from this datastore array, then I'd need to use a function like this (that has to iterate over every element):
(defn fetchObjFromArray [fetch_id inputarray]
  (reduce (fn [reduced_obj element_obj]
            (if (= fetch_id (get-in element_obj [:_id]))
              element_obj ;; ignoring duplicates for conversation
              reduced_obj))
          {}
          inputarray))
Instead, if after my initial Monger call, I create a key/val hash object with a function like this:
(defn createReportFromObjArray [inputarray]
  (reduce (fn [returnobj elementobj]
            (let [_id     (get-in elementobj [:_id])
                  keyword (keyword _id)] ;; shadows clojure.core/keyword below this line
              (assoc returnobj keyword elementobj))) ;; ignoring duplicates for conversation
          {}
          inputarray))
then perhaps my subsequent calls could instead use get-in and that would be much faster because I would be fetching by key?
I am confused because when I use get-in, doesn't it have to iterate over each key in the key/val hash object until it finds a match between the key and the fetch_id?
(let [report     (createReportFromObjArray inputarray)
      target_val (get-in report [(keyword fetch_id)])]
Why doesn't get-in have to iterate over every key? Maybe it's faster because it can stop when it finds the first match, whereas map/reduce has to go all the way through? How is this faster than having to iterate over each element in an array and checking whether its id matches fetch_id?
I am very grateful for any help you can offer.
In your second code example you are building a Clojure hash map in linear time. Via get and its derivatives, hash maps have lookup performance of O(log32 N), which is effectively constant time.
In the first example you scan the entire input and return the last element that matched the ID, or the empty hash map, probably unintentionally.
I recommend using (group-by :_id) instead of the second code example, and (first (filter (comp #{fetch_id} :_id) inputarray)) in place of the first example.
Avoid casting to keywords via keyword - Clojure keywords should generally be known at compile time. Maps support arbitrary data types as keys.
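A sketch of both suggestions, using inputarray and fetch_id from the question:

;; Build the lookup table once: group-by maps each :_id to a vector of
;; the objects carrying it, so duplicates are kept rather than silently
;; overwritten.
(def report (group-by :_id inputarray))
(get report fetch_id) ;; effectively constant-time lookup

;; One-off lookup without building a table: a lazy linear scan that
;; stops at the first match.
(first (filter (comp #{fetch_id} :_id) inputarray))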
I have a function that takes in a list of entries and saves them to Mongo using Monger.
What is strange is that only one record will be updated and the rest ignored unless I specify :multi true.
I don't understand why the multi flag is necessary for Monger to persist all the updates to MongoDB.
(defn update-entries
  [entries]
  (let [conn (mg/connect)
        db   (mg/get-db conn "database")]
    (for [e entries]
      (mc/update db "posts" {"id" (:id e)} {$set {:data (:data e)}} {:multi true}))))
The multi flag is necessary for multi-document updates, since that's what Mongo itself uses. Take a look at the documentation for update. Granted, that's the mongo shell, but most drivers try to follow it when it comes to operation semantics.
Note that if "id" is unique, then you're updating one record at a time so having :multi set to true shouldn't matter.
There is, however, another issue with your code.
You use a for comprehension, which produces a lazy sequence; i.e., the calls to mc/update won't be made until you force the realization of the collection returned by for.
Since mc/update is called for its side effects (updating a record in the db), using doseq would be more appropriate, unless you need the results.
If that's the case, wrap for in doall to force realization:
(doall
  (for [e entries]
    (mc/update db "posts" {"id" (:id e)} {$set {:data (:data e)}} {:multi true})))
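For side effects alone, the doseq version needs no doall:

(doseq [e entries]
  (mc/update db "posts" {"id" (:id e)} {$set {:data (:data e)}} {:multi true}))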
I'm writing a Clojure programme to help me perform a security risk assessment (finally gotten fed up with Excel).
I have a question on Clojure idiom and style.
To create a new record about an asset in a risk assessment, I pass in the risk assessment I'm currently working with (a map) and a bunch of information about the asset; my make-asset function creates the asset, adds it to the risk assessment, and returns the new risk assessment.
(defn make-asset
  "Makes a new asset, adds it to the given risk assessment
  and returns the new risk assessment."
  [risk-assessment name description owner categories
   & {:keys [author notes confidentiality integrity availability]
      :or {author "" notes "" confidentiality 3 integrity 3 availability 3}}]
  (let [ia-ref (inc (risk-assessment :current-ia-ref))]
    (assoc risk-assessment
           :current-ia-ref ia-ref
           :assets (conj (risk-assessment :assets)
                         {:ia-ref ia-ref
                          :name name
                          :desc description
                          :owner owner
                          :categories categories
                          :author author
                          :notes notes
                          :confidentiality confidentiality
                          :integrity integrity
                          :availability availability
                          :vulns []}))))
Does this look like a sensible way of going about it?
Could I make it more idiomatic, shorter, simpler?
Particular things I am thinking about are:
should make-asset add the asset to the risk-assessment? (An asset is meaningless outside of a risk assessment).
is there a simpler way of creating the asset; and
adding it to the risk-assessment?
Thank you
A few suggestions, which may or may not apply.
1. The Clojure idiom for no value is nil. Use it.
2. Present the asset as a flat map. The mixture of positional and keyword arguments is confusing and vulnerable to changes in what makes an asset valid.
3. As #Symfrog suggests, separate the validation of the asset from its association with a risk assessment.
4. Don't bother to keep :current-ia-ref as an entry in a risk assessment. It is just an asset count.
5. Pull out the default entries for an asset into a map in plain sight. You can change your assumed defaults as you wish.
This gives us something like the following (untested):
(def asset-defaults {:confidentiality 3, :integrity 3, :availability 3})

(defn asset-valid? [asset]
  (every? asset [:name :description :owner]))

(defn add-asset [risk-assessment asset]
  (if (asset-valid? asset)
    (update-in risk-assessment [:assets]
               conj (assoc (merge asset-defaults asset) ;; defaults first, so supplied values win
                           :ia-ref (inc (count (:assets risk-assessment)))
                           :vulns []))))
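Hypothetical usage of the above (the asset values are made up):

(add-asset risk-assessment
           {:name        "Customer database"
            :description "Primary PII store"
            :owner       "DBA team"
            :categories  #{:data}})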
Responses to Comments
:current-ia-ref isn't a count. If an asset is deleted it shouldn't reduce :current-ia-ref.
Then (4) does not apply.
I'm not sure of the pertinence of your statement that the Clojure idiom for no value is nil. Could you explain further in this context, please?
Quoting Differences with other Lisps: In Clojure nil means 'nothing'. It signifies the absence of a value, of any type, and is not specific to lists or sequences.
In this case, we needn't give :author or :notes empty string values.
'Flat map': are you talking about the arguments into the function? If so, then I agree.
Yes.
I'm not sure why you define an asset-valid? function. That seems to exceed the original need somewhat, and personally I prefer ensuring that only valid assets can be created rather than checking after the fact.
Your make-asset function uses the structure of its argument list to make sure that risk-assessment, name, description, owner, and categories are present (I forgot to check for categories). If you move to presenting the data as a map - whether as a single argument or by destructuring - you lose this constraint. So you have to check the data explicitly (whether to do so in a separate function is moot). But there are benefits:
You can check for more than the presence of certain arguments.
You don't have to remember what order the arguments are in.
Wouldn't your version mean that if I decided to make asset a record in future I'd have to change all the code that called add-asset?
No. A record behaves as a map - it implements IPersistentMap. You'd have to change make-asset, obviously.
... whereas with my approach the details of what an asset is are hidden?
In what sense are the contents of an asset hidden? An asset is a map required to have particular keys, and likely to have several other particular keys. Whether the asset is 'really' a record doesn't matter.
A core principle of Clojure (and any other Lisp dialect) is to create small composable functions.
It is not a problem if an asset is created outside of a risk assessment, as long as the asset is not exposed to code that expects a fully formed asset before it has been added to a risk assessment.
So I would suggest the following (untested):
(defn add-asset-ra
  [{:keys [current-ia-ref] :as risk-assessment} asset]
  (let [ia-ref (if current-ia-ref
                 (inc current-ia-ref)
                 1)]
    (-> risk-assessment
        (assoc :current-ia-ref ia-ref)
        (update-in [:assets] #(conj % (assoc asset :ia-ref ia-ref))))))
(defn make-asset
  [name description owner categories
   & {:keys [author notes confidentiality integrity availability]
      :or {author "" notes "" confidentiality 3 integrity 3 availability 3}}]
  {:name name
   :desc description
   :owner owner
   :categories categories
   :author author
   :notes notes
   :confidentiality confidentiality
   :integrity integrity
   :availability availability
   :vulns []})
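Composed, with hypothetical values:

(add-asset-ra risk-assessment
              (make-asset "Customer database" "Primary PII store" "DBA team" #{:data}
                          :confidentiality 4))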
You may also find the Schema library useful for validating the shape of function arguments.
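A minimal sketch with Prismatic Schema (the Asset shape below mirrors the maps above and is otherwise my own guess):

(require '[schema.core :as s])

(def Asset
  {:name       s/Str
   :desc       s/Str
   :owner      s/Str
   :categories #{s/Keyword}
   s/Keyword   s/Any}) ;; allow the remaining optional keys

(s/validate Asset asset) ;; throws if the shape is wrong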