Datomic entity-api is slow on large amount of entities? - clojure

I need to apply additional logic (mapping, conditionals, aggregation) to entities I get from Datomic. I had a hard time translating it into a Datomic query (I'm not sure it's even possible in my case), so I used Datomic's raw index access instead, and most of the work and logic is done in Clojure.
It worked fine until I got to ~500K entries; now the whole approach is getting very slow.
The relevant code:
(defn e->entry
"Map e into entry"
[e]
{:id (:entry/uuid e)
;; each flat field increases mapping time (seems linearly)
:date (:entry/date e)
:summ (:entry/summ e)
;; although when using nested fields, mapping time rises significantly
:groups (map #(-> % :dimension/group :group/name)
(:entry/dimensions e))})
;; query code:
(->> (d/datoms db :aevt :entry/uuid)
     (map #(->> %
                :e
                (d/entity db)
                e->entry)))
;; TODO: other actions on mapped entries ...
It takes about 30 seconds to run the query code just to map the entities, and the more fields I need, the longer it takes.
Is this expected behavior? Is there a way to speed things up, or am I missing something and this is a bad approach?

To fully answer this question would require more information, please feel free to ask on the forum or open a support ticket.

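I ended up with the following optimizations, in case someone needs them:
(defn eid->entry
  "Mapping via :eavt index"
  [db eid]
  (->> (d/datoms db :eavt eid) ; access all datoms by eid once
       (seq)
       (reduce (fn [m dtm]
                 (let [attr-key (d/ident db (:a dtm))
                       v (:v dtm)]
                   ;; note: a cardinality-many attribute keeps only its last value here
                   (assoc m attr-key v)))
               {}))) ; start from an empty map; without it the first datom would be used as m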
;; new query code
(->> (d/datoms db :aevt :entry/uuid)
(pmap #(->> %
:e
(eid->entry db))))
I used pmap instead of map and resorted to the :eavt index to get all attributes and values of an entity at once, instead of accessing fields one by one through d/entity.
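The folding step can be tried without a database. This is a minimal sketch, assuming plain maps with :e, :a and :v keys as stand-ins for Datomic datoms; `fake-datoms`, `datoms->entry` and `many-attrs` are hypothetical names, and no Datomic API is called. In real datoms :a is an attribute id, which the code above resolves with d/ident.

```clojure
;; Plain maps stand in for Datomic datoms (an assumption for illustration only).
(def fake-datoms
  [{:e 1 :a :entry/uuid :v "u-1"}
   {:e 1 :a :entry/summ :v 10}
   {:e 1 :a :entry/dimensions :v 101}
   {:e 1 :a :entry/dimensions :v 102}])

(defn datoms->entry
  "Fold the datoms of one entity into a map. Attributes listed in
  many-attrs collect their values into a set instead of being overwritten."
  [many-attrs datoms]
  (reduce (fn [m {:keys [a v]}]
            (if (contains? many-attrs a)
              (update m a (fnil conj #{}) v)
              (assoc m a v)))
          {}            ; explicit initial value for the fold
          datoms))

(def entry (datoms->entry #{:entry/dimensions} fake-datoms))
```

Collecting multi-valued attributes into a set is one possible design choice; with a plain assoc only the last value of such an attribute would survive.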

Related

Clojure: delete item from a ref?

I created the ref "people" below, and want the function delete-person to delete an item from the data structure within a transaction.
(defrecord Person [name favorite-color])
(def people (ref ()))
(defn add-person [new-person]
(dosync
(alter people conj new-person)))
(add-person (Person. "Joe" "Red"))
(add-person (Person. "Sam" "Blue"))
;; how do I make this function work?
;; would @people (dereferenced) have to be the second argument to filter?
(defn delete-person [name-to-delete]
"Delete named person from ref"
(dosync
(alter people filter #(not= (:name %) name-to-delete))))
(delete-person "Joe")
IllegalArgumentException Don't know how to create ISeq from:
user$delete_person$fn__1407$fn__1408 clojure.lang.RT.seqFrom (RT.java:542)
The function below works because I filter on the dereferenced ref, but how do I do it in a transaction to mutate the data?
(filter #(not= (:name %) "Sam") @people)
=> (#user.Person{:name "Joe", :favorite-color "Red"})
As the error says, you're trying to iterate over a function. This happens because when you write:
(alter people filter #(not= (:name %) name-to-delete))
The unwrapped people becomes the first argument to filter, not the last.
You'll need to use a full fn, or use partial:
(alter people
(fn [ps] (filter #(not= (:name %) name-to-delete) ps)))
Or
(alter people
(partial filter #(not= (:name %) name-to-delete)))
These make alter pass the unwrapped people as the last argument to filter, instead of implicitly as the first.
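Put together, the fix can be checked end to end. This is a small sketch that uses plain maps instead of the Person record to stay self-contained; otherwise the names follow the question.

```clojure
(def people (ref '()))

(defn add-person [person]
  (dosync (alter people conj person)))

(defn delete-person
  "Delete the named person from the ref."
  [name-to-delete]
  (dosync
    ;; alter calls (f current-value & args), so the wrapper fn puts the
    ;; collection in filter's last position.
    (alter people
           (fn [ps] (filter #(not= (:name %) name-to-delete) ps)))))

(add-person {:name "Joe" :favorite-color "Red"})
(add-person {:name "Sam" :favorite-color "Blue"})
(delete-person "Joe")
```

After these calls, dereferencing people with @people should yield only the entry for "Sam".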
I'll note though:
As @cfrick brought up in the comments, using lazy sequences in a transaction may have the potential to cause problems. I can't offhand think of a scenario where it would, but it feels wrong. It could be argued that the realization of a lazy sequence is a side effect, and side effects shouldn't take place in a transaction, since transactions may run multiple times in the event of a conflict. Multiple realizations shouldn't cause a problem, but I can't say definitively that it's safe (honestly, I never use refs).
Make sure you actually need refs and transactions here. Transactions are for when you need to sequence multiple alterations to data, and need to be able to catch when the data involved has been changed partway through a transaction. If you just need a simple mutable container, though, atoms are much simpler.
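Both notes can be combined in one variation. This is a sketch, not the original poster's code: it assumes a vector held in an atom and uses filterv, which is eager, so no unrealized lazy sequence is ever stored. The names `add-person!` and `delete-person!` are made up for the example.

```clojure
;; A vector in an atom; filterv returns a fully realized vector.
(def people (atom []))

(defn add-person! [person]
  (swap! people conj person))

(defn delete-person! [name-to-delete]
  ;; swap! passes the current value as the first argument of the function.
  (swap! people
         (fn [ps] (filterv #(not= (:name %) name-to-delete) ps))))

(add-person! {:name "Joe" :favorite-color "Red"})
(add-person! {:name "Sam" :favorite-color "Blue"})
(delete-person! "Joe")
```

Note that switching from a list to a vector changes where conj adds new items (at the end instead of the front), which may or may not matter for the use case.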

Filter a list of maps from an Atom in Clojure

I'm developing a mini-social media API where the user is allowed to insert a new profile, connect two profiles together (like friends) and then receive recommendations based on the "friends of my friends" rule.
Right now I'm trying to create the API for Profile.
I have an atom that holds a list of maps, one for each profile.
(def profiles (atom ()))
(defn create [request]
(swap! profiles conj {:id (get-in request [:body "id"])
:name (get-in request [:body "name"])
:age (get-in request [:body "age"])
:recommendable (get-in request [:body "recommendable"])
:friends (list)
})
(created "")
)
I was trying to develop the find-by-id for the GET http verb for the API when I stumbled into a problem. How can I get the values from the maps within said list so I can apply functions to it?
For instance, here I was trying to use the filter function to return me only the maps that contained a given id. But I keep getting an error:
(defn find-by-id [id]
(filter #(= (:id %) id) profiles)
)
Don't know how to create ISeq from: clojure.lang.Atom
It seems to me that filter is not applicable to an Atom.
Same thing happens to remove:
(defn delete-by-id [id]
(swap! profiles (remove #(= (:id %) id) profiles))
)
When I try with @profiles I get an empty list as a result. And to make things worse, when I tried the filter function in the REPL it worked just fine.
Which leaves me wondering what am I missing here.
Could anyone please tell me what's going on?
Thanks in advance.
The first one fails because, as it says, atoms aren't a sequence, which filter is expecting.
You need to get the sequence out of the atom before you can filter it:
; I'm dereferencing the atom using @ to get the list of profiles that it holds
(defn find-by-id [id]
  (filter #(= (:id %) id) @profiles))
Note though, this isn't optimal. You're relying on the state of profiles that can change at seemingly random times (if you have asynchronous processes swap!ping it). It may complicate debugging since you can't get a good handle on the data before it's passed to filter. It also isn't good for the function to rely on profiles being an atom, since that's irrelevant to its function, and you may change your design later. It would be more future proof to make this function rely purely on its parameters and have no knowledge of the atom:
(defn find-by-id [id profiles]
(filter #(= (:id %) id) profiles))
; Then call it like this. I renamed your atom here
(find-by-id some-id @profile-atom)
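A runnable version of this idea might look like the following. The sample profiles are made up for illustration; the point is that the function only sees an immutable snapshot and works equally well on data that never lived in an atom.

```clojure
(defn find-by-id
  "Return the profiles whose :id equals id. Works on any sequence of maps."
  [id profiles]
  (filter #(= (:id %) id) profiles))

;; Hypothetical sample data, not from the original question.
(def profile-atom
  (atom (list {:id 1 :name "Ann"} {:id 2 :name "Bob"})))

;; Dereference once at the call site; the function gets a stable snapshot.
(def found (find-by-id 2 @profile-atom))

;; The same function works on a plain vector, with no atom involved.
(def found-in-vector (find-by-id 1 [{:id 1 :name "Ann"}]))
```

This also makes the function easy to test, since a test can pass a literal collection instead of setting up global state.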
Your second example fails because swap! accepts a function as its second argument. I think you meant to use reset!, which changes the value of the atom regardless of what it was before:
(defn delete-by-id [id]
  (reset! profiles (remove #(= (:id %) id) @profiles)))
Although, this isn't optimal either. If you want to update an atom based on a previous state, use swap! instead and supply an updating function:
(defn delete-by-id [id]
  (swap! profile-atom (fn [profiles] (remove #(= (:id %) id) profiles))))
Or, slightly more succinctly:
(defn delete-by-id [id]
(swap! profile-atom (partial remove #(= (:id %) id))))
I'm partially applying remove to make a function. The old state of the atom is passed as the last argument to remove.

How to iterate over a result set and extract one particular value in clojure?

Below is my attempt to iterate over a result set and get its values
(sql/with-connection db
(sql/with-query-results rs ["select * from user where UserID=?" 10000]
(doseq [rec rs
s rec]
(println (val s))
)))
But how do you extract one particular value from it? I need only the user name field.
Can anyone please demonstrate how to do this?
The result set is a sequence of maps, so if you wanted to obtain one field (e.g. one called name) then:
(sql/with-connection db
(sql/with-query-results rs ["select * from user where UserID=?" 10000]
(doseq [rec rs]
(let [name (:name rec)]
(println "User name:" name)
(println "Full record (including name):" rec)))))
But as mentioned in the comments, if you only want name, then select name from would be the more efficient option. The code above is useful when you need the full row for something else.
The with-connection / with-query-results syntax is deprecated as of clojure.java.jdbc 0.3.0. Filtering results is much easier with the new query syntax and the additional :row-fn and :result-set-fn parameters.
(query db ["select * from user"]
:row-fn :name
:result-set-fn #(doall (take 1000 (drop 10000 %))))
Be sure to make the result-set-fn realize all values, it shouldn't return a lazy sequence (hence the doall in this example).
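The reason for the doall can be shown without a database. The sketch below is only an analogy: it assumes a BufferedReader in place of a JDBC result set, and `lazy-lines` and `eager-lines` are hypothetical names. In both cases a lazy sequence that escapes the scope of its resource fails once something tries to realize it.

```clojure
(require '[clojure.string :as str])
(import '(java.io BufferedReader StringReader))

(defn lazy-lines
  "Returns a lazy seq that is still unrealized when the reader closes."
  []
  (with-open [r (BufferedReader. (StringReader. "a\nb\nc"))]
    (map str/upper-case (line-seq r))))

(defn eager-lines
  "Realizes every element while the reader is still open."
  []
  (with-open [r (BufferedReader. (StringReader. "a\nb\nc"))]
    (doall (map str/upper-case (line-seq r)))))

(def eager-result (eager-lines))

;; Realizing the lazy version after the reader is closed throws.
(def lazy-failed?
  (try
    (doall (lazy-lines))
    false
    (catch Exception _ true)))
```

The eager version returns all three lines, while the lazy one fails when it tries to read past the first line from the closed reader.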

How can I improve this Clojure function?

I just wrote my first Clojure function based on my very limited knowledge of the language. I would love some feedback in regards to performance and use of types. For example, I'm not sure if I should be using lists or vectors.
(defn actor-ids-for-subject-id [subject-id]
(sql/with-connection (System/getenv "DATABASE_URL")
(sql/with-query-results results
["SELECT actor_id FROM entries WHERE subject_id = ?" subject-id]
(let [res (into [] results)]
(map (fn [row] (get row :actor_id)) res)))))
It passes the following test (given proper seed data):
(deftest test-actor-ids-for-subject-id
(is (= ["123" "321"] (actor-ids-for-subject-id "123"))))
If it makes a difference (and I imagine it does) my usage characteristics of the returned data will almost exclusively involve generating the union and intersection of another set returned by the same function.
It's slightly more concise to use vec instead of into when the initial vector is empty. It may also express the intent more clearly, though that's more a matter of preference.
(vec (map :actor_id results))
results is a lazy sequence (a clojure.lang.Cons) returned by clojure.java.jdbc/resultset-seq, and each record is a map, so you can map the keyword over it directly:
(defn actor-ids-for-subject-id [subject-id]
(sql/with-connection (System/getenv "DATABASE_URL")
(sql/with-query-results results
["SELECT actor_id FROM entries WHERE subject_id = ?" subject-id]
(into [] (map :actor_id results)))))
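Since the question says the results will mostly be combined by union and intersection, returning a set may fit better than a list or a vector. This is a sketch under that assumption, with hypothetical literal rows standing in for two query results; `actor-id-set` is a made-up helper name.

```clojure
(require '[clojure.set :as set])

;; Hypothetical rows standing in for the results of two queries.
(def rows-a [{:actor_id "123"} {:actor_id "321"}])
(def rows-b [{:actor_id "321"} {:actor_id "555"}])

(defn actor-id-set
  "Collect the :actor_id of every row into a set."
  [rows]
  (set (map :actor_id rows)))

(def common  (set/intersection (actor-id-set rows-a) (actor-id-set rows-b)))
(def all-ids (set/union (actor-id-set rows-a) (actor-id-set rows-b)))
```

One consequence of this choice: a set never compares equal to a vector, so a test like the one in the question would have to compare against a set literal such as #{"123" "321"}.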

How Should I Iterate a Sequence?

Below is my attempt to iterate a sequence of maps; the code fails due to the casting error: Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: clojure.lang.Cons cannot be cast to java.util.Map$Entry.
Can anyone explain/demonstrate how I should iterate the result-set? Thanks.
(with-connection db
(with-query-results rs ["select category from users group by category"]
(doall
(for [s [rs]]
(do (println (val s)))))))
You wrapped the rs into a vector. So s will be bound to the whole sequence, not the individual map entries. So when you call val it doesn't know what to do with a sequence. Hence the exception. This should work:
(with-connection db
(with-query-results rs ["select category from users group by category"]
(doall
(for [rec rs
s rec]
(do
(println (val s)))))))
However, the ugly doall and do around the for should ring a bell that something could be improved. Indeed, for is meant to construct another lazy sequence, which does not work well with the side effects you intend in your example. You should use doseq in this case.
(with-connection db
(with-query-results rs ["select category from users group by category"]
(doseq [rec rs
s rec]
(println (val s)))))
The interface for the bindings of doseq is identical to that of for. However, it executes things immediately, and thus realizes any side effects immediately. If you put multiple expressions in the body of a for, you have to wrap them in a do. This is a reminder that the body should produce a value. Multiple expressions, however, indicate side effects. doseq therefore wraps the body in a do for you, so you can easily have multiple expressions. For illustration:
;; iterating over a single map yields its entries, so key and val apply
(doall
  (for [s a-map]
    (do
      (println (key s))
      (println (val s)))))
(doseq [s a-map]
  (println (key s))
  (println (val s)))
As a rule of thumb: you need side-effects? Look for things starting in do!
As a rule of thumb 2: if something looks ugly (see above comparison), this should ring a bell.
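The difference in evaluation is easy to observe with a counter. This is a small sketch using an atom as a stand-in for a side effect such as println; the names are made up for the example.

```clojure
(def calls (atom 0))

;; for only builds a lazy sequence; its body has not run yet.
(def pending (for [x [1 2 3]] (swap! calls inc)))
(def calls-before @calls)

;; doseq runs its body immediately for every element and returns nil.
(def doseq-result (doseq [x [1 2 3]] (swap! calls inc)))
(def calls-after-doseq @calls)
```

Here calls-before captures the counter while the for sequence is still unrealized, and calls-after-doseq captures it after doseq has visited all three elements.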
OK, so it sounds like you are trying to do a DB query from Clojure. You may have to supply more information about the "users" table for instance and what your query result set looks like.
At any rate, something like this may work
(def a (with-query-results rs ["select category from users group by category"]
(doall rs)))
(map #(:category %) a)