Why does this datalog query aggregate? - clojure

From https://github.com/tonsky/datascript
(->
(d/q '[:find ?color (max ?amount ?x) (min ?amount ?x)
:in [[?color ?x]] ?amount]
[[:red 10] [:red 20] [:red 30] [:red 40] [:red 50]
[:blue 7] [:blue 8]]
4)
pr-str
js/console.log)
;;; ([:red [20 30 40 50] [10 20 30 40]] [:blue [7 8] [7 8]])
(->
(d/q '[:find ?color (max ?amount ?x) (min ?amount ?x)
:in [[?color ?x]] ?amount]
[[:red 10] [:red 20] [:red 30] [:red 40] [:red 50]
[:blue 7] [:blue 8]]
3)
pr-str
js/console.log)
;;; ([:red [30 40 50] [10 20 30]] [:blue [7 8] [7 8]])
(->
(d/q '[:find ?color (max ?amount ?x) (min ?amount ?x)
:in [[?color ?x]] ?amount]
[[:red 10] [:red 20] [:red 30] [:red 40] [:red 50]
[:blue 7] [:blue 8]]
2)
pr-str
js/console.log)
;;; ([:red [40 50] [10 20]] [:blue [7 8] [7 8]])
So, this isn't a question about what it is doing; it is a question of how (or at least why) it is doing it. max and min are functions that return the maximum or minimum of the integers that follow them, respectively. How is ?amount getting factored into limiting the aggregation count? Why are these things aggregating anyway? How is the code being run such that it aggregates? I really don't see how this code flows to generate the results it does.

max and min are overloaded in Datomic queries.
The unary (min ?x) and (max ?x) functions aggregate to return a single number.
The binary (min ?n ?x) and (max ?n ?x) functions aggregate to return a collection of items limited in length by ?n.
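To make the grouping visible, here is a minimal plain-Clojure sketch of what the query computes. It is an emulation only, not how DataScript or Datomic implement aggregates. The names max-n, min-n, aggregate-by-color and color-amounts are hypothetical. The idea is that every :find variable that is not wrapped in an aggregate (here ?color) acts as a grouping key, and the aggregates then run once per group.

```clojure
;; Emulation of (max ?amount ?x) and (min ?amount ?x) grouped by ?color.
;; All names here are made up for illustration.
(def color-amounts
  [[:red 10] [:red 20] [:red 30] [:red 40] [:red 50]
   [:blue 7] [:blue 8]])

(defn max-n
  "Return up to n of the largest values in xs, in ascending order."
  [n xs]
  (vec (take-last n (sort xs))))

(defn min-n
  "Return up to n of the smallest values in xs, in ascending order."
  [n xs]
  (vec (take n (sort xs))))

(defn aggregate-by-color
  "Group [color x] pairs by color, then aggregate each group."
  [amount pairs]
  (for [[color rows] (group-by first pairs)]
    (let [xs (map second rows)]
      [color (max-n amount xs) (min-n amount xs)])))

(aggregate-by-color 3 color-amounts)
;; => ([:red [30 40 50] [10 20 30]] [:blue [7 8] [7 8]])
```

The :blue group has only two values, so both aggregates return everything they have, which matches the query results in the question.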

Related

Clojure - Applying a Function to a vector of vectors

I have a vector [[[1 2] [3 4]] [[5 6] [7 8]] [9 10] 11]. I want to apply a function to this vector but keep the data structure.
For example I want to add 1 to every number but keep the data structure to get the result being [[[2 3] [4 5]] [[6 7] [8 9]] [10 11] 12]. Is this possible?
I have tried
(map #(+ 1 %) (flatten [[[1 2] [3 4]] [[5 6] [7 8]] [9 10] 11]))
=> (2 3 4 5 6 7 8 9 10 11 12)
But you can see that the data structure is not the same.
Is there maybe a function that takes (2 3 4 5 6 7 8 9 10 11 12) to [[[2 3] [4 5]] [[6 7] [8 9]] [10 11] 12]?
I thought maybe to use postwalk but I'm not sure if this is correct.
Any help would be much appreciated
You can use postwalk:
(require '[clojure.walk :as walk])
(let [t [[[1 2] [3 4]] [[5 6] [7 8]] [9 10] 11]]
(walk/postwalk (fn [x] (if (number? x) (inc x) x)) t))
Also, the classic recursive solution is not much more difficult:
(defn inc-rec [data]
(mapv #((if (vector? %) inc-rec inc) %) data))
#'user/inc-rec
user> (inc-rec [1 [2 3 [4 5] [6 7]] [[8 9] 10]])
;;=> [2 [3 4 [5 6] [7 8]] [[9 10] 11]]
Another way to solve your problem is via Specter. You do need another dependency then, but it can be a helpful library.
(ns your-ns.core
(:require [com.rpl.specter :as specter]))
(def data [[[1 2] [3 4]] [[5 6] [7 8]] [9 10] 11])
(specter/defprotocolpath TreeWalker) ;; define path walker
(specter/extend-protocolpath TreeWalker
;; stop walking on leaves (in this case longs)
Object nil
;; when we are dealing with a vector, TreeWalk all elements
clojure.lang.PersistentVector [specter/ALL TreeWalker])
You can extend it to perform more complicated operations. For this use case normal Clojure is good enough.
(specter/transform [TreeWalker] inc data)
;; => [[[2 3] [4 5]] [[6 7] [8 9]] [10 11] 12]

Complexity of Clojure's distinct + randomly generated stream

What is the time complexity of an expression
(doall (take n (distinct stream)))
where stream is a lazily generated (possibly infinite) collection with duplicates?
I guess this partially depends on the amount or chance of duplicates in stream? What if stream is (repeatedly #(rand-int m)) where m >= n?
My estimation:
For every element in the resulting list there has to be at least one element realized from the stream. Multiple if the stream has duplicates. For every iteration there is a set lookup and/or insert, but since those are near constant time we get at least: O(n*~1) = O(n) and then some complexity for the duplicates. My intuition is that the complexity for the duplicates can be neglected too, but I'm not sure how to formalize this. For example, we cannot just say it is O(n*k*~1) = O(n) for some constant k since there is not an obvious maximum number k of duplicates we could encounter in the stream.
Let me demonstrate the problem with some data:
(defn stream [upper distinct-n]
(let [counter (volatile! 0)]
(doall (take distinct-n
(distinct
(repeatedly (fn []
(vswap! counter inc)
(rand-int upper))))))
@counter))
(defn sample [times-n upper distinct-n]
(->> (repeatedly times-n
#(stream upper distinct-n))
frequencies
(sort-by val)
reverse))
(sample 10000 5 1) ;; ([1 10000])
(sample 10000 5 2) ;; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])
(sample 10000 5 3) ;; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1])
(sample 10000 5 3) ;; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1])
(sample 10000 5 4) ;; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] [16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])
(sample 10000 5 5) ;; ([8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] [17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] [52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])
Note that for the last sample the number of elements distinct has to realize can go up to 66, although the chance is small.
Also notice that for increasing n in (sample 10000 n n) the most likely number of realized elements from the stream seems to go up more than linearly.
This chart illustrates the number of realized elements from the input (most common occurrence from 10000 samples) in (doall (take n (distinct (repeatedly #(rand-int m))))) for various values of n and m.
For completeness, here is the code I used to generate the chart:
(require '[com.hypirion.clj-xchart :as c])
(defn most-common [times-n upper distinct-n]
(->> (repeatedly times-n
#(stream upper distinct-n))
frequencies
(sort-by #(- (val %)))
ffirst))
(defn series [m]
{(str "m = " m)
(let [x (range 1 (inc m))]
{:x x
:y (map #(most-common 10000 m %)
x)})})
(c/view
(c/xy-chart
(merge (series 10)
(series 25)
(series 50)
(series 100))
{:x-axis {:title "n"}
:y-axis {:title "realized"}}))
Your problem is known as the coupon collector's problem, and the expected number of elements is given by summing m/m + m/(m-1) + ... until you have your n items:
(defn general-coupon-collector-expect
"n: Cardinality of resulting set (# of unique coupons to collect)
m: Cardinality of set to draw from (# of coupons that exist)"
[n m]
{:pre [(<= n m)]}
(double (apply + (mapv / (repeat m) (range m (- m n) -1)))))
(general-coupon-collector-expect 25 25)
;; => approximately 95.4
;; This generates the data for your plot:
(for [x (range 10 101 5)]
(general-coupon-collector-expect x 100))
Worst case will be infinite. Best case will be just N. Average case is O(N log N). This ignores the cost of checking whether an element has already been drawn. In practice that check is O(log_32 N) for Clojure sets (which distinct uses).
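As a rough sanity check of the coupon collector estimate, the following sketch compares the closed-form expectation with a simulated mean. The names realized-count, expected-count and mean-realized are hypothetical, and the simulation assumes a uniform (repeatedly #(rand-int m)) stream as in the question.

```clojure
;; Hypothetical helpers comparing theory with simulation.
(defn realized-count
  "Number of stream elements realized to obtain n distinct values out of m."
  [m n]
  (let [counter (volatile! 0)]
    (dorun (take n (distinct (repeatedly (fn []
                                           (vswap! counter inc)
                                           (rand-int m))))))
    @counter))

(defn expected-count
  "Expected draws: m/m + m/(m-1) + ... + m/(m-n+1)."
  [m n]
  (double (reduce + (map #(/ m %) (range m (- m n) -1)))))

(defn mean-realized
  "Average of realized-count over the given number of trials."
  [trials m n]
  (/ (reduce + (repeatedly trials #(realized-count m n)))
     (double trials)))

;; Compare the two for m = 5, n = 5; the results should be close.
[(expected-count 5 5) (mean-realized 20000 5 5)]
```

The mean sits well above the most common value from the question's samples, because the distribution has a long right tail.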
While I agree with ClojureMostly's answer that a lookup in a lazy sequence is O(1) if you iterate over the list in order, I disagree with the best- and worst-case complexity.
In general, (doall (take n (distinct stream))) is not guaranteed to finish at all, so the worst-case time complexity is obviously unbounded. Even if the stream is generated randomly, it might be identical to, let's say, (repeat 0).
Best-case complexity is O(1) for n <= 1 (a list of length 0 or 1 needs no check for being distinct).
If n is greater than 1, it is O(n(n-1)/2): for a list that is already distinct, you need to compare each element with all the elements that come after it, which takes (n-1) + (n-2) + ... + (n-n) = n(n-1)/2 comparisons. This is a slight variation on the sum that Carl Friedrich Gauß, a German mathematician, discovered while in primary school.
Note that the best and worst cases do not depend on how the stream is generated. However, this becomes important if you are interested in average complexity.
Average complexity:
Let's say you generate the stream with (repeatedly #(rand-int m)), which means it is uniformly distributed.
Average complexity is then the best-case complexity plus the expected number of duplicates in the first n elements of the stream (that is, n/m) times the expected number of additional lookups in the stream needed to find another value that is not yet in the resulting list. This is i/m, where i is the index of the current element in the resulting list.
Because stream is a uniformly distributed random sequence, i is expected to be uniformly distributed as well, so on average it equals n/2. There we go:
O(n(n-1)/2 + (n/m * n/(2m)))

Convert map of vectors to vectors of columns in Clojure

I have a collection (or list or sequence or vector) of maps like so:
{ :name "Bob", :data [32 11 180] }
{ :name "Joe", :data [ 4 8 30] }
{ :name "Sue", :data [10 9 40] }
I want to create new vectors containing the data in the vector "columns" associated with keys that describe the data, like so:
{ :ages [32 4 10], :shoe-sizes [11 8 9], :weights [180 30 40] }
Actually, a simple list of vectors might be adequate, i.e.:
[32 4 10] [11 8 9] [180 30 40]
If it's better/easier to make the original list into a vector, that's fine; whatever's simplest.
Given
(def records
[{:name "Bob" :data [32 11 180]}
{:name "Joe" :data [ 4 8 30]}
{:name "Sue" :data [10 9 40]}])
you could apply the following transformations to get the desired result:
(->> records
(map :data) ; extract :data vectors
; => ([32 11 180] [4 8 30] [10 9 40])
(apply map vector) ; transpose
; => ([32 4 10] [11 8 9] [180 30 40])
(zipmap [:ages :shoe-sizes :weights])) ; make map
; => {:weights [180 30 40], :shoe-sizes [11 8 9], :ages [32 4 10]}
Without comments it looks a little bit cleaner:
(->> records
(map :data)
(apply map vector)
(zipmap [:ages :shoe-sizes :weights]))
Without the threading macro, it is equivalent to the more verbose:
(let [extracted (map :data records)
transposed (apply map vector extracted)
result (zipmap [:ages :shoe-sizes :weights] transposed)]
result)
You could use reduce like this:
(def data [{ :name "Bob", :data [32 11 180] }
{ :name "Joe", :data [ 4 8 30] }
{ :name "Sue", :data [10 9 40] }])
(reduce
(fn [acc {[age shoe-size weight] :data}]
(-> acc
(update-in [:ages] conj age)
(update-in [:shoe-sizes] conj shoe-size)
(update-in [:weights] conj weight)))
{}
data)
Returns something like this:
{:weights (40 30 180), :shoe-sizes (9 8 11), :ages (10 4 32)}
I think the most interesting part of this code is the use of nested destructuring to grab hold of the keys: {[age shoe-size weight] :data}
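The reversed lists in that result come from conj on a missing key: the value starts as nil, and conj on nil builds a list that grows at the front. A small variation, assuming vectors in input order are wanted, is to give conj an empty vector as its default via fnil. The function name columns is hypothetical, and update requires Clojure 1.7 or later.

```clojure
(def records
  [{:name "Bob" :data [32 11 180]}
   {:name "Joe" :data [4 8 30]}
   {:name "Sue" :data [10 9 40]}])

(defn columns
  "Collect the :data vectors of rows into named column vectors."
  [rows]
  (reduce
    (fn [acc {[age shoe-size weight] :data}]
      (-> acc
          ;; (fnil conj []) substitutes [] when the key is not present yet
          (update :ages (fnil conj []) age)
          (update :shoe-sizes (fnil conj []) shoe-size)
          (update :weights (fnil conj []) weight)))
    {}
    rows))

(columns records)
;; => {:ages [32 4 10], :shoe-sizes [11 8 9], :weights [180 30 40]}
```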

Clojure: map map

I would like to use the map function on map. But I can't get it to work.
A toy example:
(map map [+ - *] [1 2 3] [4 5 6] [7 8 9])
I expect a result like (12 15 18) but all I get is an error.
Thanks.
As an alternative to already existing answers, you could replace the outer map with a list comprehension, which is more readable than a nested map IMHO:
user=> (defn fun [ops & args]
#_=> (for [op ops]
#_=> (apply map op args)))
#'user/fun
user=> (fun [+ - *] [1 2 3] [4 5 6] [7 8 9])
((12 15 18) (-10 -11 -12) (28 80 162))
If you want to map each operator separately over the lists, then use
((fn [ops & args] (map #(apply map %1 args) ops)) [+ - *] [1 2 3] [4 5 6] [7 8 9])
or if you are willing to reorder arguments
(map #(map %1 [1 2 3] [4 5 6] [7 8 9]) [+ - *])
Both give the result of ((12 15 18) (-10 -11 -12) (28 80 162))
You can use juxt:
(apply map list (map (juxt + - *) [1 2 3] [4 5 6] [7 8 9]))
Which will result in: ((12 15 18) (-10 -11 -12) (28 80 162))
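For context on why the original expression fails, here is a small sketch, assuming JVM Clojure (the exception type differs in ClojureScript). The outer map walks all four collections in parallel, so the inner map receives one operator and three plain numbers, and numbers cannot be treated as sequences. The names inner-call-forms and first-inner-call are hypothetical.

```clojure
;; Show the argument lists the outer map hands to the inner map.
;; The operators are quoted so that they print as symbols.
(def inner-call-forms
  (map list '[+ - *] [1 2 3] [4 5 6] [7 8 9]))
;; => ((+ 1 4 7) (- 2 5 8) (* 3 6 9))

(defn first-inner-call
  "Evaluate (map + 1 4 7), which is what the outer map attempts first."
  []
  (try
    ;; doall forces the lazy sequence so the failure happens here
    (doall (map + 1 4 7))
    (catch IllegalArgumentException _
      :numbers-are-not-seqable)))

(first-inner-call)
;; => :numbers-are-not-seqable
```

All of the answers above avoid this by handing each operator the whole vectors rather than one element from each.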

Clojure - pairs from nested lists

I'm trying to make traversing nested lists to collect pairs more idiomatic in Clojure
(def mylist '(
(2, 4, 6)
(8, 10, 12)))
(defn pairs [[a b c]]
(list (list a c)(list b c)))
(mapcat pairs mylist)
;((2 6) (4 6) (8 12) (10 12))
Can this be made more elegant?
Your code is good, but I would use vectors instead of lists:
(defn pairs [[x1 x2 y]]
[[x1 y] [x2 y]])
(mapcat pairs mylist)
Just to add more solutions (not elegant or intuitive; do not use ;) ):
(mapcat
(juxt (juxt first last) (juxt second last))
[[2 4 6] [8 10 12]])
;; => ([2 6] [4 6] [8 12] [10 12])
Or this one:
(mapcat
#(for [x (butlast %) y [(last %)]] [x y])
[[2 4 6] [8 10 12]])
;; => ([2 6] [4 6] [8 12] [10 12])