ClojureScript zipmap tricks me or what? - clojure

I use ClojureScript to develop web browser games. (A friend of mine is teaching me; we started only a few weeks ago.)
I wanted to generate a map in which the keys are vectors and the values are numbers. For example: {[0 0] 0, [0 1] 1, [0 2] 2, ...}.
I used this formula:
(defn oxo [x y]
(zipmap (map vec (combi/cartesian-product (range 0 x) (range 0 y))) (range (* x y))))
(where combi/ refers to clojure.math.combinatorics).
When it generates the map, the key-value pairs are correct, but they come out in a seemingly random order, like:
{[0 1] 1, [6 8] 68, [6 9] 69, [5 7] 57, ...}
What went wrong after using zipmap, and how can I fix it?

Clojure maps aren't guaranteed to have ordered/sorted keys. If you want to ensure the keys are sorted, use a sorted-map:
(into (sorted-map) (oxo 10 10))
=>
{[0 0] 0,
[0 1] 1,
[0 2] 2,
[0 3] 3,
[0 4] 4,
[0 5] 5,
...
If your map has fewer than 9 keys, insertion order happens to be preserved, because the underlying data structure depends on the number of keys:
clojure.lang.PersistentArrayMap for <9 keys
clojure.lang.PersistentHashMap otherwise.
array-map produces a clojure.lang.PersistentArrayMap and sorted-map produces a clojure.lang.PersistentTreeMap. Note that assoc'ing onto an array map may produce a hash map, but assoc'ing onto a sorted map still produces a sorted map.
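As a small illustration of that last point, here is a sketch using only clojure.core (the names grid and bigger are made up for the example). It relies on same-length vectors comparing element by element, so they can serve as sorted-map keys:

```clojure
;; Build a sorted map from an unordered literal with vector keys.
(def grid (into (sorted-map) {[1 0] 2, [0 1] 1, [0 0] 0}))

(keys grid)        ;; => ([0 0] [0 1] [1 0])

;; assoc on a sorted map returns another sorted map.
(def bigger (assoc grid [0 2] 5))

(sorted? bigger)   ;; => true
(keys bigger)      ;; => ([0 0] [0 1] [0 2] [1 0])
```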

zipmap produces a hash-map where order of the keys is not guaranteed.
If you want ordered keys you can use either sorted-map or array-map.
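A minimal sketch of the sorted-map route, assuming you are happy to drop the combinatorics dependency and compute the index yourself. The name oxo-sorted is hypothetical, and the index formula (+ (* i y) j) is my reading of the order that cartesian-product yields in the question:

```clojure
(defn oxo-sorted
  "Like oxo from the question, but returns a sorted map keyed by [i j]."
  [x y]
  (into (sorted-map)
        (for [i (range x)
              j (range y)]
          ;; each [key value] pair becomes one map entry
          [[i j] (+ (* i y) j)])))

(oxo-sorted 2 3)
;; => {[0 0] 0, [0 1] 1, [0 2] 2, [1 0] 3, [1 1] 4, [1 2] 5}
```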

As far as I know, you should not rely on a Map/Hash/Dictionary for ordering in any language.
If the order is important but you don't need O(1) lookup performance of the map, a vector of vector pairs is a good option for you.
(defn oxo [x y]
(mapv vector (map vec (combi/cartesian-product (range 0 x) (range 0 y))) (range (* x y))))
You will get something like this.
=> (oxo 10 10)
[[[0 0] 0] [[0 1] 1] [[0 2] 2] [[0 3] 3] [[0 4] 4] [[0 5] 5] ...]
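To make the trade-off concrete: with a vector of pairs, finding the value for a key means scanning the pairs one by one. A rough sketch (lookup is a hypothetical helper, not part of any library):

```clojure
(def pairs [[[0 0] 0] [[0 1] 1] [[0 2] 2] [[1 0] 3]])

(defn lookup
  "Linear scan: returns the value paired with k, or nil if k is absent.
  Note: a stored value of nil or false would also come back as nil."
  [pairs k]
  (some (fn [[pk v]] (when (= pk k) v)) pairs))

(lookup pairs [0 2])   ;; => 2
(lookup pairs [9 9])   ;; => nil
```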

Related

How do I iterate through two Sets in Clojure in order to return their Cartesian product?

So I take in two Sets and wanna iterate through them in order to return a new set containing the cross product of the two sets.
(defn cartesian
"computes Cartesian product of A and B"
[A B]
use "set", "for")
I'm very new to Clojure so I'm not sure why use "set", "for" is included at all.
But A and B will be Sets. Now I wanna iterate through each one and return the Cartesian product of them. Example:
(cartesian #{1 2} #{3 4 5}) => #{[1 3] [1 4] [1 5] [2 3] [2 4] [2 5]}
I'm not sure of the steps to get there though. I have looked through the documentation etc. but can't find what I'm looking for. Other answers to this deal with lists etc., but I have to use 2 Sets.
What I'm looking at atm is using doseq[e A] and inside that doseq[x B] then add each vector-pair [e x] to a new Set and then return it. This doesn't seem like a standard functional solution though. Am I on the right track? How do I add it to a new Set?
You can accomplish that using for:
(defn cartesian [A B]
(set (for [a A
b B]
[a b])))
(cartesian #{1 2} #{3 4 5})
;; => #{[2 3] [2 5] [1 4] [1 3] [1 5] [2 4]}
Use cartesian-product from clojure.math.combinatorics. To get the exact result you want (a set of vectors), use into with a map transducer:
(into #{} (map vec) (clojure.math.combinatorics/cartesian-product #{1 2} #{3 4 5}))
=> #{[2 3] [2 5] [1 4] [1 3] [1 5] [2 4]}
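On the "how do I add it to a new Set" part of the question: the doseq idea translates to a reduce that threads the growing set through as an accumulator. This is only a sketch of that idea (cartesian-reduce is a made-up name); the for version above is shorter.

```clojure
(defn cartesian-reduce
  "Cartesian product of sets A and B, built by accumulating into a set."
  [A B]
  (reduce (fn [acc a]
            ;; add every pair [a b] for the current a
            (into acc (map (fn [b] [a b])) B))
          #{}
          A))

(cartesian-reduce #{1 2} #{3 4 5})
;; => a set containing [1 3] [1 4] [1 5] [2 3] [2 4] [2 5]
```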

Complexity of Clojure's distinct + randomly generated stream

What is the time complexity of an expression
(doall (take n (distinct stream)))
where stream is a lazily generated (possibly infinite) collection with duplicates?
I guess this partially depends on the amount or chance of duplicates in stream? What if stream is (repeatedly #(rand-int m)) where m >= n?
My estimation:
For every element in the resulting list there has to be at least one element realized from the stream. Multiple if the stream has duplicates. For every iteration there is a set lookup and/or insert, but since those are near constant time we get at least: O(n*~1) = O(n) and then some complexity for the duplicates. My intuition is that the complexity for the duplicates can be neglected too, but I'm not sure how to formalize this. For example, we cannot just say it is O(n*k*~1) = O(n) for some constant k since there is not an obvious maximum number k of duplicates we could encounter in the stream.
Let me demonstrate the problem with some data:
(defn stream [upper distinct-n]
(let [counter (volatile! 0)]
(doall (take distinct-n
(distinct
(repeatedly (fn []
(vswap! counter inc)
(rand-int upper))))))
@counter))
(defn sample [times-n upper distinct-n]
(->> (repeatedly times-n
#(stream upper distinct-n))
frequencies
(sort-by val)
reverse))
(sample 10000 5 1) ;; ([1 10000])
(sample 10000 5 2) ;; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])
(sample 10000 5 3) ;; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1])
(sample 10000 5 3) ;; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1])
(sample 10000 5 4) ;; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] [16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])
(sample 10000 5 5) ;; ([8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] [17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] [52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])
Note that for the last sample the number of iterations distinct can go up to 66, although the chance is small.
Also notice that for increasing n in (sample 10000 n n) the most likely number of realized elements from the stream seems to go up more than linearly.
This chart illustrates the number of realized elements from the input (most common occurrence from 10000 samples) in (doall (take n (distinct (repeatedly #(rand-int m))))) for various values of n and m.
For completeness, here is the code I used to generate the chart:
(require '[com.hypirion.clj-xchart :as c])
(defn most-common [times-n upper distinct-n]
(->> (repeatedly times-n
#(stream upper distinct-n))
frequencies
(sort-by #(- (val %)))
ffirst))
(defn series [m]
{(str "m = " m)
(let [x (range 1 (inc m))]
{:x x
:y (map #(most-common 10000 m %)
x)})})
(c/view
(c/xy-chart
(merge (series 10)
(series 25)
(series 50)
(series 100))
{:x-axis {:title "n"}
:y-axis {:title "realized"}}))
Your problem is known as the coupon collector's problem, and the expected number of elements is given by summing m/m + m/(m-1) + ... until you have your n items:
(defn general-coupon-collector-expect
"n: Cardinality of resulting set (# of unique coupons to collect)
m: Cardinality of set to draw from (# of coupons that exist)"
[n m]
{:pre [(<= n m)]}
(double (apply + (mapv / (repeat m) (range m (- m n) -1)))))
(general-coupon-collector-expect 25 25)
;; => ~95.4
;; This generates the data for your plot:
(for [x (range 10 101 5)]
(general-coupon-collector-expect x 100))
The worst case is unbounded. The best case is just N draws. The average case is O(N log N) when n = m. This ignores the cost of checking whether an element has already been drawn. In practice each check is O(log32 N) for Clojure sets (which distinct uses).
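If you want to check the formula against a simulation, something like the following sketch should do. The helper names are invented here, and the exact ratio result assumes Clojure on the JVM (ClojureScript would give a floating-point number instead):

```clojure
(defn expected-draws
  "Expected number of uniform draws from m values to see n distinct ones.
  Sums m/m + m/(m-1) + ... + m/(m-n+1) exactly."
  [n m]
  {:pre [(<= 0 n m)]}
  (reduce + (map #(/ m %) (range m (- m n) -1))))

(defn draws-needed
  "Counts how many random draws one run of (take n (distinct ...)) realizes."
  [n m]
  (let [counter (volatile! 0)]
    (dorun (take n (distinct (repeatedly (fn []
                                           (vswap! counter inc)
                                           (rand-int m))))))
    @counter))

(defn average-draws
  "Mean of draws-needed over the given number of runs."
  [runs n m]
  (/ (reduce + (repeatedly runs #(draws-needed n m)))
     (double runs)))

(expected-draws 5 5)        ;; => 137/12, roughly 11.4
(average-draws 10000 5 5)   ;; random, but should land near 11.4
```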
While I agree with ClojureMostly's answer that a lookup in a lazy sequence is O(1) if you iterate over the list in order, I disagree with the best- and worst-case complexity.
In general, (doall (take n (distinct stream))) is not guaranteed to finish at all, so the worst-case time complexity is obviously unbounded. Even if the stream is generated randomly, it might be identical to, let's say, (repeat 0).
Best-case complexity would be O(1) for n <= 1 (you do not need any check for being distinct on a list of length 0 or 1).
If you say n needs to be greater than 1, it will be O(n(n-1)/2): for a list that is already distinct, you need to iterate for each element over all the elements that come after it, which is (n-1) + (n-2) + ... + (n-n) = n(n-1)/2. This is a slight variation on what Carl Friedrich Gauß, a German mathematician, discovered while in primary school.
Note that the best and worst cases do not depend on how the stream is generated. However, this becomes important if you are interested in average complexity.
Average complexity:
Let's say you generate the stream with (repeatedly #(rand-int m)), which means it is evenly distributed.
Average complexity will then be the best-case complexity plus the expected number of duplicates in the first n elements of the stream (that is, n/m) times the expected number of additional lookups in the stream to find another value that has not been in the resulting list yet. This will be (i/m), where i is the index of the current element in the resulting list.
Because stream is an evenly distributed random sequence, i is expected to be evenly distributed as well, so it will on average equal n/2. There we go:
O(n(n-1)/2 + (n/m * n/(2m)))

Modifying nested data in clojure

How can I manipulate a nested data structure?
I have a list of this kind
[["first_string" {:one 1 :two 2}]
["second_string" {:three 3 :four 4}]
["third_string" {:five 5 :six 6}]
["fourth_string" {:seven 7 :eight 8}]]
And I need to change it to this form:
[["first_string" 1]
["second_string" 3]
["third_string" 5]
["fourth_string" 7]]
Essentially, for each inner vector I want to keep the string and replace the map with the value of its first key.
Try defining a function that operates on a single entry in the vector and then map over it:
(defn manipulate-nested
[entry]
[(first entry) (last (first (last entry)))])
(let [input [["first_string" {:one 1 :two 2}]
["second_string" {:three 3 :four 4}]
["third_string" {:five 5 :six 6}]
["fourth_string" {:seven 7 :eight 8}]]]
(into [] (map manipulate-nested input)))
;; [["first_string" 1]
;; ["second_string" 3]
;; ["third_string" 5]
;; ["fourth_string" 7]]
Regarding "I need to change it to this form":
NB: Keep in mind that, strictly speaking, you're not changing (mutating) the original vector, but describing a modified copy of it.
You can't get a reliable first key of a hash-map because hash-maps are an unsorted data structure and seqing them thus has no order guarantees. So there is no first or second.
There is no way of numerically ordering the keywords :one, :two, :three without parsing their names which I leave as a separate problem.
Here is your problem reposed with ordered structures in place of the hash-maps:
(def data [["first_string" [[:one 1] [:two 2]]]
["second_string" [[:three 3] [:four 4]]]
["third_string" [[:five 5] [:six 6]]]
["fourth_string" [[:seven 7] [:eight 8]]]])
One typical and idiomatic solution is to extract from each vector in data independently via map, using destructuring in the transformation function's binding vector to bind the desired nested elements and returning this extraction in a new vector:
(map (fn [[s [[_ n] _]]] [s n]) data)
The input data structure of your specific problem offers a way with less overhead by reusing the passed vector instead of constructing a new one in each step:
(map #(update % 1 (comp second first)) data)
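If the maps have to stay maps, you need some explicit rule for which entry counts as "first". As one possibility, and purely as an assumption about what the asker wants, the sketch below picks the smallest value in each map. The names data-with-maps and smallest-values are made up for the example:

```clojure
(def data-with-maps
  [["first_string"  {:one 1 :two 2}]
   ["second_string" {:three 3 :four 4}]
   ["third_string"  {:five 5 :six 6}]
   ["fourth_string" {:seven 7 :eight 8}]])

(defn smallest-values
  "Replaces each map with its smallest value (assumed meaning of 'first')."
  [rows]
  (mapv (fn [[s m]] [s (apply min (vals m))]) rows))

(smallest-values data-with-maps)
;; => [["first_string" 1] ["second_string" 3] ["third_string" 5] ["fourth_string" 7]]
```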

Circularly shifting nested vectors

Given a nested vector A
[[1 2 3] [4 5 6] [7 8 9]]
my goal is to circularly shift rows and columns.
If I first consider a single row shift I'd expect
[[7 8 9] [1 2 3] [4 5 6]]
where the 3rd row maps to the first in this case.
This is implemented by the code
(defn circles [x i j]
(swap-rows x i j))
with inputs
(circles [[1 2 3] [4 5 6] [7 8 9]] 0 1)
However, I am unsure how to go further and shift columns. Ideally, I would like to add to the function circles and be able to either shift rows or columns. Although I'm not sure if it's easiest to just have two distinct functions for each shift choice.
(defn circles [xs i j]
(letfn [(shift [v n]
(let [n (- (count v) n)]
(vec (concat (subvec v n) (subvec v 0 n)))))]
(let [ys (map #(shift % i) xs)
j (- (count xs) j)]
(vec (concat (drop j ys) (take j ys))))))
Example:
(circles [[1 2 3] [4 5 6] [7 8 9]] 1 1)
;= [[9 7 8] [3 1 2] [6 4 5]]
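If you would rather have two distinct functions, as the question considers, the same idea can be split up. This is a sketch with hypothetical names (rotate-right, shift-rows, shift-cols); it assumes the input is a vector of vectors and takes the shift modulo the length, so negative shifts rotate the other way:

```clojure
(defn rotate-right
  "Circularly shifts vector v to the right by k positions."
  [v k]
  (let [n (count v)]
    (if (zero? n)
      v
      (let [cut (- n (mod k n))]
        (vec (concat (subvec v cut) (subvec v 0 cut)))))))

(defn shift-rows
  "Moves whole rows down by k, wrapping around."
  [m k]
  (rotate-right m k))

(defn shift-cols
  "Moves whole columns right by k, wrapping around."
  [m k]
  (mapv #(rotate-right % k) m))

(def A [[1 2 3] [4 5 6] [7 8 9]])

(shift-rows A 1)   ;; => [[7 8 9] [1 2 3] [4 5 6]]
(shift-cols A 1)   ;; => [[3 1 2] [6 4 5] [9 7 8]]
```

Composing them, (shift-rows (shift-cols A 1) 1), gives the same result as (circles A 1 1) above.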
Depending on how often you expect to perform this operation, the sizes of the input vectors and the shifts to be applied, using core.rrb-vector could make sense. clojure.core.rrb-vector/catvec is the relevant function (you could also use clojure.core.rrb-vector/subvec for slicing, but actually here it's fine to use the regular subvec from clojure.core, as catvec will perform its own conversion).
You can also use cycle:
(defn circle-drop [i coll]
(->> coll
cycle
(drop i)
(take (count coll))
vec))
(defn circles [coll i j]
(let [n (count coll)
i (- n i)
j (- n j)]
(->> coll
(map #(circle-drop i %))
(circle-drop j))))
(circles [[1 2 3] [4 5 6] [7 8 9]] 2 1)
;; => [[8 9 7] [2 3 1] [5 6 4]]
There's a function for that called rotate in core.matrix (as is often the case for general purpose array/matrix operations)
The second parameter to rotate lets you choose the dimension to rotate around (0 for rows, 1 for columns)
(use 'clojure.core.matrix)
(def A [[1 2 3] [4 5 6] [7 8 9]])
(rotate A 0 1)
=> [[4 5 6] [7 8 9] [1 2 3]]
(rotate A 1 1)
=> [[2 3 1] [5 6 4] [8 9 7]]

How do I use map with function with more than one parameter

I have the following method:
(defn area [x y] (* x y))
How do I iterate through a list, taking as many elements at a time as the function has parameters? Something like
(map area [2 5 6 6])
so that it makes calculations like (area 2 5) and (area 6 6). Maybe a vector is not the proper type to use.
You can use partition as some have suggested here but you might want to consider arranging the data differently. For example you could use a vector of vectors:
[[2 5] [6 6]]
Then you can change your area function to:
(defn area [[x y]] (* x y))
Now you can call that with one of your pairs: (area [6 6]) and mapping over your vector is easy:
(map area [[2 5] [6 6]])
If for some reason you need area to take two parameters instead of a vector you can do something like this:
(map #(apply area %) [[2 5] [6 6]])
To me that's still simpler than using partition.
Try this:
(map #(apply area %) (partition 2 [2 5 6 6]))
map requires a separate sequence parameter for each parameter that the function expects:
(map area [2 6] [5 6])
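If the data really does arrive as one flat vector, you can derive the two sequences from it. A small sketch, assuming the flat vector alternates x and y values (flat, xs and ys are names invented for the example):

```clojure
(defn area [x y] (* x y))

(def flat [2 5 6 6])

;; every second element starting at index 0, then starting at index 1
(def xs (take-nth 2 flat))          ;; => (2 6)
(def ys (take-nth 2 (rest flat)))   ;; => (5 6)

;; map calls area with one element from each sequence
(map area xs ys)                    ;; => (10 36)
```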