Destructuring and comparing data in an efficient way


I'm new to Clojure and am trying out different kinds of destructuring.
I have two data sets, Major and Minor, as in the snippet below:
(def result {"Major" { [200 {1 5,2 6, 3 4}] [ 201 {1 5,2 10,4 10,6 10}]}
"Minor" { [ 200 {1 5,2 6,3 4,4 10}] [ 201 {1 5,2 10,3 10}]}})
I want to take each entry of the Minor data set and compare it with the corresponding entry of the Major data set. If the value of the Major entry is a subset of the Minor one, the entry should be deleted from both data sets (Major and Minor). Otherwise the entry should be assoc'ed into a separate var (major-only or minor-only). And vice versa.
For example:
{"Major" { [200 {1 5,2 10, 3 10}] [201 {1 5,2 10,4 10,6 10}] [204 {1 4,2 5,3 8,4 9}]}
"Minor" { [200 {1 5,2 10,3 10,4 10}] [203 {1 5,2 10,3 10}] [204 {1 4,2 5,3 8}]}})
major-only will be: {201 value} (because it doesn't exist in Minor) and {204 value} (since Major's value is not a subset of Minor's value for key 204).
minor-only will be: {203 value} (since it does not exist in Major) and {204 value} (because the subset condition failed).
I tried to use reduce with update-in while destructuring and comparing the data, but couldn't find an efficient way to get the expected result. Can anyone assist me here?
Also, I want the function to return its result in the form below:
{:major-only major-only
:minor-only minor-only}
How can I return this type of value?

I'm not sure all of your rules are completely clear to me, but here's a stab at an implementation of the sort of function you describe.
First, I'd define a map-subset? function:
(defn map-subset? [m1 m2]
  (and (<= (count m1) (count m2))
       (every?
         (fn [[k v]]
           (and (contains? m2 k)
                (= (m2 k) v)))
         m1)))
That might not be exactly what you mean by "subset", so feel free to adapt it accordingly.
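A few quick checks can make the semantics of map-subset? concrete. This is only a sketch: the definition is repeated so the snippet runs on its own, and the sample maps are made up for illustration.

```clojure
;; Same definition as above, repeated so the snippet is self-contained.
(defn map-subset? [m1 m2]
  (and (<= (count m1) (count m2))
       (every? (fn [[k v]]
                 (and (contains? m2 k)
                      (= (m2 k) v)))
               m1)))

;; Every key/value pair of the first map also appears in the second.
(map-subset? {1 5, 2 10} {1 5, 2 10, 3 10})   ;=> true
;; Same keys, but key 2 maps to a different value.
(map-subset? {1 5, 2 6} {1 5, 2 10, 3 10})    ;=> false
;; The other map is nil, as when a key is missing on the other side.
(map-subset? {1 5} nil)                       ;=> false
;; An empty map counts as a subset of anything, including nil.
(map-subset? {} nil)                          ;=> true
```

The last case is worth keeping in mind: an entry whose value is an empty map would be treated as a subset even when the other side has no such key.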
Here's your data. I removed the square brackets around pairs in the map, since your question uses invalid EDN, and I assume this is what you meant:
(def data {"Major" {200 {1 5, 2 10, 3 10}
                    201 {1 5, 2 10, 4 10, 6 10}
                    204 {1 4, 2 5, 3 8, 4 9}}
           "Minor" {200 {1 5, 2 10, 3 10, 4 10}
                    203 {1 5, 2 10, 3 10}
                    204 {1 4, 2 5, 3 8}}})
The core function, then, is relatively simple:
(let [{:strs [Major Minor]} data]
  {:major-only (into {}
                     (remove (fn [[k v]]
                               (map-subset? v (Minor k))))
                     Major)
   :minor-only (into {}
                     (remove (fn [[k v]]
                               (map-subset? v (Major k))))
                     Minor)})
This rebuilds each map, removing entries whose values are subsets of the corresponding value in the other map:
{:major-only {201 {1 5, 2 10, 4 10, 6 10}
              204 {1 4, 2 5, 3 8, 4 9}}
 :minor-only {200 {1 5, 2 10, 3 10, 4 10}
              203 {1 5, 2 10, 3 10}}}
This produces slightly different results from your example, which is why I'm not sure I fully understand your requirements: by my interpretation of map subsets, Major's 200 is a subset of Minor's 200, and Minor's 204 is a subset of Major's 204.
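If the intended rule is instead the one implied by the expected output in the question (drop a key from both maps only when Major's value is a subset of Minor's value, and keep every other entry where it is), one possible reading could be sketched as follows. The names only-split and matched? are hypothetical, and this is an interpretation of the question rather than a confirmed requirement.

```clojure
(defn map-subset? [m1 m2]
  (and (<= (count m1) (count m2))
       (every? (fn [[k v]] (and (contains? m2 k) (= (m2 k) v))) m1)))

(def data {"Major" {200 {1 5, 2 10, 3 10}
                    201 {1 5, 2 10, 4 10, 6 10}
                    204 {1 4, 2 5, 3 8, 4 9}}
           "Minor" {200 {1 5, 2 10, 3 10, 4 10}
                    203 {1 5, 2 10, 3 10}
                    204 {1 4, 2 5, 3 8}}})

(defn only-split
  "Hypothetical helper: removes a key from BOTH maps when it exists on both
  sides and Major's value is a subset of Minor's value; keeps everything else."
  [{:strs [Major Minor]}]
  (let [matched? (fn [k]
                   (and (contains? Major k)
                        (contains? Minor k)
                        (map-subset? (Major k) (Minor k))))
        unmatched (remove (fn [[k _]] (matched? k)))]
    {:major-only (into {} unmatched Major)
     :minor-only (into {} unmatched Minor)}))

;; Keys kept on each side for the sample data.
(keys (:major-only (only-split data)))   ; 201 and 204
(keys (:minor-only (only-split data)))   ; 203 and 204
```

Here key 200 disappears from both maps, while 204 stays in both because Major's value for 204 has an entry that Minor's value lacks.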

Related

Turning this list into a vector of sets

Does anyone have an idea how to convert this list of vectors into a vector of sets?
([#{2}] [#{1 2 3 4}] [#{5}] [#{3}])
result should be
[#{2} #{1 2 3 4} #{5} #{3}]

There are multiple ways to accomplish this; which approach you use depends on your needs:
(def data '([#{2}] [#{1 2 3 4}] [#{5}] [#{3}]))
;; If you have a list of single element lists:
(mapv first data) ;; => [#{2} #{1 2 3 4} #{5} #{3}]
;; If you have a list of multiple element lists:
(vec (apply concat data)) ;; => [#{2} #{1 2 3 4} #{5} #{3}]
;; If you also want to handle multiple levels of nesting:
(vec (flatten data)) ;; => [#{2} #{1 2 3 4} #{5} #{3}]
;; If you ..;
(transduce (map first) conj data) ;; => [#{2} #{1 2 3 4} #{5} #{3}]
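On the question's input all of these give the same vector, so the differences only show up on other shapes. The sketch below uses a made-up input (nested is not from the question) with one two-element vector and one doubly nested vector to show where they diverge.

```clojure
;; Hypothetical input: a vector holding two sets, a doubly nested vector, and a plain one.
(def nested '([#{2} #{9}] [[#{1 2}]] [#{5}]))

;; Takes only the first element of each inner vector, so #{9} is dropped
;; and the inner vector [#{1 2}] is kept as-is.
(def by-first (mapv first nested))

;; Removes exactly one level of nesting, so #{9} survives but [#{1 2}] stays a vector.
(def by-concat (vec (apply concat nested)))

;; Removes all sequential nesting. Sets are not sequential, so they are left intact.
(def by-flatten (vec (flatten nested)))

(= by-first   [#{2} [#{1 2}] #{5}])        ;=> true
(= by-concat  [#{2} #{9} [#{1 2}] #{5}])   ;=> true
(= by-flatten [#{2} #{9} #{1 2} #{5}])     ;=> true
```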

A way I haven't seen suggested is using into and cat.
(def data '([#{2}] [#{1 2 3 4}] [#{5}] [#{3}]))
(into [] cat data) ;; => [#{2} #{1 4 3 2} #{5} #{3}]

Note that the sets are just elements of the vectors.
So, one way is to iterate (map) over the list of vectors and pick the first element of each vector (i.e. the set). This will build a list of these sets that you can then convert to a vector:
user=> (vec (map first '([#{2}] [#{1 4 3 2}] [#{5}] [#{3}])))
[#{2} #{1 4 3 2} #{5} #{3}]

how to process sorted data in clojure effectively?

I have a sequence of sorted data and want to set a neighbor flag. For example, in the following data, if any neighbor of an element has flag 1, then any-neighbor-flagged should be set to 1 for that element. Two elements are neighbors if the difference between their seq values is <= 2.
There could be millions of data points.
(def a '({:seq 1 :flag 1} {:seq 2 :flag 0} {:seq 5 :flag 0} {:seq 8 :flag 0} {:seq 10 :flag 1} {:seq 12 :flag 1}))
the expected result is:
({:seq 1 :any-neighbor-flagged 0} {:seq 2 :any-neighbor-flagged 1} {:seq 5 :any-neighbor-flagged 0} {:seq 8 :any-neighbor-flagged 1}
{:seq 10 :any-neighbor-flagged 1} {:seq 12 :any-neighbor-flagged 1})

With partition, we can look at a collection with neighboring context.
user=> (partition 3 1 (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
Given an input in that form, we can use reduce to accumulate a result based on neighbor comparisons.
user=> (pprint/pprint (reduce (fn [acc [i j k]]
                                (conj acc
                                      (assoc j :any-neighbor-flagged
                                             (if (or (= (:flag i) 1)
                                                     (= (:flag k) 1))
                                               1 0))))
                              []
                              (partition 3 1 (concat [nil] a [nil]))))
[{:any-neighbor-flagged 0, :seq 1, :flag 1}
{:any-neighbor-flagged 1, :seq 2, :flag 0}
{:any-neighbor-flagged 0, :seq 5, :flag 0}
{:any-neighbor-flagged 1, :seq 8, :flag 0}
{:any-neighbor-flagged 1, :seq 10, :flag 1}
{:any-neighbor-flagged 1, :seq 12, :flag 1}]

The basic idea is to map over three sequences: the original one, one shifted by one to the left, and one shifted by one to the right:
(defn set-flags [coll]
  (map
    ;; pf is the previous element's flag, nf the next element's flag
    (fn [curr {pf :flag} {nf :flag}]
      (-> curr
          (dissoc :flag)
          (assoc :any-neighbor-flagged (if (or (= pf 1) (= nf 1)) 1 0))))
    coll
    (concat [{}] (drop-last coll))
    (concat (rest coll) [{}])))
(set-flags a) ; => ({:any-neighbor-flagged 0, :seq 1} {:any-neighbor-flagged 1, :seq 2} {:any-neighbor-flagged 0, :seq 5} {:any-neighbor-flagged 1, :seq 8} {:any-neighbor-flagged 1, :seq 10} {:any-neighbor-flagged 1, :seq 12})
Illustration (for simplicity, only value of :flag is displayed):
(1 0 0 0 [1] 1) ; original seq
---------------
(1 0 0 [0] 1) ; shifted to right
(0 0 0 1 [1]) ; shifted to left
Now, in the mapping function, we also have the left neighbor and the right neighbor of each input element (possibly empty maps). Based on that, it's easy to set the correct value for :any-neighbor-flagged.
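Both approaches above treat the positionally adjacent elements as neighbors, which happens to match the expected output for the sample data. If neighbors are instead defined by the question's rule (a difference in :seq of at most 2), a sketch along the following lines might work. The name flag-neighbors and the max-gap parameter are hypothetical, and the code assumes the input is sorted by :seq.

```clojure
(defn flag-neighbors
  "For each item, looks left and right in the sorted input while the :seq
  difference stays within max-gap, and reports whether any such neighbor
  has :flag 1. Assumes items are sorted by :seq."
  [max-gap items]
  (let [v     (vec items)
        n     (count v)
        near? (fn [a b]
                (<= (Math/abs (long (- (:seq a) (:seq b)))) max-gap))]
    (mapv (fn [i]
            (let [cur      (v i)
                  ;; walk outwards from position i in both directions
                  left     (take-while #(near? cur %) (rseq (subvec v 0 i)))
                  right    (take-while #(near? cur %) (subvec v (inc i) n))
                  flagged? (some #(= 1 (:flag %)) (concat left right))]
              {:seq (:seq cur)
               :any-neighbor-flagged (if flagged? 1 0)}))
          (range n))))

(def a '({:seq 1 :flag 1} {:seq 2 :flag 0} {:seq 5 :flag 0}
         {:seq 8 :flag 0} {:seq 10 :flag 1} {:seq 12 :flag 1}))

(flag-neighbors 2 a)
```

Because the data is sorted, each element only inspects the few items inside its window, so the work per element depends on the window size rather than on the total number of data points.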

Adding sets of numbers up to 16

I have some sets of numbers:
(#{7 1} #{3 5} #{6 3 2 5}
#{0 7 1 8} #{0 4 8} #{7 1 3 5}
#{6 2} #{0 3 5 8} #{4 3 5}
#{4 6 2} #{0 6 2 8} #{4} #{0 8}
#{7 1 6 2} #{7 1 4})
I wish to turn each set into a four-number vector such that the numbers in the vector add up to 16 and come only from the set:
#{7 1} => [1 1 7 7]
#{4 3 5} => [3 4 4 5]
#{4} => [4 4 4 4]
#{0 8} => [0 0 8 8]
Lastly, the vector has to contain all the numbers in the set. It would be great to solve this for arbitrary vector lengths :)
How would the Clojure code be written?

With small sets and the originally stated output length of 4
This is easily handled with a naive search:
(defn bag-sum [s n]
  (for [a s, b s, c s, d s
        :let [v [a b c d]]
        :when (= n (apply + v))
        :when (= (set v) s)]
    v))
(take 1 (bag-sum #{7 1} 16)) ;=> ([7 7 1 1])
(take 1 (bag-sum #{3 5} 16)) ;=> ([3 3 5 5])
(take 1 (bag-sum #{4 3 5} 16)) ;=> ([4 4 3 5])
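The same naive idea can be extended to an arbitrary output length without any library. The sketch below is one possible generalisation, not part of the original answer: bag-sum-n and its helper grow are hypothetical names, and only non-decreasing vectors are enumerated so that each multiset appears once.

```clojure
(defn bag-sum-n
  "All non-decreasing vectors of length m, drawn from the set s, that contain
  every element of s and sum to n. Naive enumeration; fine for small inputs."
  [s m n]
  (let [xs         (sort s)
        ;; extend a partial vector with every element that keeps it non-decreasing
        grow       (fn [v]
                     (for [x xs
                           :when (or (empty? v) (<= (peek v) x))]
                       (conj v x)))
        candidates (nth (iterate #(mapcat grow %) [[]]) m)]
    (filter #(and (= n (apply + %))
                  (= (set %) (set s)))
            candidates)))

(bag-sum-n #{7 1} 4 16)     ;=> ([1 1 7 7])
(bag-sum-n #{3 4 5} 4 16)   ;=> ([3 4 4 5])
(bag-sum-n #{7 1} 10 16)    ;=> ([1 1 1 1 1 1 1 1 1 7])
```

The number of candidates still grows quickly with the set size and the length, so this is only practical for small cases.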
Assuming 16 is fixed and all numbers are non-negative
The search space even without the set constraint is tiny.
(require '[clojure.math.combinatorics :refer [partitions]])
(count (partitions (repeat 16 1))) ;=> 231
So, again a naive solution is very practical. We'll produce solutions of all lengths, which can be further filtered as desired. If there is a zero in the input set, it can pad any solution.
(defn bag-sum16 [s]
  (for [p (partitions (repeat 16 1))
        :let [v (mapv (partial apply +) p)]
        :when (= (set v) s)]
    v))
First example has 2 solutions - length 4 and length 10.
(bag-sum16 #{7 1}) ;=> ([7 7 1 1] [7 1 1 1 1 1 1 1 1 1])
(bag-sum16 #{3 5}) ;=> ([5 5 3 3])
(bag-sum16 #{3 4 5}) ;=> ([5 4 4 3])
Using core.logic finite domains to prune the search space with arbitrary but specified domain set s, output length m, and sum n
This is still fairly naive but prunes the search tree when the target sum is exceeded. I am a novice at core.logic, so this is more an opportunity to practice than an attempt at best representation of the problem. This performs worse than the naive solutions above on small spaces, but enables calculation in some medium size cases.
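;; Assumed namespace setup (core.logic on the classpath, clojure.core/== excluded):
;; (ns example
;;   (:refer-clojure :exclude [==])
;;   (:require [clojure.core.logic :refer [run == lvar everyg]]
;;             [clojure.core.logic.fd :as fd]))
(defn bag-sum-logic [s m n]
  (let [m* (- m (count s))
        n* (- n (apply + s))
        nums (vec (repeatedly m* lvar))
        sums (concat [0] (repeatedly (dec m*) lvar) [n*])
        dom (apply fd/domain (sort s))
        rng (fd/interval n*)
        sol (run 1 [q]
              (== q nums)
              (everyg #(fd/in % dom) nums)
              (everyg #(fd/in % rng) sums)
              (everyg #(apply fd/+ %)
                      (map cons nums (partition 2 1 sums))))]
    (when (seq sol) (sort (concat s (first sol))))))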
(bag-sum-logic #{7 1} 4 16) ;=> (1 1 7 7)
(bag-sum-logic #{7 1} 10 16) ;=> (1 1 1 1 1 1 1 1 1 7)
(bag-sum-logic #{3 5} 4 16) ;=> (3 3 5 5)
(bag-sum-logic #{3 4 5} 4 16) ;=> (3 4 4 5)
(time (bag-sum-logic #{3 4 5} 30 100))
;=> "Elapsed time: 18.739627 msecs"
;=> (3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 5 5 5 5)
Better algorithms for the general case?
This problem is a linear Diophantine equation, which can be solved with the Extended Euclidean Algorithm via matrix unimodular row reduction, i.e. carry out the Euclidean algorithm in one column while bringing the entire basis row along for the ride.
For example, in the case of #{3 5} and sum 16, you want to solve the equation
3x + 5y = 16
subject to the additional constraints that x > 0, y > 0 and x + y = 4 (your example).
The matrix and reduction steps
[[3 1 0]  ->  [[3  1 0]  ->  [[1  2 -1]  ->  [[1  2 -1]
 [5 0 1]]      [2 -1 1]]      [2 -1  1]]      [0 -5  3]]
So the GCD of 3 and 5 is 1, which divides into 16. Therefore there are (infinitely many) solutions before the constraints
x = 16 * 2 - 5k
y = 16 * -1 + 3k
Since we need x + y = 4, 4 = 16 - 2k and therefore k = 6, so
x = 2
y = 2
And we need 2 copies of 3 and 2 copies of 5.
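The two-value case worked through above can be sketched in code. This is an illustration under stated assumptions rather than a general solver: ext-gcd is a standard extended Euclidean algorithm, while two-value-counts is a hypothetical helper that assumes two distinct positive numbers a and b, a target sum n, and a required length m.

```clojure
(defn ext-gcd
  "Returns [g s t] with g = gcd(a, b) and s*a + t*b = g, for positive a and b."
  [a b]
  (loop [r0 a, s0 1, t0 0
         r1 b, s1 0, t1 1]
    (if (zero? r1)
      [r0 s0 t0]
      (let [q (quot r0 r1)]
        (recur r1 s1 t1
               (- r0 (* q r1)) (- s0 (* q s1)) (- t0 (* q t1)))))))

(defn two-value-counts
  "Returns [x y] with a*x + b*y = n, x + y = m and x, y > 0, or nil if
  no such pair exists. Hypothetical helper for the two-value case only."
  [a b n m]
  (let [[g s t] (ext-gcd a b)]
    (when (and (not= a b) (zero? (rem n g)))
      (let [x0  (* (quot n g) s)          ; particular solution
            y0  (* (quot n g) t)
            ;; general solution: x = x0 - k*(b/g), y = y0 + k*(a/g);
            ;; x + y = m fixes k = g*(m - x0 - y0) / (a - b)
            num (* g (- m x0 y0))
            den (- a b)]
        (when (zero? (rem num den))
          (let [k (quot num den)
                x (- x0 (* k (quot b g)))
                y (+ y0 (* k (quot a g)))]
            (when (and (pos? x) (pos? y))
              [x y])))))))

(ext-gcd 3 5)                ;=> [1 2 -1]
(two-value-counts 3 5 16 4)  ;=> [2 2]
(two-value-counts 1 7 16 10) ;=> [9 1]
```

For 3 and 5 this reproduces the coefficients 2 and -1 from the reduced matrix and the value k = 6, giving two copies of each number.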
This generalizes to more than 2 variables in the same manner. But whereas for 2 variables the length of the solution fully constrains the single free variable as shown above, 3 or more variables can be underspecified.
Solving linear Diophantine equations can be done in polynomial time. However, once you add the bounds (0, m), finding a solution becomes NP-complete, though a quick perusal of research results suggests there are fairly tractable approaches.

Working on the assumptions that you only want one solution per set and that you want the solution in ascending order as in your example, this is what I came up with. There aren't many combinations of sets of 1-4 numbers, so I initially decomposed the problem by looking at what the pattern of possible solutions might look like.
(def x #{3 5})
(def g 16)
(def y {1 [[0 0 0 0]]
        2 [[0 0 0 1] [0 0 1 1] [0 1 1 1]]
        3 [[0 0 1 2] [0 1 1 2] [0 1 2 2]]
        4 [[0 1 2 3]]})
Each key of this map is the size of the set x being evaluated. The values are the possible permutations of indices into the set once it is sorted into a vector. Now we can choose the permutations based on the size of the set and calculate the values of each permutation, keeping those that reach the goal:
(filter #(= g (apply + %))
        (for [p (y (count x))]
          (mapv #((into [] (sort x)) %) p)))
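Wrapped into a function, the lookup approach above might look like the following sketch. The names index-patterns and first-solution are hypothetical; the table is the same one shown above, and the length is fixed at 4.

```clojure
;; Index patterns keyed by set size, copied from the map above.
(def index-patterns
  {1 [[0 0 0 0]]
   2 [[0 0 0 1] [0 0 1 1] [0 1 1 1]]
   3 [[0 0 1 2] [0 1 1 2] [0 1 2 2]]
   4 [[0 1 2 3]]})

(defn first-solution
  "Returns the first four-number vector built from the sorted set s whose
  elements sum to goal, or nil when none of the patterns works."
  [s goal]
  (let [sorted (vec (sort s))]
    (first (filter #(= goal (apply + %))
                   (for [p (index-patterns (count s))]
                     ;; a vector is a function of its indices
                     (mapv sorted p))))))

(first-solution #{3 5} 16)     ;=> [3 3 5 5]
(first-solution #{4 3 5} 16)   ;=> [3 4 4 5]
(first-solution #{0 8} 16)     ;=> [0 0 8 8]
```

Because filter is lazy and only the first match is taken, evaluation stops as soon as a pattern reaches the goal.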
The permutations stored under each key of the map above follow a pattern: the first index is always 0, the last is always the set size minus 1, and every value is either equal to or one greater than the value to its left. Therefore, the map can be generalised to:
(defn y2 [m s]
  (map (fn [c] (reduce #(conj %1 (+ %2 (peek %1))) [0] c))
       (clojure.math.combinatorics/permutations
         (mapv #(if (>= % (dec s)) 0 1) (range (dec m))))))
(def y (partial y2 4))
The filter will now work for any set size up to the vector length (4 here). As the input set is sorted, the search could be optimised by doing a binary search over the permutations of possible solutions, for O(log n) search time.

How do I append a set to a vector?

Let's say I have [{1 2 3 4}]. How do I append {5 6 7 8} so that the vector will say [{1 2 3 4} {5 6 7 8}]

The same way you append any other value to a vector: conj
(let [v [#{1 2 3 4}]]
(conj v #{5 6 7 8}))
;; gives [#{1 2 3 4} #{5 6 7 8}]
(Incidentally, note the #{} syntax. {1 2 3 4} is a map containing the pairs (1 2) and (3 4), while #{1 2 3 4} is a set of the numbers 1, 2, 3 and 4.)

(conj [#{1 2 3 4}] #{5 6 7 8})
Note that in your example you have not sets, but maps from long to long.
Also, note that this is not appending to the vector (vectors are immutable), but rather it is creating a new vector with the new value appended.
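Both points can be checked at the REPL. The sketch below uses made-up names (as-map, as-set, v, v2) to show the difference between the two literals and that conj leaves the original vector untouched.

```clojure
;; {1 2 3 4} is a map with keys 1 and 3; #{1 2 3 4} is a set of four numbers.
(def as-map {1 2 3 4})
(def as-set #{1 2 3 4})

(map? as-map)          ;=> true
(get as-map 1)         ;=> 2
(set? as-set)          ;=> true
(contains? as-set 4)   ;=> true

;; conj returns a new vector; the original binding is unchanged.
(def v [#{1 2 3 4}])
(def v2 (conj v #{5 6 7 8}))

(count v)                  ;=> 1
(count v2)                 ;=> 2
(= (peek v2) #{5 6 7 8})   ;=> true
```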

clojure: permutations of subsets?

I'm new to clojure, looking for a function to generate permutations of subsets:
=> (find-subsets 1 #{1 2 3 4})
(#{1} #{2} #{3} #{4})
=> (find-subsets 2 #{1 2 3 4})
(#{1 2} #{1 3} #{1 4} #{2 3} #{2 4} #{3 4})
=> (find-subsets 3 #{1 2 3 4})
(#{1 2 3} #{1 2 4} #{1 3 4} #{2 3 4})
Does such a thing exist? If not, is there a nice, clean, idiomatic way to code the function?

Take a look at clojure.math.combinatorics. It does what you need:
; all the unique ways of taking n different elements from items
(clojure.math.combinatorics/combinations [1 2 3] 2)
;;=> ((1 2) (1 3) (2 3))
If it complains because you use a set instead of a vector, just convert to vector with vec before calling combinations.
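If adding a dependency is not an option, a find-subsets with the signature from the question could be sketched in plain Clojure as below. This is a simple recursive version meant for small inputs, not the library's implementation; the inner helper combos is a hypothetical name.

```clojure
(defn find-subsets
  "Returns a sequence of all n-element subsets of the set s, each as a set."
  [n s]
  (letfn [(combos [k items]
            (cond
              (zero? k)      [[]]   ; exactly one way to choose nothing
              (empty? items) []     ; nothing left to choose from
              :else
              (concat
                ;; subsets that include the first item
                (map #(cons (first items) %)
                     (combos (dec k) (rest items)))
                ;; subsets that skip the first item
                (combos k (rest items)))))]
    (map set (combos n (seq s)))))

(count (find-subsets 1 #{1 2 3 4}))   ;=> 4
(count (find-subsets 2 #{1 2 3 4}))   ;=> 6
(count (find-subsets 3 #{1 2 3 4}))   ;=> 4
```

The order of the returned subsets follows the iteration order of the input set, so compare results as sets if the order matters.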
