How to process sorted data in Clojure effectively?

I have a sequence of sorted data and want to set a neighbor flag: for each element, if any neighbor has :flag 1, set
:any-neighbor-flagged to 1 for that element. Two elements are neighbors when the difference between their :seq values is <= 2.
There could be millions of data points. For example, given the following data:
(def a '({:seq 1 :flag 1} {:seq 2 :flag 0} {:seq 5 :flag 0} {:seq 8 :flag 0} {:seq 10 :flag 1} {:seq 12 :flag 1}))
The expected result is:
({:seq 1 :any-neighbor-flagged 0} {:seq 2 :any-neighbor-flagged 1} {:seq 5 :any-neighbor-flagged 0} {:seq 8 :any-neighbor-flagged 1}
{:seq 10 :any-neighbor-flagged 1} {:seq 12 :any-neighbor-flagged 1})

With partition, we can look at a collection with neighboring context.
user=> (partition 3 1 (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
Given an input in that form, we can use reduce to accumulate a result based on neighbor comparisons.
user=> (pprint/pprint (reduce (fn [acc [i j k]]
                                (conj acc
                                      (assoc j :any-neighbor-flagged
                                             (if (or (= (:flag i) 1)
                                                     (= (:flag k) 1))
                                               1 0))))
                              []
                              (partition 3 1 (concat [nil] a [nil]))))
[{:any-neighbor-flagged 0, :seq 1, :flag 1}
{:any-neighbor-flagged 1, :seq 2, :flag 0}
{:any-neighbor-flagged 0, :seq 5, :flag 0}
{:any-neighbor-flagged 1, :seq 8, :flag 0}
{:any-neighbor-flagged 1, :seq 10, :flag 1}
{:any-neighbor-flagged 1, :seq 12, :flag 1}]

The basic idea is to map over three sequences: the original one, one shifted by one to the right (previous elements), and one shifted by one to the left (next elements):
(defn set-flags [coll]
  (map
    ;; pf = flag of the previous element, nf = flag of the next element
    (fn [curr {pf :flag} {nf :flag}]
      (-> curr
          (dissoc :flag)
          (assoc :any-neighbor-flagged (if (or (= pf 1) (= nf 1)) 1 0))))
    coll
    (concat [{}] (drop-last coll))
    (concat (rest coll) [{}])))
(set-flags a) ; => ({:any-neighbor-flagged 0, :seq 1} {:any-neighbor-flagged 1, :seq 2} {:any-neighbor-flagged 0, :seq 5} {:any-neighbor-flagged 1, :seq 8} {:any-neighbor-flagged 1, :seq 10} {:any-neighbor-flagged 1, :seq 12})
Illustration (for simplicity, only value of :flag is displayed):
(1 0 0 0 [1] 1) ; original seq
---------------
  (1 0 0 [0] 1) ; shifted to right
(0 0 0 1 [1])   ; shifted to left
Now, in the mapping function, we also have the left and right neighbor of each input element (possibly an empty map). From those it's easy to set the correct value for :any-neighbor-flagged.
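Both answers above treat the immediately adjacent elements as the neighbors, which happens to give the expected result for the sample data. If the "difference in :seq <= 2" rule from the question has to be honored, a sketch along the following lines might work. It assumes the input is sorted by :seq; flag-neighbors, max-gap and sample are names made up for this illustration.

```clojure
;; Sketch only: assumes coll is sorted by :seq in ascending order.
(defn flag-neighbors
  [max-gap coll]
  (let [v (vec coll)
        n (count v)
        ;; true when elements i and j are at most max-gap apart in :seq
        near? (fn [i j]
                (<= (Math/abs (long (- (:seq (v i)) (:seq (v j))))) max-gap))
        ;; walk away from i in direction step (-1 or 1) while still in range,
        ;; stopping at the first flagged neighbor
        flagged-from? (fn [i step]
                        (loop [j (+ i step)]
                          (cond
                            (or (neg? j) (>= j n)) false
                            (not (near? i j))      false
                            (= 1 (:flag (v j)))    true
                            :else                  (recur (+ j step)))))]
    (mapv (fn [i]
            (-> (v i)
                (dissoc :flag)
                (assoc :any-neighbor-flagged
                       (if (or (flagged-from? i -1) (flagged-from? i 1)) 1 0))))
          (range n))))

(def sample
  [{:seq 1 :flag 1} {:seq 2 :flag 0} {:seq 5 :flag 0}
   {:seq 8 :flag 0} {:seq 10 :flag 1} {:seq 12 :flag 1}])

(mapv :any-neighbor-flagged (flag-neighbors 2 sample))
;=> [0 1 0 1 1 1]
```

Because the data is sorted, each scan stops as soon as it leaves the max-gap window, so the cost per element is bounded by the number of elements inside that window rather than by the size of the whole collection.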

Related

Destructuring and comparing data in an efficient way

I'm new to Clojure and trying my hand at destructuring.
I have two data sets, Major and Minor, as in the snippet below:
(def result {"Major" { [200 {1 5,2 6, 3 4}] [ 201 {1 5,2 10,4 10,6 10}]}
"Minor" { [ 200 {1 5,2 6,3 4,4 10}] [ 201 {1 5,2 10,3 10}]}})
I want to take each entry of the Minor data set and compare it with the corresponding Major entry. If the value of the Major entry is a subset of the Minor one, delete that entry from both data sets (Major and Minor); otherwise, assoc the entry into a separate var (major-only or minor-only). And vice versa.
For example:-
{"Major" { [200 {1 5,2 10, 3 10}] [201 {1 5,2 10,4 10,6 10}] [204 {1 4,2 5,3 8,4 9}]}
"Minor" { [200 {1 5,2 10,3 10,4 10}] [203 {1 5,2 10,3 10}] [204 {1 4,2 5,3 8}]}})
major-only will be: {201 value} (because it doesn't exist in Minor) and {204 value} (since Major's value is not a subset of Minor's value for key 204).
minor-only will be: {203 value} (since it doesn't exist in Major) and {204 value} (because the subset condition failed).
I tried using reduce with update-in while destructuring and comparing the data, but couldn't find an efficient way to get the expected result. Can anyone help?
Also, I want the function to return its result in this shape:
{:major-only major-only
:minor-only minor-only}
How can I return this type of value?
I'm not sure all of your rules are completely clear to me, but here's a stab at an implementation of the sort of function you describe.
First, I'd define a map-subset? function:
(defn map-subset? [m1 m2]
(and (<= (count m1) (count m2))
(every?
(fn [[k v]]
(and (contains? m2 k)
(= (m2 k) v)))
m1)))
That might not be exactly what you mean by "subset", so feel free to adapt it accordingly.
Here's your data. I removed the square brackets around pairs in the map, since your question uses invalid EDN, and I assume this is what you meant:
(def data {"Major" {200 {1 5, 2 10, 3 10}
201 {1 5, 2 10, 4 10, 6 10}
204 {1 4, 2 5, 3 8, 4 9}}
"Minor" {200 {1 5, 2 10, 3 10, 4 10}
203 {1 5, 2 10, 3 10}
204 {1 4, 2 5, 3 8}}})
The core function, then, is relatively simple:
(let [{:strs [Major Minor]} data]
{:major-only (into {}
(remove (fn [[k v]]
(map-subset? v (Minor k))))
Major)
:minor-only (into {}
(remove (fn [[k v]]
(map-subset? v (Major k))))
Minor)})
This rebuilds each map, removing entries whose values are subsets of the corresponding value in the other map:
{:major-only {201 {1 5, 2 10, 4 10, 6 10}
204 {1 4, 2 5, 3 8, 4 9}}
:minor-only {200 {1 5, 2 10, 3 10, 4 10}
203 {1 5, 2 10, 3 10}}}
This produces slightly different results from your example, which is why I'm not sure I fully understand your requirements: by my interpretation of map subsets, Major's 200 is a subset of Minor's 200, and Minor's 204 is a subset of Major's 204.
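To return the {:major-only ... :minor-only ...} map from a function, as the question asks, the let form can simply become the function body, since a Clojure function returns the value of its last expression. A minimal sketch, reusing map-subset? from above; without-subsets and major-minor-diff are hypothetical names chosen for this example.

```clojure
(defn map-subset? [m1 m2]
  (and (<= (count m1) (count m2))
       (every? (fn [[k v]]
                 (and (contains? m2 k)
                      (= (m2 k) v)))
               m1)))

;; Keep only the entries of `from` whose value is NOT a subset of the
;; value stored under the same key in `other`.
(defn without-subsets [from other]
  (into {}
        (remove (fn [[k v]] (map-subset? v (get other k))))
        from))

;; Destructure the two string keys and return the result map directly.
(defn major-minor-diff [{:strs [Major Minor]}]
  {:major-only (without-subsets Major Minor)
   :minor-only (without-subsets Minor Major)})

(def data {"Major" {200 {1 5, 2 10, 3 10}
                    201 {1 5, 2 10, 4 10, 6 10}
                    204 {1 4, 2 5, 3 8, 4 9}}
           "Minor" {200 {1 5, 2 10, 3 10, 4 10}
                    203 {1 5, 2 10, 3 10}
                    204 {1 4, 2 5, 3 8}}})

(major-minor-diff data)
```

This uses the same interpretation of "subset" as the answer, so it yields the same result map shown above rather than the one in the question.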

How to operate on a sequence of lvars

Say I want to get all combinations of bill/coins denominations that amount to a given value, given also a set of available denominations.
So for instance, for (change 14 #{1 2 5 10}) I'd expect
(
{10 1, 5 0, 2 2, 1 0}
{10 1, 5 0, 2 1, 1 2}
{10 0, 5 2, 2 2, 1 0}
{10 0, 5 2, 2 1, 1 2}
;; ...
)
My attempt was
(defn change [amount denominations]
(let [dens (sort > denominations)
vars (repeatedly (count dens) lvar)]
(run* [q]
(== q (zipmap dens vars))
(everyg #(fd/in % (fd/interval 0 amount)) vars)
(== amount (apply + (map * dens vars))))))
But naturally the last line doesn't work. I haven't found a way to do a sort of reduce over the vars sequence, or some other way to set goals that are not valid for each lvar individually, but for the whole (while also doing some operation with external values, amount and denominations in this example).
"I haven't found a [...] way to set goals that are not valid for each lvar individually, but for the whole (while also doing some operation with external values, amount and denominations in this example)."
One way to do this is to define a recursive relation function that takes the logic vars, the denominations, and the desired sum, and uses conso to set goals for each pair of vars and dens items:
(defn productsumo [vars dens sum]
(fresh [vhead vtail dhead dtail product run-sum]
(conde
[(emptyo vars) (== sum 0)]
[(conso vhead vtail vars)
(conso dhead dtail dens)
(fd/* vhead dhead product)
(fd/+ product run-sum sum)
(productsumo vtail dtail run-sum)])))
Notice the fresh logic variables here for the heads and tails of vars and dens, a product to store the product of the two heads on each "pass", and a run-sum variable used to constrain the total of all the products so that it equals sum. The combination of conso and recursion allows us to set goals for the whole of vars and dens.
Then plug that in to your existing function:
(defn change [amount denoms]
(let [dens (sort > denoms)
vars (repeatedly (count dens) lvar)]
(run* [q]
(== q (zipmap dens vars))
(everyg #(fd/in % (fd/interval 0 amount)) vars)
(productsumo vars dens amount))))
And finally get the answers:
(change 14 #{1 2 5 10})
=>
({10 0, 5 0, 2 0, 1 14}
{10 1, 5 0, 2 0, 1 4}
{10 0, 5 1, 2 0, 1 9}
{10 0, 5 0, 2 1, 1 12}
{10 1, 5 0, 2 1, 1 2}
{10 1, 5 0, 2 2, 1 0}
{10 0, 5 0, 2 2, 1 10}
{10 0, 5 2, 2 0, 1 4}
{10 0, 5 1, 2 1, 1 7}
{10 0, 5 0, 2 3, 1 8}
{10 0, 5 0, 2 4, 1 6}
{10 0, 5 1, 2 2, 1 5}
{10 0, 5 0, 2 5, 1 4}
{10 0, 5 2, 2 1, 1 2}
{10 0, 5 0, 2 6, 1 2}
{10 0, 5 1, 2 3, 1 3}
{10 0, 5 0, 2 7, 1 0}
{10 0, 5 2, 2 2, 1 0}
{10 0, 5 1, 2 4, 1 1})
I suspect there may be a more succinct/elegant solution, but this works.
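As a cross-check that does not need core.logic, the same answers can be enumerated with plain recursion. This is only a sketch for comparison, not a replacement for the relational version; change-plain is a made-up name, and it assumes every denomination is a positive integer.

```clojure
;; Plain-Clojure enumeration of all ways to make `amount`.
;; Assumes every denomination is a positive integer.
(defn change-plain
  [amount denominations]
  (letfn [(go [remaining [d & more]]
            (if (nil? d)
              ;; no denominations left: valid only if nothing remains
              (if (zero? remaining) [{}] [])
              ;; try every possible count n of denomination d
              (for [n    (range (inc (quot remaining d)))
                    tail (go (- remaining (* n d)) more)]
                (assoc tail d n))))]
    (go amount (sort > denominations))))

(count (change-plain 14 #{1 2 5 10})) ;=> 19
```

The count agrees with the 19 maps listed above, although the order in which the solutions come out will generally differ from the core.logic result.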

Clojure/dataset: group-by multiple columns hierarchically?

I would like to implement a function that can group-by for multiple columns hierarchically. I can illustrate my requirement by the following tentative implementation for two columns:
(defn group-by-two-columns-hierarchically
[col1 col2 table]
(let [data-by-col1 ($group-by col1 table)
data-further-by-col2 (into {} (for [[k v] data-by-col1] [k ($group-by col2 v)]))
]
data-further-by-col2
))
I'm seeking help how to generalize on arbitrary number of columns.
(I understand that Incanter supports group-by for multiple columns but it only provides a structure not hierarchy, a map of composite key of multiple columns to value of datasets.)
Thanks for your help!
Note: to make Michał's solution work for an Incanter dataset, only a slight modification is needed: replace "group-by" with "incanter.core/$group-by", as illustrated by the following experiment:
(defn group-by*
"Similar to group-by, but takes a collection of functions and returns
a hierarchically grouped result."
[fs coll]
(if-let [f (first fs)]
(into {} (map (fn [[k vs]]
[k (group-by* (next fs) vs)])
(incanter.core/$group-by f coll)))
coll))
(def table (incanter.core/dataset ["x1" "x2" "x3"]
[[1 2 3]
[1 2 30]
[4 5 6]
[4 5 60]
[7 8 9]
]))
(group-by* [:x1 :x2] table)
=>
{{:x1 1} {{:x2 2}
| x1 | x2 | x3 |
|----+----+----|
| 1 | 2 | 3 |
| 1 | 2 | 30 |
},
{:x1 4} {{:x2 5}
| x1 | x2 | x3 |
|----+----+----|
| 4 | 5 | 6 |
| 4 | 5 | 60 |
},
{:x1 7} {{:x2 8}
| x1 | x2 | x3 |
|----+----+----|
| 7 | 8 | 9 |
}}
(defn group-by*
"Similar to group-by, but takes a collection of functions and returns
a hierarchically grouped result."
[fs coll]
(if-let [f (first fs)]
(into {} (map (fn [[k vs]]
[k (group-by* (next fs) vs)])
(group-by f coll)))
coll))
Example:
user> (group-by* [:foo :bar :quux]
[{:foo 1 :bar 1 :quux 1 :asdf 1}
{:foo 1 :bar 1 :quux 2 :asdf 2}
{:foo 1 :bar 2 :quux 1 :asdf 3}
{:foo 1 :bar 2 :quux 2 :asdf 4}
{:foo 2 :bar 1 :quux 1 :asdf 5}
{:foo 2 :bar 1 :quux 2 :asdf 6}
{:foo 2 :bar 2 :quux 1 :asdf 7}
{:foo 2 :bar 2 :quux 2 :asdf 8}
{:foo 1 :bar 1 :quux 1 :asdf 9}
{:foo 1 :bar 1 :quux 2 :asdf 10}
{:foo 1 :bar 2 :quux 1 :asdf 11}
{:foo 1 :bar 2 :quux 2 :asdf 12}
{:foo 2 :bar 1 :quux 1 :asdf 13}
{:foo 2 :bar 1 :quux 2 :asdf 14}
{:foo 2 :bar 2 :quux 1 :asdf 15}
{:foo 2 :bar 2 :quux 2 :asdf 16}])
{1 {1 {1 [{:asdf 1, :bar 1, :foo 1, :quux 1}
{:asdf 9, :bar 1, :foo 1, :quux 1}],
2 [{:asdf 2, :bar 1, :foo 1, :quux 2}
{:asdf 10, :bar 1, :foo 1, :quux 2}]},
2 {1 [{:asdf 3, :bar 2, :foo 1, :quux 1}
{:asdf 11, :bar 2, :foo 1, :quux 1}],
2 [{:asdf 4, :bar 2, :foo 1, :quux 2}
{:asdf 12, :bar 2, :foo 1, :quux 2}]}},
2 {1 {1 [{:asdf 5, :bar 1, :foo 2, :quux 1}
{:asdf 13, :bar 1, :foo 2, :quux 1}],
2 [{:asdf 6, :bar 1, :foo 2, :quux 2}
{:asdf 14, :bar 1, :foo 2, :quux 2}]},
2 {1 [{:asdf 7, :bar 2, :foo 2, :quux 1}
{:asdf 15, :bar 2, :foo 2, :quux 1}],
2 [{:asdf 8, :bar 2, :foo 2, :quux 2}
{:asdf 16, :bar 2, :foo 2, :quux 2}]}}}
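Since the result is just nested maps keyed by the value of each grouping function in turn, a leaf group can be looked up with get-in. A small sketch on made-up data (rows and grouped are names invented for this example; group-by* is repeated so the snippet stands alone):

```clojure
(defn group-by*
  "Similar to group-by, but takes a collection of functions and returns
  a hierarchically grouped result."
  [fs coll]
  (if-let [f (first fs)]
    (into {} (map (fn [[k vs]]
                    [k (group-by* (next fs) vs)])
                  (group-by f coll)))
    coll))

(def rows [{:foo 1 :bar 1 :asdf 1}
           {:foo 1 :bar 2 :asdf 2}
           {:foo 2 :bar 1 :asdf 3}
           {:foo 1 :bar 1 :asdf 4}])

(def grouped (group-by* [:foo :bar] rows))

;; Path = one key per grouping level, in the order the functions were given.
(get-in grouped [1 1])
;=> [{:foo 1, :bar 1, :asdf 1} {:foo 1, :bar 1, :asdf 4}]
```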

Adding sets of numbers up to 16

I have some sets of numbers:
(#{7 1} #{3 5} #{6 3 2 5}
#{0 7 1 8} #{0 4 8} #{7 1 3 5}
#{6 2} #{0 3 5 8} #{4 3 5}
#{4 6 2} #{0 6 2 8} #{4} #{0 8}
#{7 1 6 2} #{7 1 4})
I wish to turn each set into a four-number vector such that the numbers in each vector add up to 16 and come only from that set:
#{7 1} => [1 1 7 7]
#{4 3 5} => [3 4 4 5]
#{4} => [4 4 4 4]
#{0 8} => [0 0 8 8]
Lastly, the vector has to contain all the numbers in the set. It would be great to solve this for arbitrary vector lengths :)
How would the Clojure code be written?
With small sets and the originally stated output length of 4
This is easily handled with naive search
(defn bag-sum [s n]
(for [a s, b s, c s, d s
:let [v [a b c d]]
:when (= n (apply + v))
:when (= (set v) s)]
v))
(take 1 (bag-sum #{7 1} 16)) ;=> ([7 7 1 1])
(take 1 (bag-sum #{3 5} 16)) ;=> ([3 3 5 5])
(take 1 (bag-sum #{4 3 5} 16)) ;=> ([4 4 3 5])
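The same naive search can be stretched to an arbitrary output length by enumerating non-decreasing selections instead of hard-coding four bindings. The following is a sketch under that assumption; multisets and bag-sum-n are hypothetical names, and the search is still exponential in the output length, so it is only practical for small inputs.

```clojure
;; All non-decreasing length-m selections (with repetition) from `items`,
;; which is expected to be a sorted seq.
(defn multisets [items m]
  (if (zero? m)
    [[]]
    (for [[x :as suffix] (take-while seq (iterate rest items))
          tail           (multisets suffix (dec m))]
      (into [x] tail))))

;; Vectors of length m, drawn from set s, summing to n and using every
;; member of s at least once.
(defn bag-sum-n [s m n]
  (for [v (multisets (sort s) m)
        :when (= n (apply + v))
        :when (= (set v) (set s))]
    v))

(bag-sum-n #{7 1} 4 16)   ;=> ([1 1 7 7])
(bag-sum-n #{4 3 5} 4 16) ;=> ([3 4 4 5])
(bag-sum-n #{0 8} 4 16)   ;=> ([0 0 8 8])
```

Generating only non-decreasing vectors also returns each solution once, already in the ascending order used in the question.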
Assuming 16 is fixed and all numbers are non-negative
The search space even without the set constraint is tiny.
(require '[clojure.math.combinatorics :refer [partitions]])
(count (partitions (repeat 16 1))) ;=> 231
So, again a naive solution is very practical. We'll produce solutions of all lengths, which can be further filtered as desired. If there is a zero in the input set, it can pad any solution.
(defn bag-sum16 [s]
(for [p (partitions (repeat 16 1))
:let [v (mapv (partial apply +) p)]
:when (= (set v) s)]
v))
First example has 2 solutions - length 4 and length 10.
(bag-sum16 #{7 1}) ;=> ([7 7 1 1] [7 1 1 1 1 1 1 1 1 1])
(bag-sum16 #{3 5}) ;=> ([5 5 3 3])
(bag-sum16 #{3 4 5}) ;=> ([5 4 4 3])
Using core.logic finite domains to prune the search space with arbitrary but specified domain set s, output length m, and sum n
This is still fairly naive but prunes the search tree when the target sum is exceeded. I am a novice at core.logic, so this is more an opportunity to practice than an attempt at best representation of the problem. This performs worse than the naive solutions above on small spaces, but enables calculation in some medium size cases.
(defn bag-sum-logic [s m n]
(let [m* (- m (count s))
n* (- n (apply + s))
nums (vec (repeatedly m* lvar))
sums (concat [0] (repeatedly (dec m*) lvar) [n*])
dom (apply fd/domain (sort s))
rng (fd/interval n*)
sol (run 1 [q]
(== q nums)
(everyg #(fd/in % dom) nums)
(everyg #(fd/in % rng) sums)
(everyg #(apply fd/+ %)
(map cons nums (partition 2 1 sums))))]
(when (seq sol) (sort (concat s (first sol))))))
(bag-sum-logic #{7 1} 4 16) ;=> (1 1 7 7)
(bag-sum-logic #{7 1} 10 16) ;=> (1 1 1 1 1 1 1 1 1 7)
(bag-sum-logic #{3 5} 4 16) ;=> (3 3 5 5)
(bag-sum-logic #{3 4 5} 4 16) ;=> (3 4 4 5)
(time (bag-sum-logic #{3 4 5} 30 100))
;=> "Elapsed time: 18.739627 msecs"
;=> (3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 5 5 5 5)
Better algorithms for the general case?
This problem is a linear Diophantine equation, which can be solved with the Extended Euclidean Algorithm via matrix unimodular row reduction, i.e. carry out the Euclidean algorithm in one column while bringing the entire basis row along for the ride.
For example, in the case of #{3 5} and sum 16, you want to solve the equation
3x + 5y = 16
subject to the additional constraints that x > 0, y > 0 and x + y = 4 (your example).
The matrix and reduction steps
[[3  1  0]  ->  [[3  1  0]  ->  [[1  2 -1]  ->  [[1  2 -1]
 [5  0  1]]      [2 -1  1]]      [2 -1  1]]      [0 -5  3]]
So the GCD of 3 and 5 is 1, which divides into 16. Therefore there are (infinitely many) solutions before the constraints
x = 16 * 2 - 5k
y = 16 * -1 + 3k
Since we need x + y = 4, 4 = 16 - 2k and therefore k = 6, so
x = 2
y = 2
And we need 2 copies of 3 and 2 copies of 5.
This generalizes to more than 2 variables in the same manner. But whereas for 2 variables the length of the solution fully constrains the single free variable, as shown above, problems with 3 or more variables can be underspecified.
Solving linear Diophantine equations can be done in polynomial time. However, once you add the bounds (0, m), finding a solution becomes NP-complete, though a quick perusal of research results suggests there are fairly tractable approaches.
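The two-value case worked through above can be sketched in a few lines. This assumes a and b are distinct non-negative integers; ext-gcd and two-value-counts are names invented for this illustration, and the result maps each value to how many copies of it are needed.

```clojure
;; Extended Euclidean algorithm.
;; Returns [g s t] with (+ (* a s) (* b t)) = g = gcd(a, b).
(defn ext-gcd [a b]
  (loop [r0 a, s0 1, t0 0
         r1 b, s1 0, t1 1]
    (if (zero? r1)
      [r0 s0 t0]
      (let [q (quot r0 r1)]
        (recur r1 s1 t1
               (- r0 (* q r1)) (- s0 (* q s1)) (- t0 (* q t1)))))))

;; Solve a*x + b*y = n with x + y = m and x, y > 0.
;; Returns {a x, b y}, or nil when no such solution exists.
(defn two-value-counts [a b n m]
  (let [[g s t] (ext-gcd a b)]
    (when (and (not= a b) (zero? (rem n g)))
      (let [scale (quot n g)                       ; particular solution: scale*s, scale*t
            num   (* g (- m (* scale (+ s t))))    ; k = num / den follows from x + y = m
            den   (- a b)]
        (when (zero? (rem num den))
          (let [k (quot num den)
                x (- (* scale s) (* k (quot b g)))
                y (+ (* scale t) (* k (quot a g)))]
            (when (and (pos? x) (pos? y))
              {a x, b y})))))))

(ext-gcd 3 5)               ;=> [1 2 -1]
(two-value-counts 3 5 16 4) ;=> {3 2, 5 2}
```

For #{3 5} this reproduces k = 6 and x = y = 2 from the derivation, i.e. two copies of 3 and two copies of 5.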
Working on the assumptions that you only want one solution per set and you want the solution ordered ascending as per your example this is what I came up with. There aren't many combinations of sets of 1-4 numbers so the way I initially decomposed the problem was to look at what the pattern of possible solutions might look like.
(def x #{3 5})
(def g 16)
(def y {1 [[0 0 0 0]]
2 [[0 0 0 1][0 0 1 1][0 1 1 1]]
3 [[0 0 1 2][0 1 1 2][0 1 2 2]]
4 [[0 1 2 3]]})
The key of this map indicates the size of the set x that is being evaluated. The values are the possible permutations of indices into the set once it is sorted into a vector. Now we can choose the permutations based on the size of the set and calculate the values of each permutation, keeping those that reach the goal:
(filter #(= g (apply + %))
(for [p (y (count x))]
(mapv #((into [] (sort x)) %) p)))
The permutations under each key of the map above follow a pattern: the first index is always 0, the last is always the set size minus 1, and every value is either equal to or one greater than the value to its left. The map can therefore be generalised to:
(defn y2 [m s]
(map (fn [c] (reduce #(conj %1 (+ %2 (peek %1))) [0] c))
(clojure.math.combinatorics/permutations
(mapv #(if (>= % (dec s)) 0 1) (range (dec m))))))
(def y (partial y2 4))
The filter will now work for any number of set items up to s. As the input set is sorted, the search could be optimised to find the right (or no) solution with a binary search over the candidate permutations, for O(log n) search time.

clojure.core/for with Cypher ExecutionResult

(defn cypher
[query]
(let [result (-> *cypher* (.execute query))]
(for [row result
column (.entrySet row)]
{(keyword (.getKey column))
(Neo4jVertex. (.getValue column) *g*)})))
repl=> (cypher "start n=node:people('*:*') return n")
{:n #<Neo4jVertex v[1]>}
This query returns two results, yet I'm only able to ever see one using clojure.core/for. How should I be going about this?
The Neo4j docs have this example (which is what I'm trying to emulate):
for ( Map<String, Object> row : result )
{
for ( Entry<String, Object> column : row.entrySet() )
{
rows += column.getKey() + ": " + column.getValue() + "; ";
}
rows += "\n";
}
I think you need clojure.core/doseq instead. Note that both bindings go in a single binding vector:
user=> (doseq [row [1 2 3]
  #_=>         result [4 5 6]]
  #_=>   (println (str {:row row :result result})))
{:row 1, :result 4}
{:row 1, :result 5}
{:row 1, :result 6}
{:row 2, :result 4}
{:row 2, :result 5}
{:row 2, :result 6}
{:row 3, :result 4}
{:row 3, :result 5}
{:row 3, :result 6}
So, adapted to your example, something like the following might work:
; ...
(doseq [row result
        column (.entrySet row)]
  (println (str {(keyword (.getKey column)) (Neo4jVertex. (.getValue column) *g*)})))
; ...
Note that doseq returns nil; you'll have to call something with side effects, like println, in the body of the doseq form.
clojure.core/for, by contrast, is a list comprehension: it returns a lazy sequence instead of performing side effects, so something like the following actually returns a list:
user=> (for [row [1 2 3]
#_=> result [4 5 6]]
#_=> {:row row :result result})
({:row 1, :result 4} {:row 1, :result 5} {:row 1, :result 6} {:row 2, :result 4} {:row 2, :result 5} {:row 2, :result 6} {:row 3, :result 4} {:row 3, :result 5} {:row 3, :result 6})
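If the goal is to get the rows back as data rather than print them, one option is to keep for but realize its result eagerly, building one Clojure map per row. The sketch below uses plain java.util maps as a stand-in for the Cypher result, since the real ExecutionResult and Neo4jVertex are not available here; fake-result and rows->maps are hypothetical names, and whether this resolves the original single-row symptom is an assumption.

```clojure
(import 'java.util.LinkedHashMap)

;; Hypothetical stand-in for a Cypher result: an iterable of java.util.Map rows.
(def fake-result
  [(doto (LinkedHashMap.) (.put "n" "v[1]"))
   (doto (LinkedHashMap.) (.put "n" "v[2]"))])

;; One Clojure map per row, with keyword keys; vec forces the lazy seq
;; so every row is consumed before the function returns.
(defn rows->maps [result]
  (vec (for [row result]
         (into {}
               (for [column (.entrySet row)]
                 [(keyword (.getKey column)) (.getValue column)])))))

(rows->maps fake-result)
;=> [{:n "v[1]"} {:n "v[2]"}]
```

Forcing the sequence with vec (or doall) matters when the underlying result can only be iterated while some resource is still open, because a lazy sequence would otherwise be consumed later.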