Group-by and data get - clojure

I think I am misunderstanding GET in clojure - I am trying to mould 1 data set from another, from the below A to B.
A
ID REGION MIN
10346 GLBL 106
10346 ASPAC 106
10346 NA 106
10346 LATAM 106
10346 EMEA 106
10347 GLBL 32
10347 ASPAC 32
10347 NA 32
10347 LATAM 32
10347 EMEA 32
10349 NA 10
10327 NA
10344 EMEA 8
10342 ASPAC 292
10342 EMEA 292
10348 ASPAC 15
10422 EMEA 37
10438 NA 0
B
ID EMEA NA ASPAC GLBL LATAM
10346 106 106 106 106 106
10347 32 32 32 32 32
10349 0 10 0 0 0
10327 0 0 0 0 0
10344 8 0 0 0 0
10342 292 0 292 0 0
10348 0 0 15 0 0
10422 37 0 0 0 0
10438 0 0 0 0 0
The group by is working but I am getting null values for all the regions, I though filtering on the region I could use get to obtain that value for MIN in that record and map it to the new region field - any advise on what I am doing wrong here? Or what I should be using instead of GET?
(defn- create-summary [data]
(->> data
(group-by :ID
vals
(map
(fn [recs]
(let [a (fn [priority](get :MIN (filter #(= priority (:REGION %)) recs)))]
{:ID (:ID (first recs))
:EMEA (a "EMEA")
:NA (a "NA")
:GLBL (a "GLBL")
:LATAM (a "LATAM")
:ASPAC (a "ASPAC")
})))
))

This:
(let [a (fn [priority](get :MIN (filter #(= priority (:REGION %)) recs)))]
Should be
(let [a (fn [priority](get (first (filter #(= priority (:REGION %)) recs)) :MIN))]

Related

How to group ages in specific range using clojure?

I have a list which has the age of persons like (12,23,34,33,34,45,56...) almost 200 numbers. I want to group them (10-20)(21-30)(31-40)....(91-100)age groups seperately.
How do I do it in clojure.
Thank you
Here's an implementation, the key functions are group-by and quot:
(defn group-by-tens [numbers]
(->> numbers (group-by #(quot % 10))
(sort-by first)
(map second)))
Example:
(group-by-tens [15 28 35 6 9 37 33 47 11 38 4 27 49 47 38 20 36 49 27 30])
=> ([6 9 4] [15 11] [28 27 20 27] [35 37 33 38 38 36 30] [47 49 47 49])
also, if your age values are sorted (like in the example from your question), you can just partition them (or otherwise sort and partition):
user> (partition-by #(quot % 10)
[1 2 3 4 10 12 16 23 27 29 33 34 45 59 71 72])
;;=> ((1 2 3 4) (10 12 16) (23 27 29) (33 34) (45) (59) (71 72))

Clojure Lazy Seq Results in Stack Overflow

I want to compute a lazy sequence of primes.
Here is the interface:
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
So far, so good.
However, when I take 500 primes, this results in a stack overflow.
core.clj: 133 clojure.core/seq
core.clj: 2595 clojure.core/filter/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 2626 clojure.core/take/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
Cons.java: 39 clojure.lang.Cons/next
LazySeq.java: 81 clojure.lang.LazySeq/next
RT.java: 598 clojure.lang.RT/next
core.clj: 64 clojure.core/next
core.clj: 2856 clojure.core/dorun
core.clj: 2871 clojure.core/doall
core.clj: 2910 clojure.core/partition/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 2551 clojure.core/map/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 3973 clojure.core/interleave/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
I'm wondering what is the problem here and, more generally, when working with lazy sequences, how should I approach this class of error?
Here is the code.
(defn assoc-nth
"Returns a lazy seq of coll, replacing every nth element by val
Ex:
user=> (assoc-nth [3 4 5 6 7 8 9 10] 2 nil)
(3 nil 5 nil 7 nil 9 nil)
"
[coll n val]
(apply concat
(interleave
(map #(take (dec n) %) (partition n coll)) (repeat [val]))))
(defn sieve
"Returns a lazy seq of primes by Eratosthenes' method
Ex:
user=> (take 4 (sieve (iterate inc 2)))
(2 3 5 7)
user=> (take 10 (sieve (iterate inc 2)))
(2 3 5 7 11 13 17 19 23 29)
"
[s]
(lazy-seq
(if (seq s)
(cons (first s) (sieve
(drop-while nil? (assoc-nth (rest s) (first s) nil))))
[])))
(def primes
"Returns a lazy seq of primes
Ex:
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
"
(concat [2] (sieve (filter odd? (iterate inc 3)))))
You are generating a lot of lazy sequences that all take up stack space.
Breaking it down:
user=> (iterate inc 3)
;; produces first lazy seq (don't run in repl!)
(3 4 5 6 7 8 9 10 11 12 13 ...)
user=> (filter odd? (iterate inc 3))
;; again, don't run in repl, but lazy seq #2
(3 5 7 9 11 13 ...)
I personally would have done (iterate #(+ 2 %) 3) to cut a sequence out, but that's a drop in the ocean compared to the overall problem.
Now we start into the sieve which starts creating a new lazy seq for our output
(lazy-seq
(cons 3 (sieve (drop-while nil? (assoc-nth '(5 7 9 ...) 3 nil)))))
Jumping into assoc-nth, we start creating more lazy sequences
user=> (partition 3 [5 7 9 11 13 15 17 19 21 23 25 ...])
;; lazy seq #4
((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...)
user=> (map #(take 2 %) '((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...))
;; lazy seq #5
((5 7) (11 13) (17 19) (23 25) ...)
user=> (interleave '((3 5) (9 11) (15 17) (21 23) ...) (repeat [nil]))
;; lazy seq #6 + #7 (repeat [nil]) is a lazy seq too
((5 7) [nil] (11 13) [nil] (17 19) [nil] (23 25) [nil] ...)
user=> (apply concat ...
;; lazy seq #8
(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...)
So already you have 8 lazy sequences, and we've only applied first sieve.
Back to sieve, and it does a drop-while which produces lazy sequence number 9
user=> (drop-while nil? '(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...))
;; lazy seq #9
(5 7 11 13 17 19 23 25 ...)
So for one iteration of sieve, we generated 9 sequences.
In the next iteration of sieve, you generate a new lazy-seq (not the original!), and the process begins again, generating another 7 sequences per loop.
The next problem is the efficiency of your assoc-nth function. A sequence of numbers and nils isn't an efficient way to mark multiples of a particular factor. You end up very quickly with a lot of very long sequences that can't be released, mostly filled with nils, and you need to read longer and longer sequences to decide if a candidate is a factor or not - this isn't efficient for lazy sequences which read chunks of 32 entries to be efficient, so when your factoring out 33 and above, you're pulling in multiple chunks of the sequence to work on.
Also, each time you're doing this in a completely new sequence.
Adding some debug to your assoc-nth, and running it on a small sample will illustrate this quickly.
user=> (sieve (take 50 (iterate #(+ 2 %) 3)))
assoc-nth n: 3 , coll: (5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101)
returning r: (5 7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
assoc-nth n: 5 , coll: (7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
returning r: (7 nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
assoc-nth n: 7 , coll: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
returning r: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil nil 79 nil 83 nil nil 89 nil)
;; ...
assoc-nth n: 17 , coll: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil)
returning r: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
assoc-nth n: 19 , coll: (nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
returning r: ()
(3 5 7 11 13 17 19)
This illustrates how you need more and more elements of the sequence to produce a list of primes, as I started with odd numbers 3 to 99, but only ended up with primes 3 to 19, but on the last iteration n=19, there weren't enough elements in my finite sequence to nil out further multiples.
Is there a solution?
I think you're looking for answers to "how do I make lazy sequences better at doing what I want to do?". Lazy sequences are going to be a trade off in this case. Your algorithm works, but generates too much stack. You're not using any recur, so you generate sequences on sequences. The first thing to look at would be how to make your methods more tail recursive, and lose some of the sequences. I can't offer solutions to that here, but I can link other solutions to the same problem and see if there are areas they do better than yours.
There are 2 decent (bound numbers) solutions at this implementation of Eratosthenes primes. One is using a range, and sieving it, the other using arrays of booleans (which is 40x quicker). The associated article with it (in Japenese but google translates well in Chrome) is a good read, and shows the times for naive approaches vs a very focused version using arrays directly and type hints to avoid further unboxing issues in the jvm.
There's another SO question that has a good implementation using transient values to improve efficiency. It has similar filtering methods to yours.
In each case, different sieving methods are used that avoid lazy sequences, however as there's an upper bound on the primes they generate, they can swap out generality for efficiency.
For unbound case, look at this gist for an infinite sequence of primes. In my tests it was 6x slower than the previous SO question, but still pretty good. The implementation however is very different.

Clojure: decide if argument is a prime

I started learning Clojure a few days ago and wrote a simple function that decides whether its given argument is a prime or not.
Here is my code:
(defn is-prime [n]
(nil?
(some #(= (mod n %) 0)
(range 2 (java.lang.Math/sqrt n)))))
My problem is, that this function returns true when it is called with '4'.
(is-prime 4) => true
I wrote another function for debuggin purposes, it lists all the primes that are less than 250:
(defn primes [] (filter #(is-prime %) (range 1 250)))
I have looked up the Wikipedia page for the list of prime numbers and found that except for the number '4', the rest of the output is correct.
(primes)
=> (1 2 3 4 5 7 9 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 121 127 131 137 139 149 151 157 163 167 169 173 179 181 191 193 197 199 211 223 227 229 233 239 241)
I have been thinking about it, and maybe it is just some beginner's mistake on my part, but I'm unable to find the solution. I would really appreciate your help, thanks in advance.
(range m n) doesn't include n. So (range 2 (sqrt 4) = (range 2 2) = (); it doesn't try any divisors. Note your "primes" list also has 9 in it: (range 2 (sqrt 9)) = (range 2 3) = (2) so it never tries dividing by 3. Same issue for 25, 49, 121, 169; basically for all squares of primes.
Simplest fix is (range 2 (inc (sqrt n))).

Clojure Docs: Understanding This Example of the Reduce Function

Reading through the Clojure docs and I'm confused by one example of the reduce function. I understand what reduce does, but there's a lot going on in this example and I'm not sure how it's all working together.
(reduce
(fn [primes number]
(if (some zero? (map (partial mod number) primes))
primes
(conj primes number)))
[2]
(take 1000 (iterate inc 3)))
=> [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997]
From what I understand, reduce takes a function, in this case an anonymous function. That function takes two arguments, a collection and a number. Then we have a conditional statement that checks if the number zero appears in the collection.
(map (partial mod number) primes) confuses me. Doesn't mod take two arguments and return the remainder of dividing the first by the second?
It appears that if this conditional returns true, it returns the collection of primes. If not, add the number to the collection of primes. Is that correct?
And the final line, it's a collection of 1,000 numbers starting at 3. Would someone be able to walk through this function?
You probably already know this but the first thing to note is the following property: A number is prime if it is indivisible by any prime number smaller than it.
So the anonymous function starts with a vector of previous primes, and then checks if number is divisible by any of the previous primes. If it is divisible by any of the previous primes, it will just pass on the vector of previous primes, otherwise it will add the current number to the vector of primes and then return the new vector.
(partial mod number) is equivalent to (fn [x] (mod number x)) in this case.
To step through a few cases:
;Give a name to the anonymous function
(defn prime-checker [primes number]
(if (some zero? (map (partial mod number) primes))
primes
(conj primes number)))
;This is how reduce will call the anonymous function
(prime-checker [2] 3)
-> ((map (partial mod number) primes) = [1]
-> will return [2 3]
(prime-checker [2 3] 4)
-> ((map (partial mod number) primes) = [0 1]
-> some zero? finds a zero value here so the function will return [2 3]
(prime-checker [2 3] 5)
-> ((map (partial mod number) primes) = [1 2]
-> will return [2 3 5]
Hopefully you can see from this how reduce with this function returns a list of primes.

Why is this prime sieve implementation slower?

I was just experimenting a bit with (for me) a new programming language: clojure. And I wrote a quite naive 'sieve' implementation, which I then tried to optimise a bit.
Strangely enough though (for me at least), the new implementation wasn't faster, but much slower...
Can anybody provide some insight in why this is so much slower?
I'm also interested in other tips in how to improve this algorithm...
Best regards,
Arnaud Gouder
; naive sieve.
(defn sieve
([max] (sieve max (range 2 max) 2))
([max candidates n]
(if (> (* n n) max)
candidates
(recur max (filter #(or (= % n) (not (= (mod % n) 0))) candidates) (inc n)))))
; Instead of just passing the 'candidates' list, from which I sieve-out the non-primes,
; I also pass a 'primes' list, with the already found primes
; I hoped that this would increase the speed, because:
; - Instead of sieving-out multiples of 'all' numbers, I now only sieve-out the multiples of primes.
; - The filter predicate now becomes simpler.
; However, this code seems to be approx 20x as slow.
; Note: the primes in 'primes' end up reversed, but I don't care (much). Adding a 'reverse' call makes it even slower :-(
(defn sieve2
([max] (sieve2 max () (range 2 max)))
([max primes candidates]
(let [n (first candidates)]
(if (> (* n n) max)
(concat primes candidates)
(recur max (conj primes n) (filter #(not (= (mod % n) 0)) (rest candidates)))))))
; Another attempt to speed things up. Instead of sieving-out multiples of all numbers in the range,
; I want to sieve-out only multiples of primes.. I don't like the '(first (filter ' construct very much...
; It doesn't seem to be faster than 'sieve'.
(defn sieve3
([max] (sieve max (range 2 max) 2))
([max candidates n]
(if (> (* n n) max)
candidates
(let [new_candidates (filter #(or (= % n) (not (= (mod % n) 0))) candidates)]
(recur max new_candidates (first (filter #(> % n) new_candidates)))))))
(time (sieve 10000000))
(time (sieve 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve 10000000)) ; Strange, speeds are very different now... Must be some memory allocation thing caused by running sieve2
(time (sieve 10000000))
(time (sieve3 10000000))
(time (sieve3 10000000))
(time (sieve 10000000))
I have good news and bad news. The good news is that your intuitions are correct.
(time (sieve 10000)) ; "Elapsed time: 0.265311 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
(time (sieve2 10000)) ; "Elapsed time: 1.028353 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
The bad news is that both are much slower than you think
(time (count (sieve 10000))) ; "Elapsed time: 231.183055 msecs"
1229
(time (count (sieve2 10000))) ; "Elapsed time: 87.822796 msecs"
1229
What's happening is that because filter is lazy, the filtering isn't getting done until the answers need to be printed. All the first expression is counting is the time to wrap the sequence in a load of filters. Putting the count in means that the sequence actually has to be calculated within the timing expression, and then you see how long it really takes.
I think in the case without the count, sieve2 is taking longer because it is doing a bit of the work whilst constructing the filtered sequence.
When you put the count in, sieve2 is faster because it's the better algorithm.
P.S. When I try (time (sieve 10000000)), my machine crashes with a stack overflow, presumably because of the vast stack of nested filter calls it's building up. How come it ran for you?
Some optimization tips for this kind of Primative number heavy math:
use clojure 1.3
clonjure 1.3 allows un-boxed-checked-arithmetic so you wont be casting everything to Integer.
type hint the function arguments
Otherwise you will end up casting all the Ints/Longs to Integer for each function call. (you're not calling any hint-able functions so i'm just listing it here as general advice)
don't call any higher order functions.
Currently (1.3) lambda functions #( ...) cant be compiled as ^static so they only take Object as arguments. so the calls to filter will require boxing of all the numbers.
You're likely loosing enough time in boxing/unboxing Integers/ints that it will make it hard to really judge the different optimizations. If you type hint (and use clojure 1.3) then you will likely get better numbers to judge your optimizations.