I was just experimenting a bit with a programming language that is new to me: Clojure. I wrote a rather naive 'sieve' implementation, which I then tried to optimise a bit.
Strangely enough (for me at least), the new implementation wasn't faster, but much slower...
Can anybody provide some insight into why this is so much slower?
I'm also interested in other tips on how to improve this algorithm...
Best regards,
Arnaud Gouder
; naive sieve.
(defn sieve
([max] (sieve max (range 2 max) 2))
([max candidates n]
(if (> (* n n) max)
candidates
(recur max (filter #(or (= % n) (not (= (mod % n) 0))) candidates) (inc n)))))
; Instead of just passing the 'candidates' list, from which I sieve-out the non-primes,
; I also pass a 'primes' list, with the already found primes
; I hoped that this would increase the speed, because:
; - Instead of sieving-out multiples of 'all' numbers, I now only sieve-out the multiples of primes.
; - The filter predicate now becomes simpler.
; However, this code seems to be approx 20x as slow.
; Note: the primes in 'primes' end up reversed, but I don't care (much). Adding a 'reverse' call makes it even slower :-(
(defn sieve2
([max] (sieve2 max () (range 2 max)))
([max primes candidates]
(let [n (first candidates)]
(if (> (* n n) max)
(concat primes candidates)
(recur max (conj primes n) (filter #(not (= (mod % n) 0)) (rest candidates)))))))
; Another attempt to speed things up. Instead of sieving-out multiples of all numbers in the range,
; I want to sieve-out only multiples of primes.. I don't like the '(first (filter ' construct very much...
; It doesn't seem to be faster than 'sieve'.
(defn sieve3
([max] (sieve3 max (range 2 max) 2))
([max candidates n]
(if (> (* n n) max)
candidates
(let [new_candidates (filter #(or (= % n) (not (= (mod % n) 0))) candidates)]
(recur max new_candidates (first (filter #(> % n) new_candidates)))))))
(time (sieve 10000000))
(time (sieve 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve 10000000)) ; Strange, speeds are very different now... Must be some memory allocation thing caused by running sieve2
(time (sieve 10000000))
(time (sieve3 10000000))
(time (sieve3 10000000))
(time (sieve 10000000))
I have good news and bad news. The good news is that your intuitions are correct.
(time (sieve 10000)) ; "Elapsed time: 0.265311 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
(time (sieve2 10000)) ; "Elapsed time: 1.028353 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
The bad news is that both are much slower than you think.
(time (count (sieve 10000))) ; "Elapsed time: 231.183055 msecs"
1229
(time (count (sieve2 10000))) ; "Elapsed time: 87.822796 msecs"
1229
What's happening is that, because filter is lazy, the filtering doesn't get done until the answers need to be printed. All the first expression measures is the time it takes to wrap the sequence in a load of filters. Putting the count in means that the sequence actually has to be computed inside the timing expression, and then you see how long it really takes.
I think in the case without the count, sieve2 is taking longer because it is doing a bit of the work whilst constructing the filtered sequence.
When you put the count in, sieve2 is faster because it's the better algorithm.
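The laziness effect can be seen without any timing at all. This is a minimal sketch, assuming a hypothetical counting predicate counted-odd? (not part of the original code) that records how often filter actually calls it:

```clojure
;; Hypothetical helper for illustration: counts predicate invocations.
(def calls (atom 0))

(defn counted-odd? [n]
  (swap! calls inc)
  (odd? n))

;; Building the filtered sequence does no filtering yet.
(def lazy-result (filter counted-odd? (range 1000)))
(def calls-before @calls)

;; count has to walk the whole sequence, which forces the filter to run.
(def odd-total (count lazy-result))
(def calls-after @calls)

(println calls-before odd-total calls-after) ; prints 0 500 1000
```

The predicate is not called at all until count walks the sequence, which is why timing the bare (sieve 10000) call says so little.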
P.S. When I try (time (sieve 10000000)), my machine crashes with a stack overflow, presumably because of the vast stack of nested filter calls it's building up. How come it ran for you?
Some optimization tips for this kind of primitive-number-heavy math:
Use Clojure 1.3.
Clojure 1.3 supports unboxed, checked arithmetic on primitives, so you won't be boxing every intermediate number.
Type hint the function arguments.
Otherwise you will end up boxing all the ints/longs on each function call. (You're not calling any hint-able functions, so I'm just listing it here as general advice.)
Don't call any higher-order functions.
Currently (1.3) lambda functions #( ...) can't be compiled as ^static, so they only take Object as arguments, and the calls to filter will require boxing of all the numbers.
You're likely losing enough time in boxing/unboxing Integers/ints that it will be hard to really judge the different optimizations. If you type hint (and use Clojure 1.3) then you will likely get better numbers to judge your optimizations.
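One way to follow this advice is to keep lazy sequences and anonymous predicates out of the inner loop altogether. The following is only a sketch of a different technique (a mutable boolean array plus a ^long hinted argument), not a tuned version of the code in the question. The names array-sieve and limit are made up for illustration.

```clojure
(defn array-sieve
  "Returns a vector of the primes below limit (hypothetical helper)."
  [^long limit]
  (if (< limit 3)
    []
    (let [composite (boolean-array limit)] ; all entries start as false
      ;; Cross out the multiples of each prime n, starting at n*n.
      (loop [n 2]
        (when (< (* n n) limit)
          (when-not (aget composite n)
            (loop [m (* n n)]
              (when (< m limit)
                (aset composite m true)
                (recur (+ m n)))))
          (recur (inc n))))
      ;; Collect every index that was never crossed out.
      (loop [i 2, acc (transient [])]
        (if (< i limit)
          (recur (inc i) (if (aget composite i) acc (conj! acc i)))
          (persistent! acc))))))

(println (array-sieve 30))
```

Because nothing here is lazy, wrapping a call in time measures the full computation, and there is no deep stack of nested filters to unwind.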
a = 100
for b in range(10,a):
    c = b%10
    if c == 0:
        c += 3
    c = c*b
    print c
I was trying to make a random number generator without using the random function, and I made this. Does it generate random numbers?
Short Answer:
No.
Your code will print
30 11 24 39 56 75 96 119 144 171 60 21 44 69 96 125 156 189 224 261 90 31 64 99 136 175 216 259 304 351 120 41 84 129 176 225 276 329 384 441 150 51 104 159 216 275 336 399 464 531 180 61 124 189 256 325 396 469 544 621 210 71 144 219 296 375 456 539 624 711 240 81 164 249 336 425 516 609 704 801 270 91 184 279 376 475 576 679 784 891
every time.
Computers and programs like these are deterministic. If you sat down with a pen and paper you could tell me exactly which of these numbers would occur and when they would occur.
Random number generation is difficult. What I would recommend is using the time to (seem to) randomize the output.
import time
print int(time.time() % 10)
This will give you a "random" number between 0 and 9.
time.time() gives you the number of seconds since the epoch as a floating point number, so we have to cast it to an int if we want a whole number.
Caveat: This solution is not truly random, but will act in a much more "random" fashion.
Is there an idiomatic way to take some items from a collection?
Here is how I did it:
(time (drop 30 (take 70 (range 10001))))
;> "Elapsed time: 0.049797 msecs"
;> (30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69)
(time (subvec (vec (range 10001)) 30 70))
;> "Elapsed time: 2.072258 msecs"
;> [30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69]
Question:
Why is the subvec method slower than the take & drop approach?
What's the idiomatic way of doing this?
vec isn't lazy, so you are creating an entire vector of 10001 items and then taking a subvector of it, whereas drop/take/range are lazy and only supply the items you need. You never do anything with the last (10001-70) items, so they aren't created and take no time.
Your first version is idiomatic enough for what you're doing.
Your comparison wasn't carried out in a proper way.
A single timing is not enough to compare two code blocks. You should use dotimes to repeat them a number of times so that the difference in execution times is more reliable:
;; returns the time it took to repeat running the code 1000 times
(time (dotimes [i 1000] your-code-block))
In your second code block, you convert the lazy sequence returned by range to a vector with vec, which takes some extra time, too:
(vec (range 10001))
You can use the time + dotimes technique to compare the above with (range 10001) itself.
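As a rough sketch of that technique applied to the two approaches from the question: the wrapper names take-drop-slice and subvec-slice are hypothetical, and doall is added on the assumption that you want the lazy result realized inside the timed region, so that both versions do comparable work.

```clojure
;; Hypothetical wrappers for illustration.
(defn take-drop-slice [coll]
  ;; doall forces the lazy result while the timer is still running
  (doall (drop 30 (take 70 coll))))

(defn subvec-slice [coll]
  ;; vec copies the whole collection before subvec slices it
  (subvec (vec coll) 30 70))

;; Repeat each approach 1000 times and time the whole loop.
(time (dotimes [_ 1000] (take-drop-slice (range 10001))))
(time (dotimes [_ 1000] (subvec-slice (range 10001))))
```

The absolute numbers will vary by machine and JVM warm-up, so only the relative difference across repeated runs is worth reading.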
I hope this will be a foundation for your further exploration.
I started learning Clojure a few days ago and wrote a simple function that decides whether its given argument is a prime or not.
Here is my code:
(defn is-prime [n]
(nil?
(some #(= (mod n %) 0)
(range 2 (java.lang.Math/sqrt n)))))
My problem is that this function returns true when it is called with '4'.
(is-prime 4) => true
I wrote another function for debugging purposes; it lists all the primes that are less than 250:
(defn primes [] (filter #(is-prime %) (range 1 250)))
I have looked up the Wikipedia page for the list of prime numbers and found that except for the number '4', the rest of the output is correct.
(primes)
=> (1 2 3 4 5 7 9 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 121 127 131 137 139 149 151 157 163 167 169 173 179 181 191 193 197 199 211 223 227 229 233 239 241)
I have been thinking about it, and maybe it is just some beginner's mistake on my part, but I'm unable to find the solution. I would really appreciate your help, thanks in advance.
(range m n) doesn't include n. So (range 2 (sqrt 4)) = (range 2 2) = (); it doesn't try any divisors. Note that your "primes" list also has 9 in it: (range 2 (sqrt 9)) = (range 2 3) = (2), so it never tries dividing by 3. Same issue for 25, 49, 121, 169; basically for all squares of primes.
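Simplest fix is (range 2 (inc (sqrt n))). One catch: for n = 2 that range is (2), so 2 gets divided by itself and is rejected. Truncating the square root first, as in (range 2 (inc (long (Math/sqrt n)))), avoids that.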
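Put together, a corrected predicate might look like the sketch below. The name prime? and the explicit (> n 1) guard are my own additions rather than part of the original function. The guard is there because 1 would otherwise pass the test, as it does in the output above.

```clojure
(defn prime? [n]
  (and (> n 1) ; 0 and 1 are not prime
       ;; Try every divisor from 2 up to and including floor(sqrt n).
       (not-any? #(zero? (mod n %))
                 (range 2 (inc (long (Math/sqrt (double n))))))))

(println (filter prime? (range 1 30)))
```

Truncating with long before inc keeps the upper bound an integer, so for n = 2 and n = 3 the divisor range is empty and both are accepted.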
I'm reading through the Clojure docs and I'm confused by one example of the reduce function. I understand what reduce does, but there's a lot going on in this example and I'm not sure how it's all working together.
(reduce
(fn [primes number]
(if (some zero? (map (partial mod number) primes))
primes
(conj primes number)))
[2]
(take 1000 (iterate inc 3)))
=> [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997]
From what I understand, reduce takes a function, in this case an anonymous function. That function takes two arguments, a collection and a number. Then we have a conditional statement that checks if the number zero appears in the collection.
(map (partial mod number) primes) confuses me. Doesn't mod take two arguments and return the remainder of dividing the first by the second?
It appears that if this conditional returns true, it returns the collection of primes. If not, add the number to the collection of primes. Is that correct?
And the final line, it's a collection of 1,000 numbers starting at 3. Would someone be able to walk through this function?
You probably already know this, but the first thing to note is the following property: a number greater than 1 is prime if it is not divisible by any prime number smaller than it.
So the anonymous function starts with a vector of previous primes, and then checks if number is divisible by any of the previous primes. If it is divisible by any of the previous primes, it will just pass on the vector of previous primes, otherwise it will add the current number to the vector of primes and then return the new vector.
(partial mod number) is equivalent to (fn [x] (mod number x)) in this case.
To step through a few cases:
;Give a name to the anonymous function
(defn prime-checker [primes number]
(if (some zero? (map (partial mod number) primes))
primes
(conj primes number)))
;This is how reduce will call the anonymous function
(prime-checker [2] 3)
-> (map (partial mod number) primes) = [1]
-> will return [2 3]
(prime-checker [2 3] 4)
-> (map (partial mod number) primes) = [0 1]
-> some zero? finds a zero value here so the function will return [2 3]
(prime-checker [2 3] 5)
-> (map (partial mod number) primes) = [1 2]
-> will return [2 3 5]
Hopefully you can see from this how reduce with this function returns a vector of primes.
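If you would rather watch the accumulator evolve than trace it by hand, reductions (a clojure.core function that returns every intermediate value reduce would produce) can be used with the same function. A small sketch, using a shorter input range than the docs example:

```clojure
(defn prime-checker [primes number]
  (if (some zero? (map (partial mod number) primes))
    primes
    (conj primes number)))

;; (partial mod 7) fixes the first argument of mod, so this is (mod 7 3).
(println ((partial mod 7) 3)) ; prints 1

;; Each element is the primes vector after processing one more number.
(println (reductions prime-checker [2] (range 3 8)))
;; prints ([2] [2 3] [2 3] [2 3 5] [2 3 5] [2 3 5 7])
```

The vector only grows on the steps where no earlier prime divides the incoming number.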
user=> (.. Runtime getRuntime availableProcessors)
2
And evaluating this example: http://clojuredocs.org/clojure_core/clojure.core/pmap#example_684 I get
user=> (time (doall (map long-running-job (range 4))))
"Elapsed time: 12000.621 msecs"
(10 11 12 13)
user=> (time (doall (pmap long-running-job (range 5))))
"Elapsed time: 3000.454 msecs"
(10 11 12 13 14)
user=> (time (doall (pmap long-running-job (range 32))))
"Elapsed time: 3014.969 msecs"
(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41)
user=> (time (doall (pmap long-running-job (range 33))))
"Elapsed time: 6001.526 msecs"
(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42)
I wonder why I have to pass 33 to wait 6 sec. for the result. pmap creates 2 (available processors) + 2 threads, yes? I supposed that when passing (range 5) it would be executed in 6 sec. Why is it different?
Actually pmap does not obey the "processors + 2" limit. That's a result of the ways in which the regular map and the future macro work:
future uses a cached thread pool which has no size limit;
map produces a chunked sequence, that is, one which is always forced 32 elements at a time, even if only a handful at the beginning of a chunk are actually consumed by the caller.
The ultimate result is that futures in pmap are launched in parallel in blocks of 32.
Note that this is not in violation of the contract specified in pmap's docstring. The code, on the other hand, might lead one to believe that it was intended that the "processors + 2" limit be respected -- as it would if map was written naively. In fact, pmap might well predate the move to chunked seqs, although I'm not really sure, it's been a while.
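The chunking behaviour can be observed directly, without futures or sleeps. In this sketch the atom realized is a hypothetical counter added for illustration. It records how many elements map actually processes when only the first one is requested:

```clojure
;; Hypothetical counter for illustration.
(def realized (atom 0))

;; map over a chunked source such as (range 100)
(def mapped (map (fn [x] (swap! realized inc) x) (range 100)))

;; Ask for a single element...
(first mapped)

;; ...and a whole chunk of 32 has been processed.
(println @realized) ; prints 32
```

Since pmap wraps each element in a future through a plain map call, the same chunking launches 32 futures at once, which matches the jump in elapsed time between (range 32) and (range 33).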