Clojure Docs: Understanding This Example of the Reduce Function

Reading through the Clojure docs and I'm confused by one example of the reduce function. I understand what reduce does, but there's a lot going on in this example and I'm not sure how it's all working together.
(reduce
  (fn [primes number]
    (if (some zero? (map (partial mod number) primes))
      primes
      (conj primes number)))
  [2]
  (take 1000 (iterate inc 3)))
=> [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997]
From what I understand, reduce takes a function, in this case an anonymous function. That function takes two arguments, a collection and a number. Then we have a conditional statement that checks if the number zero appears in the collection.
(map (partial mod number) primes) confuses me. Doesn't mod take two arguments and return the remainder of dividing the first by the second?
It appears that if this conditional returns true, it returns the collection of primes. If not, add the number to the collection of primes. Is that correct?
And the final line, it's a collection of 1,000 numbers starting at 3. Would someone be able to walk through this function?

You probably already know this, but the first thing to note is the following property: a number greater than 1 is prime if it is not divisible by any prime number smaller than it.
So the anonymous function starts with a vector of previous primes, and then checks if number is divisible by any of the previous primes. If it is divisible by any of the previous primes, it will just pass on the vector of previous primes, otherwise it will add the current number to the vector of primes and then return the new vector.
(partial mod number) is equivalent to (fn [x] (mod number x)) in this case.
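For readers who know Python better than Clojure, the same partial-application idea can be sketched with functools.partial. This is only an analogy, not Clojure: operator.mod stands in for mod, and remainders_against is a hypothetical helper name chosen for illustration.

```python
from functools import partial
from operator import mod


def remainders_against(number, primes):
    """Mirror (map (partial mod number) primes): number % p for each p."""
    # partial fixes the FIRST argument, so mod_number(x) computes number % x.
    mod_number = partial(mod, number)
    return list(map(mod_number, primes))


print(remainders_against(4, [2, 3]))  # [0, 1]: a zero means 4 has a prime divisor
print(remainders_against(5, [2, 3]))  # [1, 2]: no zero, so 5 is prime
```

The point to notice is that the fixed argument is the dividend, which is why each known prime ends up as the divisor.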
To step through a few cases:
; Give a name to the anonymous function
(defn prime-checker [primes number]
  (if (some zero? (map (partial mod number) primes))
    primes
    (conj primes number)))
; This is how reduce will call the anonymous function
(prime-checker [2] 3)
-> (map (partial mod number) primes) = (1)
-> will return [2 3]
(prime-checker [2 3] 4)
-> (map (partial mod number) primes) = (0 1)
-> some zero? finds a zero value here, so the function will return [2 3]
(prime-checker [2 3] 5)
-> (map (partial mod number) primes) = (1 2)
-> will return [2 3 5]
Hopefully you can see from this how reduce, called with this function, builds up and returns a vector of primes.
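To see the whole fold in one place, here is a rough Python equivalent using functools.reduce. It is a sketch for comparison only; prime_checker reuses the name from the walkthrough, and list concatenation stands in for conj on a vector.

```python
from functools import reduce


def prime_checker(primes, number):
    """Return primes unchanged if a known prime divides number, else append it."""
    if any(number % p == 0 for p in primes):
        return primes
    return primes + [number]  # builds a new list, like conj returning a new vector


# Same inputs as the Clojure example: start from [2] and feed in 3..1002.
primes = reduce(prime_checker, range(3, 1003), [2])
print(primes[:5], primes[-1])  # [2, 3, 5, 7, 11] 997
```

Each call receives the accumulator returned by the previous call, which is exactly how the vector of primes grows from [2].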

Compute differences in the prior experience of employees

I have the following dataset:
clear
input float(department employee expertise_area share)
1 56 334 1
1 143 389 .04
1 143 334 .18
1 143 383 .02
1 143 398 .1
1 143 414 .02
1 143 396 .08
1 143 385 .08
1 143 403 .3
1 143 409 .02
1 143 373 .02
1 143 392 .06
1 143 397 .06
1 143 394 .02
1 214 373 1
4 145 399 .029
4 145 409 .7681
4 145 311 .0145
4 145 403 .1884
4 161 62 .4
4 161 373 .6
4 285 355 .5333
4 285 392 .0333
4 285 304 .0333
4 285 310 .2333
4 285 73 .0333
4 285 331 .0333
4 285 399 .0333
4 285 414 .0667
186 161 62 .4
186 161 373 .6
186 247 409 .0025
186 247 311 .0025
186 247 338 .25
186 247 298 .0051
186 247 334 .649
186 247 337 .0051
186 247 404 .0076
186 247 339 .0051
186 247 301 .0025
186 247 403 .0631
186 247 347 .0025
186 247 336 .0051
186 285 304 .0333
186 285 399 .0333
186 285 355 .5333
186 285 392 .0333
186 285 310 .2333
186 285 73 .0333
186 285 414 .0667
186 285 331 .0333
end
I would like to compute the differences between the distribution of the prior experience of employees in a team (or department).
The measure I want is the mean Euclidean distance, which captures the separation of individuals in a team. [Formula image not preserved.]
Here, p_ij and p_kj are the share of employee i’s or k’s expertise in area j over his career and n equals the team size.
For example, employee 143 in department 1 has spent 18% of his career in area 334 (this corresponds to observation 3). Department 1 has three employees, so n = 3.
In summary, I want to calculate the Euclidean distance for each department (1, 4, 186) for 3 points (or employees) each [(56, 143, 214), (145, 161, 285), (161, 247, 285) respectively] with 13, 13 and 22 dimensions (or expertise_area) respectively. Note that I should be able to produce output even if a department has more than 3 employees (or points).
The output should look as follows:
+------------+--------------------+
| department | euclidean_distance |
+------------+--------------------+
| 1 | .4022 |
| 4 | .4131 |
| 186 | .3882 |
+------------+--------------------+
How can I compute this in Stata?

Does this code generate random numbers?

a = 100
for b in range(10, a):
    c = b % 10
    if c == 0:
        c += 3
    c = c * b
    print(c)
I was trying to make a random number generator without using the random module, and I came up with this. Does it generate random numbers?
Short Answer:
No.
Your code will print
30 11 24 39 56 75 96 119 144 171 60 21 44 69 96 125 156 189 224 261 90 31 64 99 136 175 216 259 304 351 120 41 84 129 176 225 276 329 384 441 150 51 104 159 216 275 336 399 464 531 180 61 124 189 256 325 396 469 544 621 210 71 144 219 296 375 456 539 624 711 240 81 164 249 336 425 516 609 704 801 270 91 184 279 376 475 576 679 784 891
every time.
Computers and programs like this one are deterministic. If you sat down with pen and paper, you could tell me exactly which of these numbers would occur and when.
Random number generation is difficult. If you only need output that seems random, I would recommend using the clock:
import time
print(int(time.time() % 10))
This will give you a "random" number between 0 and 9.
time.time() returns the number of seconds since the epoch as a floating-point number, so we cast to int to get a whole number.
Caveat: This solution is not truly random. The value is predictable from the clock, and two calls within the same second return the same digit.
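Pseudo-random generators take this one step further: they run a deterministic formula starting from a seed that changes between runs. Below is a minimal sketch of one such formula, a linear congruential generator, seeded from the clock. The name lcg is illustrative, the constants are the commonly cited Numerical Recipes parameters, and the result is still pseudo-random and not suitable for anything security related.

```python
import time


def lcg(seed, modulus=2**32, a=1664525, c=1013904223):
    """Yield an endless stream of pseudo-random integers in [0, modulus)."""
    state = seed % modulus
    while True:
        state = (a * state + c) % modulus
        yield state


gen = lcg(time.time_ns())  # a different seed on each run
# Scale from the high-order bits; the low bits of an LCG cycle quickly.
digits = [next(gen) * 10 // 2**32 for _ in range(5)]
print(digits)  # five values between 0 and 9
```

Given the same seed, the sequence repeats exactly, which is the same determinism described above; only the seed makes runs differ.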

How to print a list of number, with 5 numbers in each row, in Clojure?

Suppose I have this list of primes (100 in total):
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541)
I want to print them in rows, with 5 numbers each row:
(  2   3   5   7  11
  13  17  19  23  29
  31  37  41  43  47
....
(Please note that the numbers in column are right aligned.)
This is based on another answer above.
(require '[clojure.string :as string])
user=> (println
         (str \(
              (string/join "\n "
                           (map #(apply format "%3d %3d %3d %3d %3d" %)
                                (partition 5 primes)))
              \)))
(  2   3   5   7  11
  13  17  19  23  29
  31  37  41  43  47
  53  59  61  67  71
  73  79  83  89  97
 101 103 107 109 113
 127 131 137 139 149
 151 157 163 167 173
 179 181 191 193 197
 199 211 223 227 229
 233 239 241 251 257
 263 269 271 277 281
 283 293 307 311 313
 317 331 337 347 349
 353 359 367 373 379
 383 389 397 401 409
 419 421 431 433 439
 443 449 457 461 463
 467 479 487 491 499
 503 509 521 523 541)
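The same chunk-then-format approach can be sketched in Python for comparison. The function name format_rows and its parameters are hypothetical. Unlike (partition 5 ...), which drops an incomplete final group, the slicing used here keeps a short last row.

```python
def format_rows(numbers, per_row=5, width=3):
    """Right-align numbers in fixed-width columns, per_row to a line, wrapped in parens."""
    rows = [numbers[i:i + per_row] for i in range(0, len(numbers), per_row)]
    lines = [" ".join(f"{n:>{width}}" for n in row) for row in rows]
    # The single space after each newline lines the rows up under the opening paren.
    return "(" + "\n ".join(lines) + ")"


print(format_rows([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 101, 103]))
```

The fixed column width plays the role of %3d in the Clojure format string; it would need to grow if the list contained numbers with more than three digits.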

Clojure: decide if argument is a prime

I started learning Clojure a few days ago and wrote a simple function that decides whether its given argument is a prime or not.
Here is my code:
(defn is-prime [n]
  (nil?
    (some #(= (mod n %) 0)
          (range 2 (java.lang.Math/sqrt n)))))
My problem is, that this function returns true when it is called with '4'.
(is-prime 4) => true
I wrote another function for debugging purposes; it lists all the primes that are less than 250:
(defn primes [] (filter #(is-prime %) (range 1 250)))
I have looked up the Wikipedia page for the list of prime numbers and found that except for the number '4', the rest of the output is correct.
(primes)
=> (1 2 3 4 5 7 9 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 121 127 131 137 139 149 151 157 163 167 169 173 179 181 191 193 197 199 211 223 227 229 233 239 241)
I have been thinking about it, and maybe it is just some beginner's mistake on my part, but I'm unable to find the solution. I would really appreciate your help, thanks in advance.
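(range m n) doesn't include n. So (range 2 (sqrt 4)) = (range 2 2) = (); it doesn't try any divisors. Note that your "primes" list also has 9 in it: (range 2 (sqrt 9)) = (range 2 3) = (2), so it never tries dividing by 3. The same issue affects 25, 49, 121 and 169; basically all squares of primes.
The simplest fix is to make the upper bound inclusive: (range 2 (inc (long (Math/sqrt n)))). Truncate the square root before incrementing; with a bare (inc (sqrt n)), n = 2 gives (range 2 2.41...) = (2), and 2 would be rejected because it divides itself. Note that 1 is still reported as prime, since there are no divisors to try; add a (> n 1) check if that matters.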
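The off-by-one is easy to reproduce in Python, whose range also excludes its upper bound. The sketch below is an analogue, not a translation: is_prime_buggy uses math.ceil(math.sqrt(n)) as the exclusive bound to imitate a fractional upper limit, and both function names are illustrative.

```python
import math


def is_prime_buggy(n):
    # Exclusive bound: divisors stop just short of sqrt(n), as in the question.
    return not any(n % d == 0 for d in range(2, math.ceil(math.sqrt(n))))


def is_prime(n):
    # isqrt(n) + 1 makes the integer square root inclusive; n > 1 rules out 1.
    return n > 1 and not any(n % d == 0 for d in range(2, math.isqrt(n) + 1))


print([n for n in range(1, 30) if is_prime_buggy(n)])
# [1, 2, 3, 4, 5, 7, 9, 11, 13, 17, 19, 23, 25, 29]
print([n for n in range(1, 30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The buggy version lets through exactly 1 and the squares of primes, matching the list in the question.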

Why is this prime sieve implementation slower?

I was just experimenting a bit with (for me) a new programming language: clojure. And I wrote a quite naive 'sieve' implementation, which I then tried to optimise a bit.
Strangely enough though (for me at least), the new implementation wasn't faster, but much slower...
Can anybody provide some insight into why this is so much slower?
I'm also interested in other tips on how to improve this algorithm...
Best regards,
Arnaud Gouder
; naive sieve.
(defn sieve
  ([max] (sieve max (range 2 max) 2))
  ([max candidates n]
    (if (> (* n n) max)
      candidates
      (recur max (filter #(or (= % n) (not (= (mod % n) 0))) candidates) (inc n)))))
; Instead of just passing the 'candidates' list, from which I sieve-out the non-primes,
; I also pass a 'primes' list, with the already found primes
; I hoped that this would increase the speed, because:
; - Instead of sieving-out multiples of 'all' numbers, I now only sieve-out the multiples of primes.
; - The filter predicate now becomes simpler.
; However, this code seems to be approx 20x as slow.
; Note: the primes in 'primes' end up reversed, but I don't care (much). Adding a 'reverse' call makes it even slower :-(
(defn sieve2
  ([max] (sieve2 max () (range 2 max)))
  ([max primes candidates]
    (let [n (first candidates)]
      (if (> (* n n) max)
        (concat primes candidates)
        (recur max (conj primes n) (filter #(not (= (mod % n) 0)) (rest candidates)))))))
; Another attempt to speed things up. Instead of sieving-out multiples of all numbers in the range,
; I want to sieve-out only multiples of primes.. I don't like the '(first (filter ' construct very much...
; It doesn't seem to be faster than 'sieve'.
(defn sieve3
  ([max] (sieve max (range 2 max) 2))
  ([max candidates n]
    (if (> (* n n) max)
      candidates
      (let [new_candidates (filter #(or (= % n) (not (= (mod % n) 0))) candidates)]
        (recur max new_candidates (first (filter #(> % n) new_candidates)))))))
(time (sieve 10000000))
(time (sieve 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve 10000000)) ; Strange, speeds are very different now... Must be some memory allocation thing caused by running sieve2
(time (sieve 10000000))
(time (sieve3 10000000))
(time (sieve3 10000000))
(time (sieve 10000000))
I have good news and bad news. The good news is that your intuitions are correct.
(time (sieve 10000)) ; "Elapsed time: 0.265311 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
(time (sieve2 10000)) ; "Elapsed time: 1.028353 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
The bad news is that both are much slower than you think:
(time (count (sieve 10000))) ; "Elapsed time: 231.183055 msecs"
1229
(time (count (sieve2 10000))) ; "Elapsed time: 87.822796 msecs"
1229
What's happening is that because filter is lazy, the filtering isn't getting done until the answers need to be printed. All the first expression is counting is the time to wrap the sequence in a load of filters. Putting the count in means that the sequence actually has to be calculated within the timing expression, and then you see how long it really takes.
I think in the case without the count, sieve2 is taking longer because it is doing a bit of the work whilst constructing the filtered sequence.
When you put the count in, sieve2 is faster because it's the better algorithm.
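The laziness effect is not specific to Clojure. As a rough illustration, Python 3's built-in filter is also lazy, so stacking filters does no work until the result is consumed. In the sketch below, counter and not_multiple_of are hypothetical names used only to count predicate calls.

```python
counter = {"calls": 0}


def not_multiple_of(n):
    def predicate(x):
        counter["calls"] += 1  # record every time the predicate really runs
        return x % n != 0
    return predicate


candidates = range(2, 100)
for n in (2, 3, 5, 7):
    candidates = filter(not_multiple_of(n), candidates)  # lazy: only wraps

calls_after_building = counter["calls"]  # still 0, nothing evaluated yet
survivors = list(candidates)             # forcing the sequence does the work
print(calls_after_building, counter["calls"] > 0, len(survivors))
```

Timing only the loop that builds the filters would measure almost nothing, which is the same trap as timing (sieve 10000) without the count.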
P.S. When I try (time (sieve 10000000)), my machine crashes with a stack overflow, presumably because of the vast stack of nested filter calls it's building up. How come it ran for you?
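One way to avoid the stack of nested filters altogether is to mark composites in a flat array instead of filtering sequences. This is a different technique from the filter-based versions above, sketched here in Python rather than Clojure; the function name sieve and its limit parameter are just illustrative.

```python
def sieve(limit):
    """Return all primes below limit using a flat boolean array."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    n = 2
    while n * n < limit:
        if is_prime[n]:
            # Cross off multiples of n starting at n*n; smaller ones are already marked.
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
        n += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(len(sieve(10000)))  # 1229, the same count as above
```

Because the work is done eagerly in place, there is no chain of deferred filters to unwind, and only multiples of primes are ever crossed off.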
Some optimization tips for this kind of primitive-number-heavy math:
Use Clojure 1.3.
Clojure 1.3 allows unboxed, checked arithmetic, so you won't be casting everything to Integer.
Type hint the function arguments.
Otherwise you will end up casting all the ints/longs to Integer for each function call. (You're not calling any hintable functions, so I'm just listing it here as general advice.)
Don't call any higher-order functions.
Currently (1.3), lambda functions #(...) can't be compiled as ^static, so they only take Object as arguments, and the calls to filter will require boxing all the numbers.
You're likely losing enough time boxing and unboxing Integers/ints that it will be hard to really judge the different optimizations. If you type hint (and use Clojure 1.3), you will likely get better numbers to judge your optimizations.