Recursive function has bad performance - clojure

I am learning clojure and I am doing some exercises to practice. Currently I am working on the next problem:
The Elves contact you over a highly secure emergency channel. Back at
the North Pole, the Elves are busy misunderstanding White Elephant
parties. Each Elf brings a present. They all sit in a circle, numbered
starting with position 1. Then, starting with the first Elf, they take
turns stealing all the presents from the Elf to their left. An Elf
with no presents is removed from the circle and does not take turns.
For example, with five Elves (numbered 1 to 5):
     1
5         2
   4  3
Elf 1 takes Elf 2's present. Elf 2 has no presents and is skipped. Elf
3 takes Elf 4's present. Elf 4 has no presents and is also skipped.
Elf 5 takes Elf 1's two presents. Neither Elf 1 nor Elf 2 have any
presents, so both are skipped. Elf 3 takes Elf 5's three presents. So,
with five Elves, the Elf that sits starting in position 3 gets all the
presents.
With the number of Elves given in your puzzle input, which Elf gets
all the presents?
My code:
(ns elfos-navidad.core (:gen-class))
(def elfs (range 8977))
(defn remove-elf [elfs]
(if (= (count elfs) 1)
elfs
(let [[first second & rest] elfs]
(recur (concat rest (list first))))))
(defn -main [& args]
(time (println (+ 1 (first (remove-elf elfs))))))
My Snippets works correctly, but it is so slow. Same problem in nodejs is extremely faster.
PS: I know that the best way to solve this problem is with linked list, but I don't know how to do it with clojure yet

the problem is that concat is inefficient here. There is a better way to add items to the end of collection, if you use vectors:
(defn remove-elf-v [elfs]
(let [elfs (vec elfs)]
(if (= (count elfs) 1)
elfs
(recur (conj (subvec elfs 2) (first elfs))))))
user> (time (println (+ 1 (first (remove-elf-v elfs)))))
;;=> 1571
;;=> "Elapsed time: 20.128779 msecs"
the old one:
user> (time (println (+ 1 (first (remove-elf elfs)))))
;;=> 1571
;;=> "Elapsed time: 3482.966943 msecs"

Related

Clojure and Leiningen: Why is `doall` needed in this example?

I'm currently using Leiningen to learn Clojure and I'm confused about the requirement of doall for running this function:
;; take a function (for my purposes, println for output formatting) and repeat it n times
(defn repeat [f n] (for [i (range n)] (f)))
(repeat println 2) works just fine in an REPL session, but not when running with lein run unless I include the doall wrapper. (doall (repeat println 2)) works and I'm curious why. Without it, lein run doesn't show the two blank lines in the output.
I also have:
(defmacro newPrint1 [& args] `'(println ~args))
(defmacro newPrint2 [& args] `'(println ~#args))
The first function I thought of myself. The next two macros are examples from a tutorial video I'm following on Udemy. Even if I wrap the macro calls with doall, such as (doall (newPrint1 1 2 3)), lein run produces no output, but (newPrint1 1 2 3) in a terminal REPL session produces the desired output of (clojure.core/println (1 2 3)) as it does in the video tutorial. Why doesn't doall work here?
for creates a lazy sequence. This lazy sequence is returned. The P in REPL (read eval Print loop) prints the sequence, thus realizing it. For realizing, the code to produce each element is run.
If you do not use the sequence, it is not realized, so the code is never run. In non-interactive use, this is likely the case. As noted, doall forces realization.
If you want to do side-effects, doseq is often better suited than for.

Why is pmap|reducers/map not using all cpu cores?

I'm trying to parse a file with a million lines, each line is a json string with some information about a book (author, contents etc). I'm using iota to load the file, as my program throws an OutOfMemoryError if I try to use slurp. I'm also using cheshire to parse the strings. The program simply loads a file and counts all the words in all books.
My first attempt included pmap to do the heavy work, I figured this would essentially make use of all my cpu cores.
(ns multicore-parsing.core
(:require [cheshire.core :as json]
[iota :as io]
[clojure.string :as string]
[clojure.core.reducers :as r]))
(defn words-pmap
[filename]
(letfn [(parse-with-keywords [str]
(json/parse-string str true))
(words [book]
(string/split (:contents book) #"\s+"))]
(->>
(io/vec filename)
(pmap parse-with-keywords)
(pmap words)
(r/reduce #(apply conj %1 %2) #{})
(count))))
While it does seem to use all cores, each core rarely uses more than 50% of its capacity, my guess is that it has to do with batch size of pmap and so I stumbled across relatively old question where some comments make reference to the clojure.core.reducers library.
I decided to rewrite the function using reducers/map:
(defn words-reducers
[filename]
(letfn [(parse-with-keywords [str]
(json/parse-string str true))
(words [book]
(string/split (:contents book) #"\s+"))]
(->>
(io/vec filename)
(r/map parse-with-keywords)
(r/map words)
(r/reduce #(apply conj %1 %2) #{})
(count))))
But the cpu usage is worse, and it even takes longer to finish compared to the previous implementation:
multicore-parsing.core=> (time (words-pmap "./dummy_data.txt"))
"Elapsed time: 20899.088919 msecs"
546
multicore-parsing.core=> (time (words-reducers "./dummy_data.txt"))
"Elapsed time: 28790.976455 msecs"
546
What am I doing wrong? Is mmap loading + reducers the correct approach when parsing a large file?
EDIT: this is the file I'm using.
EDIT2: Here are the timings with iota/seq instead of iota/vec:
multicore-parsing.core=> (time (words-reducers "./dummy_data.txt"))
"Elapsed time: 160981.224565 msecs"
546
multicore-parsing.core=> (time (words-pmap "./dummy_data.txt"))
"Elapsed time: 160296.482722 msecs"
546
I don't believe that reducers are going to be the right solution for you, as they don't cope with lazy sequences at all well (a reducer will give correct results with a lazy sequence, but won't parallelise well).
You might want to take a look at this sample code from the book Seven Concurrency Models in Seven Weeks (disclaimer: I am the author) which solves a similar problem (counting the number of times each word appears on Wikipedia).
Given a list of Wikipedia pages, this function counts the words sequentially (get-words returns a sequence of words from a page):
(defn count-words-sequential [pages]
(frequencies (mapcat get-words pages)))
This is a parallel version using pmap which does run faster, but only around 1.5x faster:
(defn count-words-parallel [pages]
(reduce (partial merge-with +)
(pmap #(frequencies (get-words %)) pages)))
The reason it only goes around 1.5x faster is because the reduce becomes a bottleneck—it's calling (partial merge-with +) once for each page. Merging batches of 100 pages improves the performance to around 3.2x on a 4-core machine:
(defn count-words [pages]
(reduce (partial merge-with +)
(pmap count-words-sequential (partition-all 100 pages))))

Best way to read contents of file into a set in Clojure

I'm learning Clojure and as an exercise I wanted to write something like the unix "comm" command.
To do this, I read the contents of each file into a set, then use difference/intersection to show exclusive/common files.
After a lot of repl-time I came up with something like this for the set creation part:
(def contents (ref #{}))
(doseq [line (read-lines "/tmp/a.txt")]
(dosync (ref-set contents (conj #contents line))))
(I'm using duck-streams/read-lines to seq the contents of the file).
This is my first stab at any kind of functional programming or lisp/Clojure. For instance, I couldn't understand why, when I did a conj on the set, the set was still empty. This lead me to learning about refs.
Is there a better Clojure/functional way to do this? By using ref-set, am I just twisting the code to a non-functional mindset or is my code along the lines of how it should be done?
Is there a a library that already does this? This seems like a relatively ordinary thing to want to do but I couldn't find anything like it.
Clojure 1.3:
user> (require '[clojure.java [io :as io]])
nil
user> (line-seq (io/reader "foo.txt"))
("foo" "bar" "baz")
user> (into #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
line-seq gives you a lazy sequence where each item in the sequence is a line in the file.
into dumps it all into a set. To do what you were trying to do (add each item one by one into a set), rather than doseq and refs, you could do:
user> (reduce conj #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
Note that the Unix comm compares two sorted files, which is likely a more efficient way to compare files than doing set intersection.
Edit: Dave Ray is right, to avoid leaking open file handles it's better to do this:
user> (with-open [f (io/reader "foo.txt")]
(into #{} (line-seq f)))
#{"foo" "bar" "baz"}
I always read with slurp and after that split with re-seq due to my needs.

Treating a file of Java floats as a lazy Clojure sequence

What would be an ideomatic way in Clojure to get a lazy sequence over a file containing float values serialized from Java? (I've toyed with a with-open approach based on line-reading examples but cannot seem to connect the dots to process the stream as floats.)
Thanks.
(defn float-seqs [#^java.io.DataInputStream dis]
(lazy-seq
(try
(cons (.readFloat dis) (float-seqs dis))
(catch java.io.EOFException e
(.close dis)))))
(with-open [dis (-> file java.io.FileInputStream. java.io.DataInputStream.)]
(let [s (float-seqs dis)]
(doseq [f s]
(println f))))
You are not required to use with-open if you are sure you are going to consume the whole seq.
If you use with-open, double-check that you're not leaking the seq (or a derived seq) outside of its scope.

Whats wrong with this Clojure program?

I recently started reading Paul Grahams 'On Lisp', and learning learning clojure along with it, so there's probably some really obvious error in here, but I can't see it: (its a project euler problem, obviously)
(ns net.projecteuler.problem31)
(def paths (ref #{}))
; apply fun to all elements of coll for which pred-fun returns true
(defn apply-if [pred-fun fun coll]
(apply fun (filter pred-fun coll)))
(defn make-combination-counter [coin-values]
(fn recurse
([sum] (recurse sum 0 '()))
([max-sum current-sum coin-path]
(if (= max-sum current-sum)
; if we've recursed to the bottom, add current path to paths
(dosync (ref-set paths (conj #paths (sort coin-path))))
; else go on recursing
(apply-if (fn [x] (<= (+ current-sum x) max-sum))
(fn [x] (recurse max-sum (+ x current-sum) (cons x coin-path)))
coin-values)))))
(def count-currency-combinations (make-combination-counter '(1 2 5 10 20 50 100 200)))
(count-currency-combinations 200)
When I run the last line in the REPL, i get the error:
<#CompilerException java.lang.IllegalArgumentException: Wrong number of args passed to: problem31$eval--25$make-combination-counter--27$recurse--29$fn (NO_SOURCE_FILE:0)>
Apart from the question where the error is, the more interesting question would be: How would one debug this? The error message isn't very helpful, and I haven't found a good way to single-step clojure code, and I can't really ask on stack overflow every time I have a problem.
Three tips that might make your life easier here:
Wrong number of args passed to: problem31$eval--25$make-combination-counter--27$recurse--29$fn (NO_SOURCE_FILE:0)>
Tells you roughly where the error occurred: $fn at the end there means anonymous function and it tells you it was declared inside recurse, which was declared inside make-combination-counter. There are two anonymous functions to choose from.
If you save your source-code in a file and execute it as a script it will give you a full stack trace with the line numbers in the file.
at net.projecteuler.problem31$apply_if__9.invoke(problem31.clj:7)
Note you can also examine the last exception and stack trace from within the REPL by examining *e eg: (.stackTrace *e) The stack trace is at first quite daunting because it throws up all the Java internals. You need to learn to ignore those and just look for the lines that refer to your code. This is pretty easy in your case as they all start with net.projecteuler
You can name your anonymous functions to help more quickly identify them:
(fn check-max [x] (<= (+ current-sum x) max-sum))
In your case using all this info you can see that apply-if is being passed a single argument function as fun. Apply does this (f [1 2 3]) -> (f 1 2 3). From your comment what you want is map. (map f [1 2 3]) -> (list (f 1) (f 2) (f 3)). When I replace apply with map the program seems to work.
Finally, if you want to examine values you might want to look into clojure-contrib.logging which has some helpers to this effect. There is a spy macro which allows you to wrap an expression, it will return exactly the same expression so it does not affect the result of your function but will print out EXPR = VALUE, which can be handy. Also on the group various people have posted full tracing solutions. And there is always the trusty println. But the key skill here is being able to identify precisely what blew up. Once you know that it is usually clear why, but sometimes printouts are needed when you can't tell what the inputs are.
dont have a REPL on me though it looks like:
(defn apply-if [pred-fun fun coll]
(apply fun (filter pred-fun coll)))
takes a list like '(1 2 3 4 5) filters some of them out '(1 3 5)
and then creates a function call like (fun 1 3 5)
and it looks like it is being called (apply-if (fn [x] with a function that wants to receive a list of numbers as a single argument.
you could change the apply-if function to just pass call to the fun (with out the apply) or you could change the call to it to take a function that takes an arbitrary number of arguments.