Clojure code, tinkering with :main - clojure

I'm trying to run the following code.
Below are the steps I took:
$ lein new app latinsq
I then modified project.clj as follows:
(defproject latinsq "0.1.0-SNAPSHOT"
:description "FIXME: write description"
:url "http://example.com/FIXME"
:license {:name "Eclipse Public License"
:url "http://www.eclipse.org/legal/epl-v10.html"}
:dependencies [[org.clojure/clojure "1.8.0"]]
:main latinsq.core
:target-path "target/%s"
:profiles {:uberjar {:aot :all}})
and /latinsq/src/latinsq/core.clj
(ns latinsq.core
(:use [clojure.set :only (difference)]))
(defn replace-at
"in string s, replaces character at index p with c"
[s p c]
(str
(.substring s 0 p)
c
(.substring s (inc p))))
; memoized function to save time on sqrt calls
(def sqrt (memoize (fn [x] (Math/sqrt x))))
(defn candidates
"assuming that the position pos is empty in sq, return those values that it might be"
[pos sq]
(let [sq-size (int (sqrt (count sq)))]
; set difference between...
(difference
; ...all the possible values...
(set (map #(first (str %)) (range 1 (inc sq-size))))
; ...and the set of...
(into #{}
(concat
; ...those in the same column...
(map #(get sq %)
(range (rem pos sq-size)
(count sq)
sq-size))
; ...and those in the same row.
(map #(get sq %)
(map #(+ % (- pos (rem pos sq-size)))
(range 0 sq-size))))))))
(defn latinsq
"encode your partial-square as a string like 1--1
this fn returns a lazy sequence of all solutions"
[sq]
; Find the first empty square
(let [empty-pos (.indexOf sq "-")]
; if none, we don't need to do anything
(if (= -1 empty-pos)
(list sq)
; else make a lazy sequence of...
(lazy-seq
; ...the concatenation of all the results of...
(apply concat
; ...using "map" to recurse, filling in the empty
; square with...
(map #(latinsq (replace-at sq empty-pos %))
; ...each possible value in turn
(candidates empty-pos sq)))))))
;; So, now some examples
(time
(latinsq "123------"))
;; "Elapsed time: 0.045368 msecs"
;; ("123231312" "123312231")
(time
(latinsq "12---31--------4"))
;; "Elapsed time: 0.068511 msecs"
;; ("1243431224313124" "1243431234212134")
;; A bit harder, an empty 5x5 grid
;; has 161280 solutions according to
;; http://mathworld.wolfram.com/LatinSquare.html
(time
(count (latinsq "-------------------------")))
;; "Elapsed time: 36980.759177 msecs" <--- a bit slow
;; 161280
;; Having made sure that our function returns a lazy seq
;; the whole result can be treated lazily, so to find just one
;; solution to the 5x5 grid:
(time
(first (latinsq "-------------------------")))
;; "Elapsed time: 0.985559 msecs" <--- not slow
;; "1234521453345124523153124"
;; first 3 of 5524751496156892842531225600 solutions for a 9x9 grid
(time
(take 3 (latinsq "---------------------------------------------------------------------------------")))
;; "Elapsed time: 0.075874 msecs"
;; ("123456789214365897341278956432189675567891234658917342789523461896742513975634128"
;; "123456789214365897341278956432189675567891234658917342789523461975634128896742513"
;; "123456789214365897341278956432189675567891234658917342789524163896743521975632418")
I'm getting the following error:
Exception in thread "main" java.lang.Exception: Cannot find anything to run for: latinsq.core, compiling:(/tmp/form-init4810859530587029884.clj:1:73)
at clojure.lang.Compiler.load(Compiler.java:7391)
at clojure.lang.Compiler.loadFile(Compiler.java:7317)
at clojure.main$load_script.invokeStatic(main.clj:275)
at clojure.main$init_opt.invokeStatic(main.clj:277)
at clojure.main$init_opt.invoke(main.clj:277)
at clojure.main$initialize.invokeStatic(main.clj:308)
at clojure.main$null_opt.invokeStatic(main.clj:342)
at clojure.main$null_opt.invoke(main.clj:339)
at clojure.main$main.invokeStatic(main.clj:421)
at clojure.main$main.doInvoke(main.clj:384)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.lang.Var.invoke(Var.java:383)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.main.main(main.java:37)
Caused by: java.lang.Exception: Cannot find anything to run for: latinsq.core
at user$eval5.invokeStatic(form-init4810859530587029884.clj:1)
at user$eval5.invoke(form-init4810859530587029884.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:6927)
at clojure.lang.Compiler.eval(Compiler.java:6917)
at clojure.lang.Compiler.load(Compiler.java:7379)
... 14 more
From the research I have done, the issue is caused by the name on :main. I've tinkered with this a few times and I have not been able to get this to work. Is this because "latinsq" is showing up too many times in the directory tree? Or, am I misunderstanding how to run the code.
Alternatively, when I finally get it to run, the only output is:
"Elapsed time: 0.468492 msecs"
"Elapsed time: 0.0796 msecs"

You forgot to define -main function - this one is automatically generated by leiningen in core namespace.
You also didn't specify how you try to run the application, but I assume you just invoke lein run.
As long as you add -main function to your namespace it should work.
I also recommend wrapping the latinsq function invocations into another function to avoid evaluating them when namespace is loaded.
Btw. this is the full output that I got (using unmodified -main generated by leiningen:
lein run
"Elapsed time: 0.183692 msecs"
"Elapsed time: 0.055872 msecs"
"Elapsed time: 68742.261628 msecs"
"Elapsed time: 1.361745 msecs"
"Elapsed time: 0.045366 msecs"
Hello, World!

Related

Memoize doesn't work for fibonacci example

(time (fib 30))
;; "Elapsed time: 8179.04028 msecs"
;; Fibonacci number with recursion and memoize.
(def m-fib
(memoize (fn [n]
(condp = n
0 1
1 1
(+ (m-fib (dec n)) (m-fib (- n 2)))))))
(time (m-fib 30))
;; "Elapsed time: 1.282557 msecs"
This the example code from https://clojuredocs.org/clojure.core/memoize, but when I run it on my browser, the time doesn't change at all. I am wondering why it is?
I don't know what browser-based REPL you're using, but the left-hand column shows the result of the function call, not the time, which is why both values are the same. Both functions compute the same Fibonacci number.
The time macro is designed to wrap arbitrary expressions without having to change the structure of the code, so it returns the same value that's returned by the wrapped expression and prints the execution time to standard out. Essentially, the expression (time (m-fib 30)) is expanded to
(let [start (. System (nanoTime))
ret (m-fib 30)]
(println (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start)) 1000000.0) " msecs"))
ret)
So to see the execution time in your REPL, you'll need to see the printed output of an expression.

How to make this Clojure code faster?

Task statement: concatenate two lists of 1e7 elements and find their sum. I'm trying to figure out an idiomatic way to write this in Clojure. And possibly also a fast non-idiomatic way, if warranted.
Here's what I got so far:
(def a (doall (vec (repeat 1e7 1))))
(def b (doall (vec (repeat 1e7 1))))
(println "Clojure:")
(time (def c (concat a b)))
(time (reduce + c))
Here's the result, using 1.9.0 with the shell command clojure -e '(load-file "clojure/concat.clj")':
Clojure:
"Elapsed time: 0.042615 msecs"
"Elapsed time: 636.798833 msecs"
20000000
There's quite a lot of room for improvement, comparing to trivial implementations in Python (156ms), Java (159ms), SBCL (120ms), and C++ using STL algorithms (60ms).
I was curious about the tradeoff between just adding the numbers vs the memory allocations, so I wrote a bit of test code that uses both Clojure vectors and primitive (java) arrays. Results:
; verify we added numbers in (range 1e7) once or twice
(sum-vec) => 49999995000000
(into-sum-vec) => 99999990000000
ARRAY power = 7
"Elapsed time: 21.840198 msecs" ; sum once
"Elapsed time: 45.036781 msecs" ; 2 sub-sums, then add sub-totals
(timing (sum-sum-arr)) => 99999990000000
"Elapsed time: 397.254961 msecs" ; copy into 2x array, then sum
(timing (sum-arr2)) => 99999990000000
VECTOR power = 7
"Elapsed time: 112.522111 msecs" ; sum once from vector
"Elapsed time: 387.757729 msecs" ; make 2x vector, then sum
So we see that, using primitive long arrays (on my machine), we need 21 ms to sum 1e7 integers. If we do that sum twice and add the sub-totals, we get 45 ms elapsed time.
If we allocate a new array of length 2e7, copy in the first array twice, and then sum up the values, we get about 400ms which is 8x slower than the adding alone. So we see that the memory allocation & copying is by far the largest cost.
For the native Clojure vector case, we see a time of 112 ms to just sum up a preallocated vector of 1e7 integers. Combining the orig vector with itself into a 2e7 vector, then summing costs about 400ms, similar to the low-level array case. So we see that for large lists of data the memory IO cost overwhelms the details of native Java arrays vs Clojure vectors.
Code for the above (requires [tupelo "0.9.69"] ):
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require [criterium.core :as crit]))
(defmacro timing [& forms]
; `(crit/quick-bench ~#forms)
`(time ~#forms)
)
(def power 7)
(def reps (Math/pow 10 power))
(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))
; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
(areduce data-arr i accum 0
(+ accum (aget data-arr i)))) ; => 6300 ms (power 6)
(defn sum-arr []
(let [data ^longs data-arr]
(areduce data i accum 0
(+ accum (aget data i))))) ; => 8 ms (power 6)
(defn sum-sum-arr []
(let [data ^longs data-arr
sum1 (areduce data i accum 0
(+ accum (aget data i)))
sum2 (areduce data i accum 0
(+ accum (aget data i)))
result (+ sum1 sum2)]
result))
(defn sum-arr2 []
(let [data ^longs data-arr
data2 (long-array (* 2 reps))
>> (dotimes [i reps] (aset data2 i (aget data i)))
>> (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
result (areduce data2 i accum 0
(+ accum (aget data2 i)))]
result))
(defn sum-vec [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))
(dotest
(is= (spyx (sum-vec))
(sum-arr))
(is= (spyx (into-sum-vec))
(sum-arr2)
(sum-sum-arr))
(newline) (println "-----------------------------------------------------------------------------")
(println "ARRAY power = " power)
(timing (sum-arr))
(spyx (timing (sum-sum-arr)))
(spyx (timing (sum-arr2)))
(newline) (println "-----------------------------------------------------------------------------")
(println "VECTOR power = " power)
(timing (sum-vec))
(timing (into-sum-vec))
)
You can switch from time to using Criterium by changing the comment line in the timing macro. However, Criterium is meant for short tasks and you should probably keep power to only 5 or 6.

Largest collections first

I have a list of lists. And I want the biggest lists to come at the beginning. This works, but can take a long time:
(reverse (sort-by count coll))
What is a more efficient way of doing this, presumably in one go?
thank you galdre for pointing out my error
don't use lazy sequences
a quick demo:
user> (let [xs (doall(repeat 1000000 1))]
(time (count xs)))
"Elapsed time: 29.393886 msecs"
1000000
user> (let [xs (into [] (repeat 1000000 1))]
(time (count xs)))
"Elapsed time: 0.013346 msecs"
1000000

PI Approximation: Why is my declarative version slower?

I'm approximating PI using the series:
The function for the series then looks like this:
(defn- pi-series [k]
(/ (if (even? (inc k)) 1 -1)
(dec (* 2 k))))
And then my series generator looks like *:
(defn pi [n]
(* 4
(loop [k 1
acc 0]
(if (= k (inc n))
acc
(recur (inc k)
(+ acc (double (pi-series k))))))))
Running pi with the value 999,999 produces the following:
(time (pi 999999))
;;=> "Elapsed time: 497.686 msecs"
;;=> 3.1415936535907734
That looks great, but I realize pi could be written more declarative. Here's what I ended up with:
(defn pi-fn [n]
(* 4 (reduce +
(map #(double (pi-series %))
(range 1 (inc n))))))
Which resulted in the following:
(time (pi-fn 999999))
;;=> "Elapsed time: 4431.626 msecs"
;;=> 3.1415936535907734
NOTE: The declarative version took around 4-seconds longer. Why?
Why is the declarative version so much slower? How can I update the declarative version to make it as fast as the imperative version?
I'm casting the result of pi-series to a double, because using clojure's ratio types performed a lot slower.
By the way, you can express an alternating finite sum as a difference of two sums, eliminating the need to adjust each term for sign individually. For example,
(defn alt-sum [f n]
(- (apply + (map f (range 1 (inc n) 2)))
(apply + (map f (range 2 (inc n) 2)))))
(time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
; "Elapsed time: 195.244047 msecs"
;= 3.141593653590707
On my laptop pi runs at 2500 msec. However, pi and pi-fn (either version) run at approx. the same rate (10x slower than alt-sum). More often than not, pi-fn is faster than pi. Are you sure you didn't accidentally insert an extra 9 before the second timing? Contra Juan, I do not think you're iterating over the sequence more than once, since the terms are generated lazily.
scratch.core> (time (pi 999999))
"Elapsed time: 2682.86669 msecs"
3.1415936535907734
scratch.core> (time (pi-fn 999999))
"Elapsed time: 2082.071798 msecs"
3.1415936535907734
scratch.core> (time (pi-fn-juan 999999))
"Elapsed time: 1934.976217 msecs"
3.1415936535907734
scratch.core> (time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
"Elapsed time: 199.998438 msecs"
3.141593653590707

Non-linear slowdown creating a lazy seq in Clojure

I implemented a function that returns the n-grams of a given input collection as a lazy seq.
(defn gen-ngrams
[n coll]
(if (>= (count coll) n)
(lazy-seq (cons (take n coll) (gen-ngrams n (rest coll))))))
When I call this function with larger input collections, I would expect to see a linear increase in execution time. However, the timing I observe is worse than that:
user> (time (count (gen-ngrams 3 (take 1000 corpus))))
"Elapsed time: 59.426 msecs"
998
user> (time (count (gen-ngrams 3 (take 10000 corpus))))
"Elapsed time: 5863.971 msecs"
9998
user> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 23584.226 msecs"
19998
user> (time (count (gen-ngrams 3 (take 30000 corpus))))
"Elapsed time: 54905.999 msecs"
29998
user> (time (count (gen-ngrams 3 (take 40000 corpus))))
"Elapsed time: 100978.962 msecs"
39998
corpus is a Cons of string tokens.
What causes this behavior and how can I improve the performance?
I think your issue is with "(count coll)", which is iterating over the coll for each call to ngrams.
The solution would be to use the build in partition function:
user=> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 6212.894932 msecs"
19998
user=> (time (count (partition 3 1 (take 20000 corpus))))
"Elapsed time: 12.57996 msecs"
19998
Have a look in the partition source if curious about the implementation http://clojuredocs.org/clojure_core/clojure.core/partition
I am far from a Clojure expert, but I think the cons function causes this problem.
Try to use list instead:
(defn gen-ngrams
[n coll]
(if (>= (count coll) n)
(lazy-seq (list (take n coll) (gen-ngrams n (rest coll))))))
I think cons construct a new seq which is more generic than a list, and therefore is slower.
Edit: and if "corpus is a Cons of string tokens", then try to make it a list...