Let over lambda block-scanner in clojure

I have just started reading Let Over Lambda and I thought I would try to write a Clojure version of the block-scanner from the closures chapter.
I have the following so far:
(defn block-scanner [trigger-string]
  (let [curr (ref trigger-string)
        trig trigger-string]
    (fn [data]
      (doseq [c data]
        (if (not (empty? @curr))
          (dosync (ref-set curr
                           (if (= (first @curr) c)
                             (rest @curr)
                             trig)))))
      (empty? @curr))))
(def sc (block-scanner "jihad"))
This works, I think, but I would like to know what I did right and what I could do better.
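For reference, this is what I expect at the REPL when feeding the scanner input a chunk at a time (traced by hand, not verified):
(sc "ji")  ;=> false ; partial match, internal state advances
(sc "had") ;=> true  ; trigger completed across calls
(sc "x")   ;=> true  ; stays triggered once curr is empty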

I would not use ref-set but alter, because you don't reset the state to a completely new value; you update it to a new value obtained from the old one.
(defn block-scanner
  [trigger-string]
  (let [curr (ref trigger-string)
        trig trigger-string]
    (fn [data]
      (doseq [c data]
        (when (seq @curr)
          (dosync
            (alter curr
                   #(if (-> % first (= c))
                      (rest %)
                      trig)))))
      (empty? @curr))))
Then it is not necessary to use refs since you don't have to coordinate changes. Here an atom is a better fit, since it can be changed without all the STM ceremony.
(defn block-scanner
  [trigger-string]
  (let [curr (atom trigger-string)
        trig trigger-string]
    (fn [data]
      (doseq [c data]
        (when (seq @curr)
          (swap! curr
                 #(if (-> % first (= c))
                    (rest %)
                    trig))))
      (empty? @curr))))
Next I would get rid of the imperative style:
- It does more than it should: it traverses all of data, even if we have already found a match. We should stop early.
- It is not thread-safe, since we access the atom multiple times and it might change in between. So we must touch the atom only once. (This is probably not an issue in this case, but it's good to make it a habit.)
- It's ugly. We can do all the work functionally and just save the state when we arrive at a result.
(defn block-scanner
  [trigger-string]
  (let [state   (atom trigger-string)
        advance (fn [trigger d]
                  (when trigger
                    (condp = d
                      (first trigger)        (next trigger)
                      ; This is maybe a bug in the book. The book code
                      ; matches "foojihad", but not "jijihad".
                      (first trigger-string) (next trigger-string)
                      trigger-string)))
        update  (fn [trigger data]
                  (if-let [data (seq data)]
                    (when-let [trigger (advance trigger (first data))]
                      (recur trigger (rest data)))
                    trigger))]
    (fn [data]
      (nil? (swap! state update data)))))
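A quick REPL check (traced by hand, not run) shows that this version also handles the overlapping-prefix case the comment above refers to:
(def sc2 (block-scanner "jihad")) ; sc2 is just an illustrative name
(sc2 "jijihad") ;=> true ; the imperative versions above return false here,
                ;         because the second \j resets curr to the full trigger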

Related

clojure java.lang.NullPointerException while splitting string

I am new to Clojure. I am trying to write a program which reads data from a file (a comma-separated file); after reading the data I try to split each line on the delimiter ",", but I am facing the error below:
CompilerException java.lang.NullPointerException,
compiling:(com\clojure\apps\StudentRanks.clj:26:5)
Here is my code:
(ns com.clojure.apps.StudentRanks)
(require '[clojure.string :as str])

(defn student []
  (def dataset (atom []))
  (def myList (atom ()))
  (def studObj (atom ()))
  (with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
    (swap! dataset into (reduce conj [] (line-seq rdr))))
  (println @dataset)
  (def studentCount (count @dataset))
  (def ind (atom 0))
  (loop [n studentCount]
    (when (>= n 0)
      (swap! myList conj (get @dataset n))
      (println (get @dataset n))
      (recur (dec n))))
  (println myList)
  (def scount (count @dataset))
  (loop [m scount]
    (when (>= m 0)
      (def data (get @dataset m))
      (println (str/split data #","))
      (recur (dec m)))))
(student)
Thanks in advance.
As pointed out in the comments, the first problem is that you are not writing correct Clojure.
To start, def should never be nested -- it's not going to behave like you hope. Use let to introduce local variables (usually just called locals because it's weird to call variables things that don't vary).
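For example, a local introduced with let only exists inside the let body:
(let [x 41]
  (inc x)) ;=> 42
; x is not defined out here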
Second, block-like constructs (such as do, let or with-open) evaluate to the value of their last expression.
So this snippet
(def dataset (atom []))
(with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
  (swap! dataset into (reduce conj [] (line-seq rdr))))
should be written
(let [dataset
      (with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
        (into [] (line-seq rdr)))]
  ; code using dataset goes here
  )
Then you try to convert dataset (a vector) to a list (myList) by traversing it backwards and consing on the list under construction. It's not needed. You can get a sequence (list-like) out of a vector by just calling seq on it. (Or rseq if you want the list to be reversed.)
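For example:
(seq  [1 2 3]) ;=> (1 2 3)
(rseq [1 2 3]) ;=> (3 2 1)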
Last, you iterate once again to split and print each item held in dataset. Explicit iteration with indices is pretty unusual in Clojure, prefer reduce, doseq, into etc.
Here are two ways to write student:
(defn student [] ; just for print
  (with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
    (doseq [data (line-seq rdr)]
      (println (str/split data #",")))))

(defn student [] ; to return a value
  (with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
    (into []
          (for [data (line-seq rdr)]
            (str/split data #",")))))
I hope this will help you get a better grasp of Clojure.
I suggest you use a csv library:
(require '[clojure.data.csv :as csv])
(csv/read-csv (slurp "example.txt"))
Unless this is some file io exercise.
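Note that read-csv returns a lazy sequence of row vectors, so the splitting comes for free, e.g.:
(csv/read-csv "1,foo\n2,bar")
;=> (["1" "foo"] ["2" "bar"])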

Create transducer from ordinary code

How would I create a transducer from the following ordinary code, where combo is the alias for clojure.math.combinatorics:
(defn row->evenly-divided [xs]
  (->> (combo/combinations (sort-by - xs) 2)
       (some (fn [[big small]]
               (assert (>= big small))
               (let [res (/ big small)]
                 (when (int? res)
                   res))))))
As noted in a comment, transducers are only applicable for processing each item. With this in mind I've made the code a little more transducer-friendly by shifting the sorting so that it is now done for each item. I don't think there's anything that can be done about the combinations part, however!
(defn row->evenly-divided [xs]
  (->> (combo/combinations xs 2)
       (some (fn [xy]
               (let [res (apply / (sort-by - xy))]
                 (when (int? res)
                   res))))))
This is the same function but with an introduced transducer:
(def x-row->evenly-divided (comp
                             (map (partial sort-by -))
                             (map (partial apply /))
                             (filter int?)))

(defn row->evenly-divided-2 [xs]
  (->> (combo/combinations xs 2)
       (sequence x-row->evenly-divided)
       first))
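For example, with a made-up row (assuming combo is clojure.math.combinatorics, as in the question):
(row->evenly-divided-2 [9 4 7 3]) ;=> 3 ; 9/3 is the only whole-number quotient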

how to separate concerns in the below fn

The function below does two things:
1. Checks if the atom is nil or fetch-again? is true, and then fetches the data.
2. Processes the data by calling (add-date-strings).
What is a better pattern to separate out the above two concerns?
(def retrieved-data (atom nil))

(defn fetch-it!
  [fetch-again?]
  (if (or fetch-again?
          (nil? @retrieved-data))
    (->> (exec-services)
         (map #(add-date-strings (:time %)))
         (reset! retrieved-data))
    @retrieved-data))
One possible refactoring would be:
(def retrieved-data (atom nil))

(defn fetch []
  (->> (exec-services)
       (map #(add-date-strings (:time %)))))

(defn fetch-it!
  ([]
   (fetch-it! false))
  ([force]
   (if (or force (nil? @retrieved-data))
     (reset! retrieved-data (fetch))
     @retrieved-data)))
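Usage would then be something like:
(fetch-it!)      ; fetches on the first call, returns the cached value afterwards
(fetch-it! true) ; forces a refetch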
By the way, the pattern to separate out concerns is called "functions" :)
To really separate the concerns I think it might be better to define a separate fetch and process function. So that in no way they are complected.
(def retrieved-data (atom nil))

(defn fetcher []
  (->> (exec-services)
       (map #(add-date-strings (:time %)))))

(defn fetch-again? [force]
  (fn [data] (or force (nil? data))))

(defn fetch-it! [fetch-fn data fetch-again?]
  (when (fetch-again? @data)
    (reset! data (fetch-fn))))
;;Usage
(fetch-it! fetcher retrieved-data (fetch-again? true))
Notice that I also gave the data atom as an argument.

Computing folder size

I'm trying to compute folder size in parallel.
Maybe it's a naive approach.
What I do is give the computation of every branch node (directory) to an agent.
All leaf nodes have their file sizes added to my-size.
Well it doesn't work. :)
'scan' works OK, serially.
'pscan' prints only files from the first level.
(def agents (atom []))
(def my-size (atom 0))
(def root-dir (clojure.java.io/file "/"))

(defn scan [listing]
  (doseq [f listing]
    (if (.isDirectory f)
      (scan (.listFiles f))
      (swap! my-size #(+ % (.length f))))))

(defn pscan [listing]
  (doseq [f listing]
    (if (.isDirectory f)
      (let [a (agent (.listFiles f))]
        (do (swap! agents #(conj % a))
            (send-off a pscan)
            (println (.getName f))))
      (swap! my-size #(+ % (.length f))))))
Do you have any idea what I have done wrong?
Thanks.
No need to keep state using atoms. Pure functional:
(defn psize [f]
  (if (.isDirectory f)
    (apply + (pmap psize (.listFiles f)))
    (.length f)))
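For example, something like this should give the total size in bytes:
(psize (clojure.java.io/file "/home/janko/projects/"))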
So counting filesizes in parallel should be so easy?
It's not :)
I tried to solve this issue better. I realized that I'm doing blocking I/O operations, so pmap doesn't do the job.
I was thinking that maybe giving chunks of directories (branches) to agents to process independently would make sense. Looks like it does :)
Well, I haven't benchmarked it yet.
It works, but there might be some problems with symbolic links on UNIX-like systems.
(def user-dir (clojure.java.io/file "/home/janko/projects/"))
(def root-dir (clojure.java.io/file "/"))
(def run? (atom true))
(def *max-queue-length* 1024)
(def *max-wait-time* 1000) ;; wait max 1 second, then process anything left
(def *chunk-size* 64)
(def queue (java.util.concurrent.LinkedBlockingQueue. *max-queue-length*))
(def agents (atom []))
(def size-total (atom 0))
(def a (agent []))

(defn branch-producer [node]
  (if @run?
    (doseq [f node]
      (when (.isDirectory f)
        (do (.put queue f)
            (branch-producer (.listFiles f)))))))

(defn producer [node]
  (future
    (branch-producer node)))

(defn node-consumer [node]
  (if (.isFile node)
    (.length node)
    0))

(defn chunk-length []
  (min (.size queue) *chunk-size*))

(defn compute-sizes [a]
  (doseq [i (map (fn [f] (.listFiles f)) a)]
    (swap! size-total #(+ % (apply + (map node-consumer i))))))
(defn consumer []
  (future
    (while @run?
      (when-let [size (if (zero? (chunk-length))
                        false
                        (chunk-length))] ; appropriate amount of work
        (binding [a (agent [])]
          (dotimes [_ size] ; give us all directories to process
            (when-let [item (.poll queue)]
              (set! a (agent (conj @a item)))))
          (swap! agents #(conj % a))
          (send-off a compute-sizes))
        (Thread/sleep *max-wait-time*)))))
You can start it by typing
(producer (list user-dir))
(consumer)
For the result, type
@size-total
You can stop it with the following (there are still running futures; correct me if I'm wrong):
(swap! run? not)
If you find any errors/mistakes, you're welcome to share your ideas!

How do I write a predicate that checks if a value exists in an infinite seq?

I had an idea for a higher-order function today that I'm not sure how to write. I have several sparse, lazy infinite sequences, and I want to create an abstraction that lets me check to see if a given number is in any of these lazy sequences. To improve performance, I wanted to push the values of the sparse sequence into a hashmap (or set), dynamically increasing the number of values in the hashmap whenever it is necessary. Automatic memoization is not the answer here due to sparsity of the lazy seqs.
Probably code is easiest to understand, so here's what I have so far. How do I change the following code so that the predicate uses a closed-over hashmap, but if needed increases the size of the hashmap and redefines itself to use the new hashmap?
(defn make-lazy-predicate [coll]
  "Returns a predicate that returns true or false if a number is in
  coll. Coll must be an ordered, increasing lazy seq of numbers."
  (let [in-lazy-list? (fn [n coll top cache]
                        (if (> top n)
                          (not (nil? (cache n)))
                          (recur n (next coll) (first coll)
                                 (conj cache (first coll)))))]
    (fn [n] (in-lazy-list? n coll (first coll) (sorted-set)))))
(def my-lazy-list (iterate #(+ % 100) 1))
(let [in-my-list? (make-lazy-predicate my-lazy-list)]
  (doall (filter in-my-list? (range 10000))))
How do I solve this problem without reverting to an imperative style?
This is a thread-safe variant of Adam's solution.
(defn make-lazy-predicate
  [coll]
  (let [state (atom {:mem #{} :unknown coll})
        update-state (fn [{:keys [mem unknown] :as state} item]
                       (let [[just-checked remainder]
                             (split-with #(<= % item) unknown)]
                         (if (seq just-checked)
                           (-> state
                               (assoc :mem (apply conj mem just-checked))
                               (assoc :unknown remainder))
                           state)))]
    (fn [item]
      (get-in (if (< item (first (:unknown @state)))
                @state
                (swap! state update-state item))
              [:mem item]))))
One could also consider using refs, but then your predicate search might get rolled back by an enclosing transaction. This might or might not be what you want.
This function is based on the idea behind the core memoize function. Only numbers already consumed from the lazy list are cached in a set. It uses the built-in take-while instead of doing the search manually.
(defn make-lazy-predicate [coll]
  (let [mem (atom #{})
        unknown (atom coll)]
    (fn [item]
      (if (< item (first @unknown))
        (@mem item)
        (let [just-checked (take-while #(>= item %) @unknown)]
          (swap! mem #(apply conj % just-checked))
          (swap! unknown #(drop (count just-checked) %))
          (= item (last just-checked)))))))
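A quick usage sketch with the sparse sequence from the question (either version above should behave the same; values traced by hand, not run):
(def in-my-list? (make-lazy-predicate (iterate #(+ % 100) 1)))
(filter in-my-list? (range 350)) ;=> (1 101 201 301)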