Bayesian Classification, example from Clojure For Machine Learning - clojure

I am currently learning this algorithm for Bayesian classification, and when I tried to follow along with an example in the book I got weird results that weren't consistent with the examples in the book.
I don't think my code is wrong (I basically copied it by hand), but I still get impossible results in the REPL, such as:
> (+ (evidence-of-sea-bass) (evidence-of-salmon))
==> 2.8139728009700775
It should return 1.000... with a small floating point precision error.
Here is the code:
(defn make-sea-bass []
  #{:sea-bass
    (if (< (rand) 0.2) :fat :thin)
    (if (< (rand) 0.7) :long :short)
    (if (< (rand) 0.8) :light :dark)})
(defn make-salmon []
  #{:salmon
    (if (< (rand) 0.8) :fat :thin)
    (if (< (rand) 0.5) :long :short)
    (if (< (rand) 0.3) :light :dark)})
(defn make-sample-fish []
  (if (< (rand) 0.3) (make-sea-bass) (make-salmon)))
(def fish-training-data
  (for [i (range 10000)] (make-sample-fish)))
(defn probability
  [attribute & {:keys
                [category prior-positive prior-negative data]
                :or {category nil
                     data fish-training-data}}]
  (let [by-category (if category
                      (filter category data)
                      data)
        positive (count (filter attribute by-category))
        negative (- (count by-category) positive)
        total (+ positive negative)]
    (/ positive negative)))
(defn evidence-of-salmon [& attrs]
  (let [attr-prob (map #(probability % :category :salmon) attrs)
        class-and-attr-prob (conj attr-prob (probability :salmon))]
    (float (apply * class-and-attr-prob))))
(defn evidence-of-sea-bass [& attrs]
  (let [attr-prob (map #(probability % :category :sea-bass) attrs)
        class-and-attr-prob (conj attr-prob (probability :sea-bass))]
    (float (apply * class-and-attr-prob))))

If you expect the result to be 1.0, then your probability function should return (/ positive total), not (/ positive negative).
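As a minimal sketch of that fix, the function below divides by the total count rather than the negative count. The four-fish `sample` data set is made up for illustration and is not from the book, and the unused prior-positive and prior-negative keys are dropped for brevity. Note that it would still divide by zero if the category filter matched nothing.

```clojure
;; Hypothetical four-fish sample, standing in for fish-training-data.
(def sample
  [#{:salmon :fat} #{:salmon :thin} #{:sea-bass :thin} #{:salmon :fat}])

(defn probability
  "Fraction of the fish in data (optionally only those in category)
  that have the given attribute."
  [attribute & {:keys [category data]}]
  (let [by-category (if category (filter category data) data)
        positive    (count (filter attribute by-category))
        total       (count by-category)]
    (/ positive total)))

;; The two class probabilities now sum to one.
(float (+ (probability :salmon :data sample)
          (probability :sea-bass :data sample)))
;; => 1.0
```

With the original (/ positive negative), the same sum would be 3/1 + 1/3, which is the kind of value greater than 1 reported in the question.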

Related

Return an else value when using recur

I am new to Clojure, and doing my best to forget all my previous experience with more procedural languages (java, ruby, swift) and embrace Clojure for what it is. I am actually really enjoying the way it makes me think differently -- however, I have come up against a pattern that I just can't seem to figure out. The easiest way to illustrate, is with some code:
(defn char-to-int [c] (Integer/valueOf (str c)))
(defn digits-dont-decrease? [str]
  (let [digits (map char-to-int (seq str)) i 0]
    (when (< i 5)
      (if (> (nth digits i) (nth digits (+ i 1)))
        false
        (recur (inc i))))))
(def result (digits-dont-decrease? "112233"))
(if (= true result)
  (println "fit rules")
  (println "doesn't fit rules"))
The input is a 6-digit number as a string, and I am simply attempting to make sure that each digit from left to right is >= the previous digit. I want to return false if it doesn't, and true if it does. The false situation works great. However, given that recur needs to be the last thing in the function (as far as I can tell), how do I return true? As it is, when the condition is satisfied, I get an illegal argument exception:
Execution error (IllegalArgumentException) at clojure.exercise.two/digits-dont-decrease? (four:20).
Don't know how to create ISeq from: java.lang.Long
How should I be thinking about this? I assume my past training is getting in my mental way.
This is not answering your question, but it shows an alternative. While the (apply <= ...) approach over the whole string is very elegant for small strings (it is eager), you can use every? for a short-circuiting approach. E.g.:
user=> (defn nr-seq [s] (map #(Integer/parseInt (str %)) s))
#'user/nr-seq
user=> (every? (partial apply <=) (partition 2 1 (nr-seq "123")))
true
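You need nothing but
(apply <= "112233")
Reason: a string is a sequence of characters, and the comparison operator works on characters. Note that this holds in ClojureScript; on JVM Clojure, <= accepts only numbers and throws a ClassCastException when given characters, so there you would write (apply <= (map int "112233")) instead.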
(->> "0123456789" (mapcat #(repeat 1000 %)) (apply str) (def loooong))
(count loooong)
10000
(time (apply <= loooong))
"Elapsed time: 21.006625 msecs"
true
(->> "9123456789" (mapcat #(repeat 1000 %)) (apply str) (def bad-loooong))
(count bad-loooong)
10000
(time (apply <= bad-loooong))
"Elapsed time: 2.581750 msecs"
false
(above runs on my iPhone)
In this case, you don't really need loop/recur. Just use the built-in nature of <= like so:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(def true-samples
["123"
"112233"
"13"])
(def false-samples
["10"
"12324"])
(defn char->int
[char-or-str]
(let [str-val (str char-or-str)] ; coerce any chars to len-1 strings
(assert (= 1 (count str-val)))
(Integer/parseInt str-val)))
(dotest
(is= 5 (char->int "5"))
(is= 5 (char->int \5))
(is= [1 2 3] (mapv char->int "123"))
; this shows what we are going for
(is (<= 1 1 2 2 3 3))
(isnt (<= 1 1 2 1 3 3))
and now test the char sequences:
;-----------------------------------------------------------------------------
; using built-in `<=` function
(doseq [true-samp true-samples]
(let [digit-vals (mapv char->int true-samp)]
(is (apply <= digit-vals))))
(doseq [false-samp false-samples]
(let [digit-vals (mapv char->int false-samp)]
(isnt (apply <= digit-vals))))
if you want to write your own, you can like so:
(defn increasing-equal-seq?
"Returns true iff sequence is non-decreasing"
[coll]
(when (< (count coll) 2)
(throw (ex-info "coll must have at least 2 vals" {:coll coll})))
(loop [prev (first coll)
remaining (rest coll)]
(if (empty? remaining)
true
(let [curr (first remaining)
prev-next curr
remaining-next (rest remaining)]
(if (<= prev curr)
(recur prev-next remaining-next)
false)))))
;-----------------------------------------------------------------------------
; using home-grown loop/recur
(doseq [true-samp true-samples]
(let [digit-vals (mapv char->int true-samp)]
(is (increasing-equal-seq? digit-vals))))
(doseq [false-samp false-samples]
(let [digit-vals (mapv char->int false-samp)]
(isnt (increasing-equal-seq? digit-vals))))
)
with result
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
Testing tst.demo.core
Ran 2 tests containing 15 assertions.
0 failures, 0 errors.
Passed all tests
Finished at 23:36:17.096 (run time: 0.028s)
You can use loop with recur.
Assuming you require the following input vs. output:
"543221" => false
"54321" => false
"12345" => true
"123345" => true
The following function can help:
;; Assuming char-to-int is defined by you before as per the question
(defn digits-dont-decrease?
[strng]
(let [digits (map char-to-int (seq strng))]
(loop [;;the bindings in loop act as initial state
decreases true
i (- (count digits) 2)]
(let [decreases (and decreases (>= (nth digits (+ i 1)) (nth digits i)))]
(if (or (< i 1) (not decreases))
decreases
(recur decreases (dec i)))))))
This should work for a numeric string of any length of two or more digits (with a single digit, the initial index is -1 and nth fails).
Hope this helps. Please let me know if you were looking for something else :).
(defn non-decreasing? [str]
(every?
identity
(map
(fn [a b]
(<= (int a) (int b)))
(seq str)
(rest str))))
(defn non-decreasing-loop? [str]
(loop [a (seq str) b (rest str)]
(if-not (seq b)
true
(if (<= (int (first a)) (int (first b)))
(recur (rest a) (rest b))
false))))
(non-decreasing? "112334589")
(non-decreasing? "112324589")
(non-decreasing-loop? "112334589")
(non-decreasing-loop? "112324589")
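For completeness, the exception in the original question comes from recur itself: without an enclosing loop, recur jumps back to the start of the function, so (inc i) is rebound to the function's str parameter and seq is then called on a number. A minimal sketch of the same index-based idea with an explicit loop, where the branch that runs out of pairs simply returns true (the cond layout and the last-i name are illustrative choices, not taken from the answers above):

```clojure
(defn digits-dont-decrease? [s]
  (let [digits (mapv #(Integer/parseInt (str %)) s)
        last-i (dec (count digits))]
    (loop [i 0]                          ; recur now targets this loop, not the fn
      (cond
        (>= i last-i) true               ; ran out of pairs: nothing decreased
        (> (digits i) (digits (inc i))) false
        :else (recur (inc i))))))

(digits-dont-decrease? "112233") ;; => true
(digits-dont-decrease? "112324") ;; => false
```

The recur is still in tail position because cond expands to nested if forms, so the "else value" is just the other branch of the conditional.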

Genetic programming Clojure

I've pasted the code on this page in an IDE and it works. The problem is that when I replace the definition of target-data with this vector of pairs* it gives me this error**.
(vector [[1 2]
[2 3]
[3 4]
[4 5]] ) ; *
UnsupportedOperationException nth not supported on this type: core$vector clojure.lang.RT.nthFrom (RT.java:857) **
What should I do to use my own target-data?
UPDATED FULL CODE:
(ns evolvefn.core)
;(def target-data
; (map #(vector % (+ (* % %) % 1))
; (range -1.0 1.0 0.1)))
;; We'll use input (x) values ranging from -1.0 to 1.0 in increments
;; of 0.1, and we'll generate the target [x y] pairs algorithmically.
;; If you want to evolve a function to fit your own data then you could
;; just paste a vector of pairs into the definition of target-data instead.
(def target-data
(vec[1 2]
[2 3]
[3 4]
[4 5]))
;; An individual will be an expression made of functions +, -, *, and
;; pd (protected division), along with terminals x and randomly chosen
;; constants between -5.0 and 5.0. Note that for this problem the
;; presence of the constants actually makes it much harder, but that
;; may not be the case for other problems.
(defn random-function
[]
(rand-nth '(+ - * pd)))
(defn random-terminal
[]
(rand-nth (list 'x (- (rand 10) 5))))
(defn random-code
[depth]
(if (or (zero? depth)
(zero? (rand-int 2)))
(random-terminal)
(list (random-function)
(random-code (dec depth))
(random-code (dec depth)))))
;; And we have to define pd (protected division):
(defn pd
"Protected division; returns 0 if the denominator is zero."
[num denom]
(if (zero? denom)
0
(/ num denom)))
;; We can now evaluate the error of an individual by creating a function
;; built around the individual, calling it on all of the x values, and
;; adding up all of the differences between the results and the
;; corresponding y values.
(defn error
[individual]
(let [value-function (eval (list 'fn '[x] individual))]
(reduce + (map (fn [[x y]]
(Math/abs
(- (value-function x) y)))
target-data))))
;; We can now generate and evaluate random small programs, as with:
;; (let [i (random-code 3)] (println (error i) "from individual" i))
;; To help write mutation and crossover functions we'll write a utility
;; function that injects something into an expression and another that
;; extracts something from an expression.
(defn codesize [c]
(if (seq? c)
(count (flatten c))
1))
(defn inject
"Returns a copy of individual i with new inserted randomly somewhere within it (replacing something else)."
[new i]
(if (seq? i)
(if (zero? (rand-int (count (flatten i))))
new
(if (< (rand)
(/ (codesize (nth i 1))
(- (codesize i) 1)))
(list (nth i 0) (inject new (nth i 1)) (nth i 2))
(list (nth i 0) (nth i 1) (inject new (nth i 2)))))
new))
(defn extract
"Returns a random subexpression of individual i."
[i]
(if (seq? i)
(if (zero? (rand-int (count (flatten i))))
i
(if (< (rand) (/ (codesize (nth i 1))
(- (codesize i) 1)))
(extract (nth i 1))
(extract (nth i 2))))
i))
;; Now the mutate and crossover functions are easy to write:
(defn mutate
[i]
(inject (random-code 2) i))
(defn crossover
[i j]
(inject (extract j) i))
;; We can see some mutations with:
;; (let [i (random-code 2)] (println (mutate i) "from individual" i))
;; and crossovers with:
;; (let [i (random-code 2) j (random-code 2)]
;; (println (crossover i j) "from" i "and" j))
;; We'll also want a way to sort a population by error that doesn't require
;; lots of error re-computation:
(defn sort-by-error
[population]
(vec (map second
(sort (fn [[err1 ind1] [err2 ind2]] (< err1 err2))
(map #(vector (error %) %) population)))))
;; Finally, we'll define a function to select an individual from a sorted
;; population using tournaments of a given size.
(defn select
[population tournament-size]
(let [size (count population)]
(nth population
(apply min (repeatedly tournament-size #(rand-int size))))))
;; Now we can evolve a solution by starting with a random population and
;; repeatedly sorting, checking for a solution, and producing a new
;; population.
(defn evolve
[popsize]
(println "Starting evolution...")
(loop [generation 0
population (sort-by-error (repeatedly popsize #(random-code 2)))]
(let [best (first population)
best-error (error best)]
(println "======================")
(println "Generation:" generation)
(println "Best error:" best-error)
(println "Best program:" best)
(println " Median error:" (error (nth population
(int (/ popsize 2)))))
(println " Average program size:"
(float (/ (reduce + (map count (map flatten population)))
(count population))))
(if (< best-error 0.1) ;; good enough to count as success
(println "Success:" best)
(recur
(inc generation)
(sort-by-error
(concat
(repeatedly (* 1/2 popsize) #(mutate (select population 7)))
(repeatedly (* 1/4 popsize) #(crossover (select population 7)
(select population 7)))
(repeatedly (* 1/4 popsize) #(select population 7)))))))))
;; Run it with a population of 1000:
(evolve 1000)
And the error is:
(evolve 1000)
Starting evolution...
IllegalArgumentException No matching method found: abs clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:80)
evolvefn.core=>
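One possible fix, offered as a sketch rather than a confirmed diagnosis: vec and vector are functions, so the pairs need to be written as one literal vector of vectors rather than passed as separate arguments. The later "No matching method found: abs" error is consistent with integer data turning (/ num denom) in pd into a Ratio, which the reflective Math/abs call cannot accept. Writing the data as doubles, or coercing with double as in the hypothetical abs-diff helper below, should avoid that.

```clojure
;; Literal vector of [x y] pairs; doubles keep the arithmetic away from Ratios.
(def target-data
  [[1.0 2.0]
   [2.0 3.0]
   [3.0 4.0]
   [4.0 5.0]])

;; Hypothetical helper: coerce to double before Math/abs so a Ratio never reaches it.
(defn abs-diff [a b]
  (Math/abs (double (- a b))))

(abs-diff 1/2 2) ;; => 1.5
```

In the error function, (Math/abs (- (value-function x) y)) could be replaced by a call such as (abs-diff (value-function x) y).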

Idiomatic Creation of Hash-Map

I'd like to create a hash-map that has n number of key-value pairs created in sets of 3 where the sets do not intersect, e.g. [(34 false) (35 false) (36 false)] && [(24 false) (25 false) (26 false)] -> {34 false 35 false 36 false 24 false 25 false 26 false}
EDIT:
To play/practice with Clojure, I'm attempting to implement an idiomatic version of the battleship board game. I decided to store the battleship coordinates in a hash-map where the keys are coordinates and the values are booleans indicating whether that section of the ship has been hit. The specific piece of code below is supposed to
1. Select an axis (horizontal or vertical)
2. Select a coordinate for the bow of the ship
3. "Build" the rest of the ship (3 coordinates in total) by increasing the x or y value accordingly, e.g. {"10" false "11" false "12" false}. Note the "10" translates into the second row of a matrix, first column.
Note: Before adding the ship to the hash-map of coordinates the new ship coordinates must be checked to ensure that an intersection does not exist. If it does, the ship must be "re-built."
To that end, I've created the code below. It has 2 issues:
1. Executing the function results in the following exception from the use of the 'acc' accumulator:
   clojure.lang.LazySeq cannot be cast to clojure.lang.Associative
2. The result of the function is not a single hash-map, but rather a list of n hash-maps
Using idiomatic clojure, how can I achieve my goal?
(defn launch
  [n]
  (loop [cnt n acc {}]
    (if (= cnt 0)
      acc
      (recur
        (- cnt 1)
        ((fn []
           (let [axis (rand-int 2)]
             (if (= axis 0)
               (let [x (rand-int 8) y (rand-int 10)]
                 (for [k (range 3)]
                   (assoc acc (str y (+ x k)) false)))
               (let [x (rand-int 10) y (rand-int 8)]
                 (for [k (range 3)]
                   (assoc acc (str (+ y k) x) false)))))))))))
that's how i would rewrite it:
(defn create-key [axis-val i]
(if axis-val
(str (rand-int 10) (+ (rand-int 8) i))
(str (+ (rand-int 8) i) (rand-int 10))))
(defn launch [n]
(reduce (fn [acc axis]
(reduce #(assoc % (create-key axis %2) false)
acc
(range 3)))
{}
(repeatedly n #(zero? (rand-int 2)))))
in repl:
user> (launch 5)
{"40" false, "07" false, "19" false,
"46" false, "87" false, "47" false,
"41" false, "62" false, "86" false}
or (in case you don't like reduce):
(defn launch [n]
(zipmap (mapcat #(map (partial create-key %) (range 3))
(repeatedly n #(zero? (rand-int 2))))
(repeat false)))
the third variant is to use list comprehension to generate keys:
(defn launch [n]
(zipmap (for [_ (range n)
:let [axis (zero? (rand-int 2))]
i (range 3)]
(create-key axis i))
(repeat false)))
all three of them are idiomatic ones, i guess, so it's up to you to choose one, according to your own preferred programming style.
notice that the resulting keys are shuffled inside the map, because unsorted maps don't preserve order. If it is important, you should use sorted-map
What about your variant, the one generating error is this:
(for [k (range 3)] (assoc acc (str y (+ x k)) false))
it doesn't put all the keys to one map, rather it generates a seq of three items equalling (assoc acc k false):
(let [acc {}]
(for [k (range 3)] (assoc acc k false)))
;;=> ({0 false} {1 false} {2 false})
to do what you want, you use reduce:
(let [acc {}]
(reduce #(assoc %1 %2 false) acc (range 3)))
;;=> {0 false, 1 false, 2 false}
leetwinski has given a more concise answer, but I thought I would post this anyway, since I basically left your structure intact, and this may help you see the error a bit more clearly.
First, I am not sure why you were rebinding acc to the value of an anonymous function call. Your let will happily return a result; so, you should probably do some thinking about why you thought it was necessary to create an anonymous function.
Second, the problem is that for returns a lazy seq, and you are binding this to what you think is a map data structure. This explains why it works fine for cases 0 and 1, but when you use a value of 2 it fails.
Since I don't really fully understand what you're trying to accomplish, here is your original code, modified to work. Disclaimer--this is not really idiomatic and not how I would write it, but I'm posting because it may be helpful to see versus the original, since it actually works.
(defn launch
[n]
(loop [cnt n
acc {}]
(if (= cnt 0)
acc
(recur
(dec cnt)
(into acc
(let [axis (rand-int 2)]
(if (= axis 0)
(let [x (rand-int 8) y (rand-int 10)]
(map #(hash-map (str y (+ x %)) false) (range 3)))
(let [x (rand-int 10) y (rand-int 8)]
(map #(hash-map (str (+ y %) x) false) (range 3))))))))))
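None of the versions above performs the intersection check the question asks for, and create-key draws fresh random coordinates for each of the three keys, so the keys of one ship need not be adjacent. A minimal sketch that picks the bow once per ship and rebuilds the ship on overlap follows; random-ship is a hypothetical helper name, and the retry loop assumes n is small enough that the ships fit on the board (otherwise it would never finish).

```clojure
(defn random-ship
  "Returns the three coordinate keys of one randomly placed ship."
  []
  (if (zero? (rand-int 2))
    (let [x (rand-int 8) y (rand-int 10)]      ; horizontal: bow chosen once
      (for [k (range 3)] (str y (+ x k))))
    (let [x (rand-int 10) y (rand-int 8)]      ; vertical
      (for [k (range 3)] (str (+ y k) x)))))

(defn launch [n]
  (loop [board {} remaining n]
    (if (zero? remaining)
      board
      (let [ship (random-ship)]
        (if (some #(contains? board %) ship)
          (recur board remaining)               ; overlap: rebuild the ship
          (recur (into board (map (fn [k] [k false]) ship))
                 (dec remaining)))))))

(count (launch 5)) ;; => 15
```

Because overlapping ships are rejected, the resulting map always holds exactly 3 * n keys.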

clojure performance on badly performing code

I have completed this problem on hackerrank and my solution passes most test cases but it is not fast enough for 4 out of the 11 test cases.
My solution looks like this:
(ns scratch.core
(:require [clojure.string :as str]))
(defn ascii [char]
(int (.charAt (str char) 0)))
(defn process [text]
(let [parts (split-at (int (Math/floor (/ (count text) 2))) text)
left (first parts)
right (if (> (count (last parts)) (count (first parts)))
(rest (last parts))
(last parts))]
(reduce (fn [acc i]
(let [a (ascii (nth left i))
b (ascii (nth (reverse right) i))]
(if (> a b)
(+ acc (- a b))
(+ acc (- b a))))
) 0 (range (count left)))))
(defn print-result [[x & xs]]
(prn x)
(if (seq xs)
(recur xs)))
(let [input (slurp "/Users/paulcowan/Downloads/input10.txt")
inputs (str/split-lines input)
length (read-string (first inputs))
texts (rest inputs)]
(time (print-result (map process texts))))
Can anyone give me any advice about what I should look at to make this faster?
Would using recursion instead of reduce be faster or maybe this line is expensive:
right (if (> (count (last parts)) (count (first parts)))
(rest (last parts))
(last parts))
Because I am getting a count twice.
You are redundantly calling reverse on every iteration of the reduce:
user=> (let [c [1 2 3]
noisey-reverse #(doto (reverse %) println)]
(reduce (fn [acc e] (conj acc (noisey-reverse c) e))
[]
[:a :b :c]))
(3 2 1)
(3 2 1)
(3 2 1)
[(3 2 1) :a (3 2 1) :b (3 2 1) :c]
The reversed value could be calculated inside the containing let, and would then only need to be calculated once.
Also, due to the way your parts is defined, you are doing linear time lookups with each call to nth. It would be better to put parts in a vector and do indexed lookup. In fact you wouldn't need a reversed parts, and could do arithmetic based on the count of the vector to find the item to look up.
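A rough sketch of that suggestion, assuming the goal is the sum of absolute differences between each character and its mirror image from the end of the string (which is what the original process computes). The character codes go into a vector once, and the mirrored index is found by arithmetic, so there is no reverse and no linear-time nth:

```clojure
(defn process [text]
  (let [codes (mapv int text)          ; vector of char codes, O(1) lookup
        n     (count codes)]
    (reduce (fn [acc i]
              ;; compare position i with its mirror image from the end
              (+ acc (Math/abs (long (- (codes i) (codes (- n 1 i)))))))
            0
            (range (quot n 2)))))

(process "abc")  ;; => 2
(process "abcd") ;; => 4
```

For odd-length input the middle character is never visited, which matches the original's dropping of the first element of the longer half.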

Project Euler #14 and memoization in Clojure

As a neophyte Clojurian, it was recommended to me that I go through the Project Euler problems as a way to learn the language. It's definitely a great way to improve your skills and gain confidence. I just finished up my answer to problem #14. It works fine, but to get it running efficiently I had to implement some memoization. I couldn't use the prepackaged memoize function because of the way my code was structured, and I think it was a good experience to roll my own anyway. My question is whether there is a good way to encapsulate my cache within the function itself, or whether I have to define an external cache like I have done. Also, any tips to make my code more idiomatic would be appreciated.
(use 'clojure.test)
(def mem (atom {}))
(with-test
(defn chain-length
([x] (chain-length x x 0))
([start-val x c]
(if-let [e (last (find @mem x))]
(let [ret (+ c e)]
(swap! mem assoc start-val ret)
ret)
(if (<= x 1)
(let [ret (+ c 1)]
(swap! mem assoc start-val ret)
ret)
(if (even? x)
(recur start-val (/ x 2) (+ c 1))
(recur start-val (+ 1 (* x 3)) (+ c 1)))))))
(is (= 10 (chain-length 13))))
(with-test
(defn longest-chain
([] (longest-chain 2 0 0))
([c max start-num]
(if (>= c 1000000)
start-num
(let [l (chain-length c)]
(if (> l max)
(recur (+ 1 c) l c)
(recur (+ 1 c) max start-num))))))
(is (= 837799 (longest-chain))))
Since you want the cache to be shared between all invocations of chain-length, you would write chain-length as (let [mem (atom {})] (defn chain-length ...)) so that it would only be visible to chain-length.
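A minimal sketch of that let-over-defn shape, under the assumption that plain recursion is acceptable here (the chains are short, so stack depth is not a concern); it is a simplified single-arity variant, not the question's three-arity function:

```clojure
(let [mem (atom {1 1})]                ; cache is visible only inside this let
  (defn chain-length [n]
    (or (@mem n)                       ; cached length, if any
        (let [len (inc (chain-length (if (even? n)
                                       (quot n 2)
                                       (inc (* 3 n)))))]
          (swap! mem assoc n len)
          len))))

(chain-length 13) ;; => 10
```

Every call shares the same atom, but nothing outside chain-length can reach it.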
In this case, since the longest chain is sufficiently small, you could define chain-length using the naive recursive method and use Clojure's builtin memoize function on that.
Here's an idiomatic(?) version using plain old memoize.
(def chain-length
(memoize
(fn [n]
(cond
(== n 1) 1
(even? n) (inc (chain-length (/ n 2)))
:else (inc (chain-length (inc (* 3 n))))))))
(defn longest-chain [start end]
(reduce (fn [x y]
(if (> (second x) (second y)) x y))
(for [n (range start (inc end))]
[n (chain-length n)])))
If you have an urge to use recur, consider map or reduce first. They often do what you want, and sometimes do it better/faster, since they take advantage of chunked seqs.
(inc x) is like (+ 1 x), but inc is about twice as fast.
You can capture the surrounding environment in a closure:
(defn my-memoize [f]
  (let [cache (atom {})]
    (fn [x]
      (let [cy (get @cache x)]
        (if (nil? cy)
          (let [fx (f x)]
            (swap! cache assoc x fx)
            fx)
          cy)))))
(defn mul2 [x] (do (print "Hello") (* 2 x)))
(def mmul2 (my-memoize mul2))
user=> (mmul2 2)
Hello4
user=> (mmul2 2)
4
You see the mul2 function is only called once.
So the 'cache' is captured by the closure and can be used to store the values.