Value assignment and debugging in Clojure - clojure

I am trying to add a section to some Clojure code. After line 99 of the file below:
https://github.com/lspector/Clojush/blob/master/src/clojush/pushgp/breed.clj
I want to add this code:
(if (= num-parents 2)
(let [initial-other-parents (vec (repeatedly
(+ num-parents 4) ; selecting parents more than required by 4
(fn []
(loop [re-selections 0
other (select population argmap)]
(if (and (= other first-parent)
(< re-selections
(:self-mate-avoidance-limit argmap)))
(recur (inc re-selections)
(select population argmap))
other)))))
all-parents (concat (if (nil? first-parent) ;gathering all created parents
nil
(vector first-parent))
initial-other-parents)
(defn eclid-dist [u v] ;defining a function to calculate distances
(->> (mapv - u v)
(mapv #(Math/pow % 2))
(reduce +)
Math/sqrt))
(defn find-largest-dist-pair [vec-map] ;defining a function to return two vectors (parents) with the largest distance
(apply max-key second
(for [[[k0 v0] & r] (iterate rest vec-map)
:while r
[k1 v1] r]
[[k0 k1] (eclid-dist v0 v1)])))
final-parents (find-largest-dist-pair (:error all-parents)) ;selecting two parents with the largest distance
op-fn (:fn (get genetic-operators operator)) ; extracting the operator
child (apply op-fn (concat final-parents ; creating child
(vector (assoc argmap
:population population))))]
)
To run the added part line by line, I captured the values in scope before my changes by adding the code below after line 84 of the original file:
(spit "initial-setting.edn" {:operator-list operator-list
:first-parent first-parent
:population population
:location location
:rand-gen rand-gen
:argmap argmap})
My question is:
What is the best way to assign the values extracted with spit to variables and execute the added section line by line for debugging?
I am using Calva as my IDE. I tried inserting #break and running the code with lein to debug it, but that did not work. Here is my previous post on that:
How to set a breakpoint in a Clojure program using Calva?
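One way to get the saved values back into a REPL is to read the file as EDN and def each entry. This is only a minimal sketch: the snapshot map and its contents are hypothetical stand-ins rather than Clojush's real values, and it assumes the saved data is plain Clojure data (maps, vectors, numbers, strings, keywords). Anything else, such as records or a java.util.Random instance, has no EDN representation by default and would need extra handling.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical stand-in for the state captured inside the breeding function.
(def snapshot {:operator-list [:alternation :uniform-mutation]
               :location 3
               :argmap {:self-mate-avoidance-limit 0}})

;; A temp file keeps the sketch self-contained; a fixed path works the same way.
(def snapshot-file
  (doto (java.io.File/createTempFile "initial-setting" ".edn")
    (.deleteOnExit)))

;; pr-str prints the data in a form the reader can parse back.
(spit snapshot-file (pr-str snapshot))

;; Read the file back as data, without evaluating anything in it.
(def restored (edn/read-string (slurp snapshot-file)))

;; Bind each saved value to a var with the same name as the original local.
(def operator-list (:operator-list restored))
(def location (:location restored))
(def argmap (:argmap restored))
```

With the vars defined, each binding of the added let can be evaluated one form at a time in the REPL.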


Clojure: Find even numbers in a vector

I am coming from a Java background trying to learn Clojure. As the best way of learning is by actually writing some code, I took a very simple example of finding even numbers in a vector. Below is the piece of code I wrote:
(defn even-vector-2 [input]
  (def output [])
  (loop [x input]
    (if (not= (count x) 0)
      (do
        (if (= (mod (first x) 2) 0)
          (do
            (def output (conj output (first x)))))
        (recur (rest x)))))
  output)
This code works, but it is lame that I had to use a global symbol to make it work. I had to use the global symbol because I wanted to change its state every time I find an even number in the vector; let doesn't allow me to change the value of the symbol. Is there a way to achieve this without using global symbols or atoms?
The idiomatic solution is straightforward:
(filter even? [1 2 3])
; -> (2)
For educational purposes, here is an implementation with loop/recur:
(defn filter-even [v]
(loop [r []
[x & xs :as v] v]
(if (seq v) ;; if current v is not empty
(if (even? x)
(recur (conj r x) xs) ;; bind r to r with x, bind v to rest
(recur r xs)) ;; leave r as is
r))) ;; terminate by not calling recur, return r
The main problem with your code is you're polluting the namespace by using def. You should never really use def inside a function. If you absolutely need mutability, use an atom or similar object.
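If local mutable state really is wanted, a sketch with an atom might look like the following. The name even-vector-atom is made up for illustration, and a plain filter remains the more idiomatic choice.

```clojure
(defn even-vector-atom
  "Collects the even numbers of input using an atom that is local to the call."
  [input]
  (let [output (atom [])]
    (doseq [n input]
      (when (even? n)
        (swap! output conj n)))   ; update the atom instead of re-def'ing a var
    @output))

(even-vector-atom [1 2 3 4 5])
;; => [2 4]
```

Because the atom is created inside the let, nothing leaks into the namespace, unlike def inside a function.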
Now, for your question. If you want to do this the "hard way", just make output a part of the loop:
(defn even-vector-3 [input]
(loop [[n & rest-input] input ; Deconstruct the head from the tail
output []] ; Output is just looped with the input
(if n ; n will be nil if the list is empty
(recur rest-input
(if (= (mod n 2) 0)
(conj output n)
output)) ; Adding nothing since the number is odd
output)))
Rarely is explicit looping necessary though. This is a typical case for a fold: you want to accumulate a list that's a variable-length version of another list. This is a quick version:
(defn even-vector-4 [input]
(reduce ; Reducing the input into another list
(fn [acc n]
(if (= (rem n 2) 0)
(conj acc n)
acc))
[] ; This is the initial accumulator.
input))
Really though, you're just filtering a list. Just use the core's filter:
(filter #(= (rem % 2) 0) [1 2 3 4])
Note, filter is lazy.
Try
#(filterv even? %)
if you want to return a vector or
#(filter even? %)
if you want a lazy sequence.
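The difference between the two can be checked at the REPL. This is a small sketch; the var names are only for illustration.

```clojure
(def lazy-evens (filter even? [1 2 3 4]))    ; lazy sequence
(def eager-evens (filterv even? [1 2 3 4]))  ; vector, built immediately

(instance? clojure.lang.LazySeq lazy-evens)  ; => true
(vector? eager-evens)                        ; => true
(= lazy-evens eager-evens)                   ; => true, same elements in the same order
```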
If you want to combine this with more transformations, you might want to go for a transducer:
(filter even?)
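As a hedged illustration of what combining means here, the filter transducer can be composed with others and then run with into or transduce. The squaring step and the name even-squares are assumptions added for the example.

```clojure
;; Compose the filtering step with a second transformation.
(def even-squares
  (comp (filter even?)
        (map #(* % %))))

;; Build a vector in one pass, with no intermediate lazy sequences.
(into [] even-squares (range 10))
;; => [0 4 16 36 64]

;; The same transformation feeding a reduction.
(transduce even-squares + 0 (range 10))
;; => 120
```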
If you wanted to write it using loop/recur, I'd do it like this:
(defn keep-even
"Accepts a vector of numbers, returning a vector of the even ones."
[input]
(loop [result []
unused input]
(if (empty? unused)
result
(let [curr-value (first unused)
next-result (if (even? curr-value)
(conj result curr-value)
result)
next-unused (rest unused) ]
(recur next-result next-unused)))))
This gets the same result as the built-in filter function.
Take a look at filter, even? and vec
check out http://cljs.info/cheatsheet/
(defn even-vector-2 [input] (vec (filter even? input)))
If you want a lazy solution, filter is your friend.
Here is a simple non-lazy solution (loop/recur can be avoided when you always apply the same function to each element):
(defn keep-even-numbers
[coll]
(reduce
(fn [agg nb]
(if (zero? (rem nb 2)) (conj agg nb) agg))
[] coll))
If you like mutability for "fun", here is a solution with temporary mutable collection :
(defn mkeep-even-numbers
[coll]
(persistent!
(reduce
(fn [agg nb]
(if (zero? (rem nb 2)) (conj! agg nb) agg))
(transient []) coll)))
...which is slightly faster !
mod would be better than rem if you extend the odd/even definition to negative integers
You can also replace [] by the collection you want, here a vector !
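For reference, a quick REPL sketch of how rem and mod differ on negative input. The helper names are hypothetical. With a zero? test, as used in the functions above, both give the same answer; the difference only shows up when the remainder is compared against 1.

```clojure
(rem -3 2) ; => -1, the sign follows the dividend
(mod -3 2) ; => 1, the sign follows the divisor

;; Testing for zero works with either function.
(zero? (rem -4 2)) ; => true
(zero? (mod -4 2)) ; => true

;; Testing for 1 does not.
(defn odd-via-rem? [n] (= 1 (rem n 2)))
(defn odd-via-mod? [n] (= 1 (mod n 2)))

(odd-via-rem? -3) ; => false
(odd-via-mod? -3) ; => true
```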
In Clojure, you generally don't need to write a low-level loop with loop/recur. Here is a quick demo.
(ns tst.clj.core
(:require
[tupelo.core :as t] ))
(t/refer-tupelo)
(defn is-even?
"Returns true if x is even, otherwise false."
[x]
(zero? (mod x 2)))
; quick sanity checks
(spyx (is-even? 2))
(spyx (is-even? 3))
(defn keep-even
"Accepts a vector of numbers, returning a vector of the even ones."
[input]
(into [] ; forces result into vector, eagerly
(filter is-even? input)))
; demonstrate on [0 1 2...9]
(spyx (keep-even (range 10)))
with result:
(is-even? 2) => true
(is-even? 3) => false
(keep-even (range 10)) => [0 2 4 6 8]
Your project.clj needs the following for spyx to work:
:dependencies [
[tupelo "0.9.11"]
]

Need your help on running clojure library via leiningen

I found a solution for the minimum hitting set problem on GitHub: https://github.com/bdesham/hitting-set and then tried to use it. The solution is a Clojure library, so I downloaded Leiningen to try to run it.
I read the README from the GitHub link, but I still don't know how to run the .clj code to get the minimal hitting set. I saw a function called minimal-hitting-sets in hitting_set.clj, but I don't know how to call it with an argument.
E.g., get the minimal hitting set of:
{"Australia" #{:white :red :blue},
"Tanzania" #{:black :blue :green :yellow},
"Norway" #{:white :red :blue},
"Uruguay" #{:white :blue :yellow},
"Saint Vincent and the Grenadines" #{:blue :green :yellow},
"Ivory Coast" #{:white :orange :green},
"Sierra Leone" #{:white :blue :green},
"United States" #{:white :red :blue}}
Project.clj code:
(defproject hitting-set "0.9.0"
:description "Find minimal hitting sets"
:url "https://github.com/bdesham/hitting-set"
:license {:name "Eclipse Public License"
:url "http://www.eclipse.org/legal/epl-v10.html"
:distribution :repo
:comments "Same as Clojure"}
:main hitting-set
:min-lein-version "2.0.0"
:dependencies [ [org.clojure/clojure "1.4.0"]
[hitting-set "0.9.0"]])
hitting_set.clj code:
(ns hitting-set
(:use hitting-set :only [minimal-hitting-sets]))
; Utility functions
(defn- dissoc-elements-containing
"Given a map in which the keys are sets, removes all keys whose sets contain
the element el. Adapted from http://stackoverflow.com/a/2753997/371228"
[el m]
(apply dissoc m (keep #(-> % val
(not-any? #{el})
(if nil (key %)))
m)))
(defn- map-old-new
"Returns a sequence of vectors. Each first item is an element of coll and the
second item is the result of calling f with that item."
[f coll]
(map #(vector % (f %)) coll))
(defn- count-vertices
"Returns the number of vertices in the hypergraph h."
[h]
(count (apply union (vals h))))
(defn- sorted-hypergraph
"Returns a version of the hypergraph h that is sorted so that the edges with
the fewest vertices come first."
[h]
(into (sorted-map-by (fn [key1 key2]
(compare [(count (get h key1)) key1]
[(count (get h key2)) key2])))
h))
(defn- remove-dupes
"Given a map m, remove all but one of the keys that map to any given value."
[m]
(loop [sm (sorted-map),
m m,
seen #{}]
(if-let [head (first m)]
(if (contains? seen (second head))
(recur sm
(rest m)
seen)
(recur (assoc sm (first head) (second head))
(rest m)
(conj seen (second head))))
sm)))
(defn- efficient-hypergraph
"Given a hypergraph h, returns an equivalent hypergraph that will go through
the hitting set algorithm more quickly. Specifically, redundant edges are
discarded and then the map is sorted so that the smallest edges come first."
[h]
(-> h remove-dupes sorted-hypergraph))
(defn- largest-edge
"Returns the name of the edge of h that has the greatest number of vertices."
[h]
(first (last (sorted-hypergraph h))))
(defn- remove-vertices
"Given a hypergraph h and a set vv of vertices, remove the vertices from h
(i.e. remove all of the vertices of vv from each edge in h). If this would
result in an edge becoming empty, remove that edge entirely."
[h vv]
(loop [h h,
res {}]
(if (first h)
(let [edge (difference (second (first h))
vv)]
(if (< 0 (count edge))
(recur (rest h)
(assoc res (first (first h)) edge))
(recur (rest h)
res)))
res)))
; Auxiliary functions
;
; These functions might be useful if you're working with hitting sets, although
; they're not actually invoked anywhere else in this project.
(defn reverse-map
"Takes a map from keys to sets of values. Produces a map in which the values
are mapped to the set of keys in whose sets they originally appeared."
[m]
(apply merge-with into
(for [[k vs] m]
(apply hash-map (flatten (for [v vs]
[v #{k}]))))))
(defn drop-elements
"Given a set of N elements, return a set of N sets, each of which is the
result of removing a different item from the original set."
[s]
(set (for [e s] (difference s #{e}))))
; The main functions
;
; These are the functions that users are probably going to be interested in.
; Hitting set
(defn hitting-set?
"Returns true if t is a hitting set of h. Does not check whether s is
minimal."
[h t]
(not-any? empty? (map #(intersection % t)
(vals h))))
(defn hitting-set-exists?
"Returns true if a hitting set of size k exists for the hypergraph h. See the
caveat in README.md for odd behavior of this function."
[h k]
(cond
(< (count-vertices h) k) false
(empty? h) true
(zero? k) false
:else (let [hvs (map #(dissoc-elements-containing % h)
(first (vals h)))]
(boolean (some #(hitting-set-exists? % (dec k))
hvs)))))
(defn- enumerate-algorithm
[h k x]
(cond
(empty? h) #{x}
(zero? k) #{}
:else (let [hvs (map-old-new #(dissoc-elements-containing % h)
(first (vals h)))]
(apply union (map #(enumerate-algorithm (second %)
(dec k)
(union x #{(first %)}))
hvs)))))
(defn enumerate-hitting-sets
"Return a set containing the hitting sets of h. See the caveat in README.md
for odd behavior of this function. If the parameter k is passed then the
function will return all hitting sets of size less than or equal to k."
([h]
(enumerate-algorithm (efficient-hypergraph h) (count-vertices h) #{}))
([h k]
(enumerate-algorithm (efficient-hypergraph h) k #{})))
(defn minimal-hitting-sets
"Returns a set containing the minimal hitting sets of the hypergraph h. If
you just want one hitting set and don't care whether there are multiple
minimal hitting sets, use (first (minimal-hitting-sets h))."
[h]
(first (filter #(> (count %) 0)
(map #(enumerate-hitting-sets h %)
(range 1 (inc (count-vertices h)))))))
; Set cover
(defn cover?
"Returns true if the elements of s form a set cover for the hypergraph h."
[h s]
(= (apply union (vals h))
(apply union (map #(get h %) s))))
(defn greedy-cover
"Returns a set cover of h using the 'greedy' algorithm."
[h]
(loop [hh h,
edges #{}]
(if (cover? h edges)
edges
(let [e (largest-edge hh)]
(recur (remove-vertices hh (get hh e))
(conj edges e))))))
(defn approx-hitting-set
"Returns a hitting set of h. The set is guaranteed to be a hitting set, but
may not be minimal."
[h]
(greedy-cover (reverse-map h)))
Since I am a newbie to Leiningen and Clojure, I would really appreciate your help.
Thanks,
Hung
In general, to use a Clojure library from Clojure:
1. Make a new project with lein new app project-name.
2. Include the library in the :dependencies section of project.clj.
3. Require and refer to that library in at least one .clj file (core.clj is an example).
4. Load that file in your editor of choice and switch the REPL namespace to the namespace in the ns form at the top of the file.
5. ...
6. Profit!!
There are a lot more details, but I hope this is enough to give you an overview of one way to go about this. If you solve step 5, please share your solution ;-)

Simple "R-like" melt : better way to do?

Today I tried to implement an "R-like" melt function. I use it for big data coming from BigQuery.
I do not have tight constraints on compute time, and this function takes less than 5-10 seconds to work on millions of rows.
I start with this kind of data :
(def sample
'({:list "123,250" :group "a"} {:list "234,260" :group "b"}))
Then I defined a function to put the list into a vector :
(defn split-data-rank [datatab value]
(let [splitted (map (fn[x] (assoc x value (str/split (x value) #","))) datatab)]
(map (fn[y] (let [index (map inc (range (count (y value))))]
(assoc y value (zipmap index (y value)))))
splitted)))
Launch :
(split-data-rank sample :list)
As you can see, it returns the same sequence, but it replaces :list with a map giving the position of each item in the quoted list.
Then I want to melt the "dataframe" by creating, for each item in a group, its own row with its rank in the group.
So I created this function :
(defn split-melt [datatab value]
(let [splitted (split-data-rank datatab value)]
(map (fn [y] (dissoc y value))
(apply concat
(map
(fn[x]
(map
(fn[[k v]]
(assoc x :item v :Rank k))
(x value)))
splitted)))))
Launch :
(split-melt sample :list)
The problem is that it is heavily indented and uses a lot of map calls. I apply dissoc to drop :list (which is useless by then), and I also have to use concat because otherwise I get a sequence of sequences.
Do you think there is a more efficient or shorter way to write this function?
I am quite confused by reduce and do not know whether it can be applied here, since in a way there are two arguments.
Thanks a lot !
If you don't need the split-data-rank function, I would go for:
(defn melt [datatab value]
(mapcat (fn [x]
(let [items (str/split (get x value) #",")]
(map-indexed (fn [idx item]
(-> x
(assoc :Rank (inc idx) :item item)
(dissoc value)))
items)))
datatab))
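The same idea can also be written as a for comprehension, which some find easier to read than nested map calls. This is a hedged variant rather than the answer's own code: the name melt-for is made up, and it assumes clojure.string is aliased as str, as in the question.

```clojure
(require '[clojure.string :as str])

(def sample
  '({:list "123,250" :group "a"} {:list "234,260" :group "b"}))

;; Hypothetical variant of melt: one output row per (row, item) pair.
(defn melt-for [datatab value]
  (for [row datatab
        [idx item] (map-indexed vector (str/split (get row value) #","))]
    (-> row
        (dissoc value)                        ; drop the original list column
        (assoc :Rank (inc idx) :item item)))) ; ranks start at 1

(melt-for sample :list)
;; => ({:group "a", :Rank 1, :item "123"}
;;     {:group "a", :Rank 2, :item "250"}
;;     {:group "b", :Rank 1, :item "234"}
;;     {:group "b", :Rank 2, :item "260"})
```

The second binding in the for plays the role of mapcat: each input row expands into as many output rows as it has items.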

Efficiently create and diff sets created from large text file

I am attempting to copy about 12 million documents in an AWS S3 bucket to give them new names. The names previously had a prefix and will now all be document name only. So a/b/123 once renamed will be 123. The last segment is a uuid so there will not be any naming collisions.
This process has been partially completed so some have been copied and some still need to be. I have a text file that contains all of the document names. I would like an efficient way to determine which documents have not yet been moved.
I have some naive code that shows what I would like to accomplish.
(def doc-names ["o/123" "o/234" "t/543" "t/678" "123" "234" "678"])
(defn still-need-copied [doc-names]
(let [last-segment (fn [doc-name]
(last (clojure.string/split doc-name #"/")))
by-position (group-by #(.contains % "/") doc-names)
top (set (get by-position false))
nested (set (map #(last-segment %) (get by-position true)))
needs-copied (clojure.set/difference nested top)]
(filter #(contains? needs-copied (last-segment %)) doc-names)))
I would propose this solution:
(defn still-need-copied [doc-names]
(->> doc-names
(group-by #(last (clojure.string/split % #"/")))
(keep #(when (== 1 (count (val %))) (first (val %))))))
First you group all the items by the last segment of the split string, which gives this for your input:
{"123" ["o/123" "123"],
"234" ["o/234" "234"],
"543" ["t/543"],
"678" ["t/678" "678"]}
Then you just select the values of the map that have a length of 1 and take their first elements.
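A small sketch of the grouping approach on the sample input, plus one behaviour that may be worth checking against real data: a name that exists only in its new, prefix-free form is also alone in its group, so it is returned as well. The function names below are made up for illustration, and the extra filter is only one possible way to restrict the result to prefixed names.

```clojure
(require '[clojure.string :as str])

(defn last-segment [doc-name]
  (last (str/split doc-name #"/")))

;; Same technique as above: keep names whose last segment occurs exactly once.
(defn names-seen-once [doc-names]
  (->> doc-names
       (group-by last-segment)
       (keep (fn [[_ names]]
               (when (= 1 (count names))
                 (first names))))))

(names-seen-once ["o/123" "o/234" "t/543" "t/678" "123" "234" "678"])
;; => ("t/543")

;; "999" exists only at the top level, yet it is reported too.
(set (names-seen-once ["o/123" "123" "t/543" "999"]))
;; => #{"t/543" "999"}

;; Hypothetical refinement: keep only names that still carry a prefix.
(defn prefixed-seen-once [doc-names]
  (filter #(str/includes? % "/") (names-seen-once doc-names)))

(prefixed-seen-once ["o/123" "123" "t/543" "999"])
;; => ("t/543")
```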
I would say it is way more readable than your variant, and it also seems to be faster.
That's why:
as far as I can understand, your code here probably has a complexity of
N (grouping to a map with just 2 keys) +
Nlog(N) (creation and filling of top set) +
Nlog(N) (creation and filling of nested set) +
Nlog(N) (sets difference) +
Nlog(N) (filtering + searching each element in a needs-copied set) =
4Nlog(N) + N
whereas my variant would probably have the complexity of
Nlog(N) (grouping values into a map with a large amount of keys) +
N (keeping needed values) =
N + Nlog(N)
And though asymptotically they are both O(Nlog(N)), practically mine will probably complete faster.
ps: Not an expert in the complexity theory. Just made some very rough estimation
here is a little test:
(defn generate-data [len]
(doall (mapcat
#(let [n (rand-int 2)]
(if (zero? n)
[(str "aaa/" %) (str %)]
[(str %)]))
(range len))))
(defn still-need-copied [doc-names]
(let [last-segment (fn [doc-name]
(last (clojure.string/split doc-name #"/")))
by-position (group-by #(.contains % "/") doc-names)
top (set (get by-position false))
nested (set (map #(last-segment %) (get by-position true)))
needs-copied (clojure.set/difference nested top)]
(filter #(contains? needs-copied (last-segment %)) doc-names)))
(defn still-need-copied-2 [doc-names]
(->> doc-names
(group-by #(last (clojure.string/split % #"/")))
(keep #(when (== 1 (count (val %))) (first (val %))))))
(def data-100k (generate-data 100000))
(def data-1m (generate-data 1000000))
user> (let [_ (time (dorun (still-need-copied data-100k)))
_ (time (dorun (still-need-copied-2 data-100k)))
_ (time (dorun (still-need-copied data-1m)))
_ (time (dorun (still-need-copied-2 data-1m)))])
"Elapsed time: 714.929641 msecs"
"Elapsed time: 243.918466 msecs"
"Elapsed time: 7094.333425 msecs"
"Elapsed time: 2329.75247 msecs"
so it is ~3 times faster, just as I predicted
update:
found one solution, which is not so elegant, but seems to be working.
You said you're using iota, so I generated a huge file of ~15 million lines (with the aforementioned generate-data fn),
then decided to sort it by the last part after the slash (so that "123" and "aaa/123" end up next to each other):
(defn last-part [s] (last (clojure.string/split s #"/")))
(def sorted (sort-by last-part (iota/seq "my/file/path")))
it has completed surprisingly fast. So the last thing i had to do, is to make a simple loop checking for every item if there is an item with the same last part nearby:
(def res (loop [res [] [item1 & [item2 & rest :as tail] :as coll] sorted]
(cond (empty? coll) res
(empty? tail) (conj res item1)
(= (last-part item1) (last-part item2)) (recur res rest)
:else (recur (conj res item1) tail))))
it has also completed without any visible difficulties, so i've got the needed result without any map/reduce framework.
I think also, that if you won't keep the sorted coll in a var, you would probably save memory by avoiding the huge coll head retention:
(def res (loop [res []
[item1 & [item2 & rest :as tail] :as coll] (sort-by last-part (iota/seq "my/file/path"))]
(cond (empty? coll) res
(empty? tail) (conj res item1)
(= (last-part item1) (last-part item2)) (recur res rest)
:else (recur (conj res item1) tail))))
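As an alternative to the hand-written loop (a different technique, not the code above), the sorted names could be grouped with partition-by, which stays lazy. This is only a sketch: the name unpaired is hypothetical and a small in-memory vector stands in for the iota/seq file.

```clojure
(require '[clojure.string :as str])

(defn last-part [s] (last (str/split s #"/")))

;; Assumes the input is already sorted by last-part, so equal names are adjacent.
(defn unpaired [sorted-names]
  (->> sorted-names
       (partition-by last-part)    ; runs of names sharing a last part
       (filter #(= 1 (count %)))   ; keep runs with a single member
       (map first)))

(unpaired (sort-by last-part
                   ["o/123" "o/234" "t/543" "t/678" "123" "234" "678"]))
;; => ("t/543")
```

Unlike the pairwise loop, this treats any run of two or more equal names as already copied, which may or may not be what is wanted.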

Proper way to obtain side effects while using the java.io/reader in Clojure?

I'm reading lines from a very large text file. The file contains a set of data that I'd like to select specific line numbers from. What I'd like to do is read in a line from the file, if the line is one that I want, conj it to my result, and if it's not, then check the next line. I don't want to store all the lines I've seen in memory so I'd like a way to drop them from the reader line-seq as I read them.
I have a function like this:
;; evaluates but doesn't modify the line sequence so continuously adds
;; the same first line to the result. I would like this exact function
;; but somehow have it drop the first line of lines at each iteration.
(defn get-training-data [batch-size batch-num]
(let [line-numbers (fn that returns vector of random numbers)]
(with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
(let [lines (line-seq rdr) res []]
(for [i (range (apply max line-numbers))
:let [res (conj res (json/read-str (first lines)))]
:when (some #{i} line-numbers)]
res)))))
I also have a function like this:
;;this works as I want it to, but only with a small file and produces a
;;stack overflow with a large file
(defn get-training-data1 [batch-size batch-num]
(let [line-numbers (fn that returns a vector of random numbers)]
(with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
(let [lines (line-seq rdr)]
(loop [i 0 f (apply max line-numbers) res [] lines lines]
(if (> i f)
res
(if (some #{i} line-numbers)
(recur
(inc i)
f
(conj res (json/read-str (first lines)))
(drop 1 lines))
(recur
(inc i)
f
res
(drop 1 lines)))))))))
As I tried to test this, I developed the following simpler cases:
;;works
(let [res []]
(for [i (range 10)
:let [res (conj res i)]
:when (odd? i)]
res)) ;;([1] [3] [5] [7] [9])
;;now an attempt to get the same result but have a side effect each time,
;;produces null pointer exception.
(let [res []]
(for [i (range 10)
:let [res (conj res i)]
:when (odd? i)]
(doall
(println i)
res)))
I believe that if I could figure out how to produce a side effect within a for, the first problem above would be resolved, because I could make the side effect drop the first line of the reader's line sequence.
Do you guys have any thoughts?
map and filter will do this nicely and keep it lazy so you don't store any more in memory than you have to.
user> (->> (line-seq (clojure.java.io/reader "project.clj")) ;; lazy sequence of lines
(map vector (range)) ;; add an index
(filter #(#{1 3 7 9} (first %))) ;; filter by index
(map second)) ;; drop the index
(" :description \"API server for Yummly mobile app(s)\""
"[com.project/example \"1.4.8-SNAPSHOT\"]"
" [org.clojure/tools.cli \"0.2.4\"]"
" [clojurewerkz/mailer \"1.0.0-alpha3\"]")
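The snippet above never closes the reader. A hedged sketch of the same index-and-filter idea wrapped in with-open follows, with the result realized before the reader is closed. The function name select-lines is hypothetical, and a StringReader stands in for the large file so the example is self-contained.

```clojure
(require '[clojure.java.io :as io])

(defn select-lines
  "Returns a vector of the lines from source whose zero-based index is in the
  set wanted. Lines are read one at a time; only matching lines are kept."
  [source wanted]
  (with-open [rdr (io/reader source)]
    (into []                                   ; eager, so reading finishes before close
          (keep-indexed (fn [i line]
                          (when (contains? wanted i) line)))
          (line-seq rdr))))

(select-lines (java.io.StringReader. "zero\none\ntwo\nthree\nfour") #{1 3})
;; => ["one" "three"]
```

For a real file, the first argument would be a path such as "resources/sample.txt". Since into consumes the line sequence without holding on to its head, lines that do not match can be garbage collected as reading proceeds.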