I have the following code:
(defrecord Stoptest [&args])
(def test (Stoptest. [:c101 :main-office :a1]))
; gets the values out
(doseq [arg (:&args test)] (print arg))
Is there a way in which I can recur around args and put the values into a lazy sequence?
I am trying to add a section to some Clojure code. After line 99 of the following file:
https://github.com/lspector/Clojush/blob/master/src/clojush/pushgp/breed.clj
I want to add this code:
(if (= num-parents 2)
(let [initial-other-parents (vec (repeatedly
(+ num-parents 4) ; selecting parents more than required by 4
(fn []
(loop [re-selections 0
other (select population argmap)]
(if (and (= other first-parent)
(< re-selections
(:self-mate-avoidance-limit argmap)))
(recur (inc re-selections)
(select population argmap))
other)))))
all-parents (concat (if (nil? first-parent) ;gathering all created parents
nil
(vector first-parent))
initial-other-parents)
(defn eclid-dist [u v] ;defining a function to calculate distances
(->> (mapv - u v)
(mapv #(Math/pow % 2))
(reduce +)
Math/sqrt))
(defn find-largest-dist-pair [vec-map] ;defining a function to return two vectors (parents) with the largest distance
(apply max-key second
(for [[[k0 v0] & r] (iterate rest vec-map)
:while r
[k1 v1] r]
[[k0 k1] (eclid-dist v0 v1)])))
final-parents (find-largest-dist-pair (:error all-parents)) ;selecting two parents with the largest distance
op-fn (:fn (get genetic-operators operator)) ; extracting the operator
child (apply op-fn (concat final-parents ; creating child
(vector (assoc argmap
:population population))))]
)
To run the added part line by line, I captured the values before my changes by adding the code below after line 84 of the main code:
(spit "initial-setting.edn" {:operator-list operator-list
:first-parent first-parent
:population population
:location location
:rand-gen rand-gen
:argmap argmap})
My question is:
What is the best way to assign the values saved with spit to variables and execute the added section line by line for debugging?
I am using Calva as my IDE. I tried to insert #break and use lein to run and debug the code, but it did not work. Here is my previous post on that:
How to set a breakpoint in a Clojure program using Calva?
I am trying to convert SICP's meta-circular evaluator to Clojure. In setup-environment a call to extend-environment does not compile because I get the error "Attempting to call unbound fn". Here's part of the code:
(... loads of methods for creating and managing environment list)
(def primitive-procedures
(list (list 'car first)
(list 'cdr rest)
(list 'cons conj) ;; TODO: reverse
(list 'null? nil?)
(list 'list list)
(list '+ +)
(list '- -)
(list '* *)
(list '/ /)
;; more primitives
))
(def primitive-procedure-names
#(map [first
primitive-procedures]))
(def primitive-procedure-objects
(fn [] (map (fn [p] (list 'primitive (second p)))
primitive-procedures)))
(def the-empty-environment '())
(defn extend-environment [vars vals base-env]
(if (= (count vars) (count vals))
(conj base-env (make-frame vars vals))
(if (< (count vars) (count vals))
(throw (Throwable. "Too many arguments supplied") vars vals)
(throw (Throwable. "Too few arguments supplied") vars vals))))
;; Added # in front here so it could be called (???)
(defn setup-environment []
#(let [initial-env
(extend-environment (primitive-procedure-names)
(primitive-procedure-objects)
the-empty-environment)] ;; <= that does not work
(define-variable! 'true true initial-env)
(define-variable! 'false false initial-env)
initial-env)))
;; Method for interacting with the evaluator:
(defn driver-loop []
(prompt-for-input input-prompt)
(let [input (read)]
(let [output (m-eval input the-global-environment)]
(announce-output output-prompt)
(user-print output)))
(driver-loop))
(...)
(def the-global-environment (setup-environment))
(driver-loop)
And when I evaluate the extend-environment method I get the following error:
Caused by java.lang.IllegalStateException
Attempting to call unbound fn:
#'scheme-evaluator/extend-environment
Var.java: 43 clojure.lang.Var$Unbound/throwArity
AFn.java: 40 clojure.lang.AFn/invoke
scheme-evaluator.clj: 277 scheme-evaluator/eval7808
I think I am not providing the right type of parameters or I have not created the right type of function. I tried various variations of anonymous methods and passing in parentheses or without, but I don't get it to compile.
Does anyone know what the reason is for this error and how can I fix it?
The definition of
(def primitive-procedure-names
#(map [first
primitive-procedures]))
likely does not do what you intend. As written, it defines a function that takes no arguments and returns a transducer (which is itself a function). Because a vector acts as a function of its indices, that transducer, when applied to a sequence, replaces the values 0 and 1 with first and primitive-procedures respectively. I'll demonstrate first with functions and then with numbers, to hopefully make what's happening clearer:
user> (into [] (map [first 'example]) [0 1])
[#function[clojure.core/first--4339] example]
user> (into [] (map [1 2]) [0 1])
[1 2]
Perhaps you wanted:
(def primitive-procedure-names
(map first primitive-procedures))
And may I suggest using the defn form for defining functions and the def form for defining values unless you have a really strong reason not to.
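The difference between the two definitions can be checked at the REPL. The following is a minimal sketch, assuming a shortened primitives table; the names v and xf are hypothetical and only serve the illustration.

```clojure
;; A vector is a function of its indices.
(def v [first 'example])
(v 1) ; => example

;; `map` called with only a function returns a transducer, not a sequence.
(def xf (map v))
(fn? xf) ; => true

;; Shortened, illustrative version of the primitives table from the question.
(def primitive-procedures
  (list (list 'car first)
        (list 'cdr rest)))

;; A plain value: the sequence of procedure names.
(def primitive-procedure-names
  (map first primitive-procedures))
;; => (car cdr)
```

Note that with this definition primitive-procedure-names is a value, so it is used as is rather than called with parentheses.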
setup-environment is a function that returns a function; calling that returned function yields initial-env, unmodified by the calls to define-variable!. In Clojure the collection types are immutable, so if you want to make several changes to a collection you have to chain them: feed the result of the first change into the input of the second, then return the result of the second:
(add-second (add-first initial-value))
which can also be written like this:
(-> initial-value
add-first
add-second)
which is just a shorthand for the example above.
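As a concrete sketch of that chaining, assume the environment is a plain map and that define-variable is a hypothetical pure stand-in for define-variable! that returns a new environment instead of mutating one:

```clojure
;; Hypothetical pure replacement for define-variable!: returns a NEW map.
(defn define-variable [env sym value]
  (assoc env sym value))

(def initial-env {})

;; Each step receives the result of the previous one.
(def env
  (-> initial-env
      (define-variable 'x 1)
      (define-variable 'y 2)))

initial-env ; => {}  (the original is untouched)
env         ; => {x 1, y 2}
```

The value to return from a setup function written this way is the result of the last step, not the binding that held the starting environment.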
I developed a function in Clojure to fill in an empty column from the last non-empty value. I'm assuming this works, given:
(:require [flambo.api :as f])
(defn replicate-val
[ rdd input ]
(let [{:keys [ col ]} input
result (reductions (fn [a b]
(if (empty? (nth b col))
(assoc b col (nth a col))
b)) rdd )]
(println "Result type is: "(type result))))
Got this:
;=> "Result type is: clojure.lang.LazySeq"
The question is: how do I convert this back to type JavaRDD using flambo (a Spark wrapper)?
I tried (f/map result #(.toJavaRDD %)) in the let form to attempt to convert it to a JavaRDD.
I got this error:
"No matching method found: map for class clojure.lang.LazySeq"
which is expected because result is of type clojure.lang.LazySeq.
The question is how do I make this conversion, or how can I refactor the code to accommodate this?
Here is a sample input rdd:
(type rdd) ;=> "org.apache.spark.api.java.JavaRDD"
But looks like:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
Required output is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
Thanks.
First of all, RDDs are not iterable (they don't implement ISeq), so you cannot use reductions. Even ignoring that, the whole idea of accessing the previous record is rather tricky. You cannot directly access values from another partition, and only transformations that don't require shuffling preserve order.
The simplest approach here would be to use DataFrames and window functions with an explicit order, but as far as I know Flambo doesn't implement the required methods. It is always possible to use raw SQL or to access the Java/Scala API, but if you want to avoid this you can try the following pipeline.
First, let's create a broadcast variable with the last values per partition:
(require '[flambo.broadcast :as bd])
(import org.apache.spark.TaskContext)
(def last-per-part (f/fn [it]
(let [context (TaskContext/get) xs (iterator-seq it)]
[[(.partitionId context) (last xs)]])))
(def last-vals-bd
(bd/broadcast sc
(into {} (-> rdd (f/map-partitions last-per-part) (f/collect)))))
Next, some helpers for the actual job:
(defn fill-pair [col]
(fn [x] (let [[a b] x] (if (empty? (nth b col)) (assoc b col (nth a col)) b))))
(def fill-pairs
  (f/fn [it]
    (let [part-id (.partitionId (TaskContext/get)) ;; Get partition ID
          xs (iterator-seq it) ;; Convert input to seq
          prev (if (zero? part-id) ;; Find previous element
                 (first xs)
                 ;; Last value of the preceding partition
                 ((bd/value last-vals-bd) (dec part-id)))
          ;; Create seq of pairs (prev, current)
          pairs (partition 2 1 (cons prev xs))
          ;; Same as before
          {:keys [col]} input
          ;; Prepare mapping function
          mapper (fill-pair col)]
      (map mapper pairs))))
Finally, you can pass fill-pairs to map-partitions:
(-> rdd (f/map-partitions fill-pairs) (f/collect))
A hidden assumption here is that the order of the partitions follows the order of the values. That may or may not hold in the general case, but without an explicit ordering it is probably the best you can get.
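The per-partition step can be tried on plain Clojure sequences, without Spark. This is only a sketch: fill-partition is a hypothetical helper that takes the previous element explicitly instead of reading it from the broadcast variable.

```clojure
(defn fill-pair [col]
  (fn [[a b]]
    (if (empty? (nth b col))
      (assoc b col (nth a col))
      b)))

;; Hypothetical local stand-in for the body of fill-pairs:
;; `prev` plays the role of the last row of the preceding partition.
(defn fill-partition [col prev xs]
  (map (fill-pair col) (partition 2 1 (cons prev xs))))

(fill-partition 1 ["04" "2" "3"] [["04" "" "5"] ["5" "16" ""]])
;; => (["04" "2" "5"] ["5" "16" ""])
```

Each pair sees the original, unfilled predecessor, so in a run of consecutive empty cells only the first row is filled. Carrying the filled value forward inside a partition would need something like reductions over the partition's rows.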
An alternative approach is to zipWithIndex, swap the order of the values, and perform a join with an offset:
(require '[flambo.tuple :as tp])
(def rdd-idx (f/map-to-pair (.zipWithIndex rdd) #(.swap %)))
(def rdd-idx-offset
(f/map-to-pair rdd-idx
(fn [t] (let [p (f/untuple t)] (tp/tuple (dec' (first p)) (second p))))))
(f/map (f/values (.rightOuterJoin rdd-idx-offset rdd-idx)) f/untuple)
Next, you can map using a similar approach as before.
Edit
A quick note on using atoms. The problem there is the lack of referential transparency, and that you're relying on incidental properties of a given implementation rather than a contract. Nothing in the semantics of map requires elements to be processed in a given order, so if the internal implementation changes, the code may no longer be valid. Using plain Clojure:
(def a (atom 0))
(defn foo [x] (let [aa @a] (swap! a (fn [& args] x)) aa))
(map foo (range 1 20))
compared to:
(def a (atom 0))
(pmap foo (range 1 20))
I have a Clojure function that uses the flambo v0.60 API to do some analysis on a sample data set. I noticed that when I use (get rdd 2), instead of getting the second element in the RDD collection, it's getting the second character of the first element of the RDD collection. My assumption is that Clojure is treating each row of the RDD as a single string rather than a vector, so I cannot get the second element of the row. I'm thinking of using the map-values function to convert the mapped values into vectors from which I can get the second element. I tried this:
(defn split-on-tab-transformation [xctx input]
(assoc xctx :rdd (-> (:rdd xctx)
(spark/map (spark/fn [row] (s/split row #"\t")))
(spark/map-values vec))))
Unfortunately I got an error:
java.lang.IllegalArgumentException: No matching method found: mapValues for class org.apache.spark.api.java.JavaRDD...
This code returns the first collection in the RDD
(assuming I removed the (spark/map-values vec) from the above function):
(defn get-distinct-column-val
"input = {:col val}"
[ xctx input ]
(let [rdds (-> (:rdd xctx)
(f/map (f/fn [row] row))
f/first)]
(clojure.pprint/pprint rdds)))
Output:
[2.00000 770127 200939.000000 \t6094\tBENTONVILLE, AR DPS\t22.500000\t5.000000\t2.500000\t5.000000\t0.000000\t0.000000\t0.000000\t0.000000\t0.000000\t1\tStore Tab\t0.000000\t4.50\t3.83\t5.00\t0.000000\t0.000000\t0.000000\t0.000000\t19.150000]
If I try to get the second element, 770127:
(defn get-distinct-column-val
"input = {:col val}"
[ xctx input ]
(let [rdds (-> (:rdd xctx)
(f/map (f/fn [row] row))
f/first)]
(clojure.pprint/pprint (get rdds 1))))
I get:
[\.]
Flambo documentation for map-values
I'm new to clojure and I'd appreciate any help. Thanks
First of all, map-values (or mapValues in the Spark API) is a valid transformation only on a PairRDD (for example, something like [:foo [1 2 3]]). RDDs with values like this can be interpreted as some sort of map, where the first element is a key and the second is a value.
If you have an RDD like this, mapValues transforms the values without changing the keys. In your case you should use a second map, although it seems redundant since clojure.string/split already returns a vector.
A simple example of using map-values:
(require '[flambo.tuple :as ft])
(let [pairs [(ft/tuple :foo 1) (ft/tuple :bar 2)]
rdd (f/parallelize-pairs sc pairs) ;; Note parallelize-pairs -> PairRDD
result (-> rdd
(f/map-values inc) ;; Map values
(f/collect))]
(assert (= result [(ft/tuple :foo 2) (ft/tuple :bar 3)])))
From your description it looks like you're using the input RDD instead of the one returned from split-on-tab-transformation. If I had to guess, you're passing the original xctx rather than the returned one. Since Clojure maps are immutable, assoc doesn't change the argument passed to it, so get-distinct-column-val receives RDD[String], not RDD[Array[String]].
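The symptom can be reproduced without Spark. The sketch below assumes a plain map with a hypothetical :rows key standing in for the RDD, and a hypothetical split-rows function standing in for split-on-tab-transformation:

```clojure
(require '[clojure.string :as str])

;; Returns a NEW context; the argument is left untouched.
(defn split-rows [ctx]
  (assoc ctx :rows (mapv #(str/split % #"\t") (:rows ctx))))

(def ctx {:rows ["2.00000\t770127\t200939.000000"]})
(def split-ctx (split-rows ctx))

;; Original context: each row is still a string, so get indexes characters.
(get (first (:rows ctx)) 1)       ; => \.

;; Returned context: each row is a vector of fields.
(get (first (:rows split-ctx)) 1) ; => "770127"
```

The first result matches the character seen in the question, which is consistent with the unsplit rows being used.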
Based on the naming convention, I assume you want to get distinct values for a single position in an array. I removed unused parts of your code for clarity. First, let's create dummy data:
(spit "data.txt"
(str "Mazda RX4\t21\t6\t160\n"
"Mazda RX4 Wag\t21\t6\t160\n"
"Datsun 710\t22.8\t4\t108\n"))
Then add rewritten versions of your functions:
(defn split-on-tab-transformation [xctx]
(assoc xctx :rdd (-> (:rdd xctx)
(f/map #(clojure.string/split % #"\t")))))
(defn get-distinct-column-val
[xctx col]
(-> (:rdd xctx)
(f/map #(get % col))
(f/distinct)))
and check the result:
(assert
(= #{"Mazda RX4 Wag" "Datsun 710" "Mazda RX4"}
(-> {:sc sc :rdd (f/text-file sc "data.txt")}
(split-on-tab-transformation)
(get-distinct-column-val 0)
(f/collect)
(set))))
I'm trying to build an XML structure using the internal data types from BaseX from Clojure.
(defn basex-elem [token-name dict]
(let [elem (org.basex.query.item.FElem.
(org.basex.query.item.QNm. token-name))]
(for [[k v] dict]
(do
(println "THIS IS REACHED")
(let [k-name (org.basex.query.item.QNm. (.getName k))
k-attr (org.basex.query.item.FAttr.
k-name
(org.basex.util.Token/token v))]
(.add elem k-attr))))
elem))
When using this to try to create an element, "THIS IS REACHED" is never printed:
(def test-elem (basex-elem "element-name" {:key1 "value1", :key2 "value2"}))
; => #'user/test-elem
And thus the value comes back without any attributes:
test-elem
; => #<FElem <element-name/>>
But adding attributes works otherwise.
(.add test-elem
(org.basex.query.item.FAttr.
(org.basex.query.item.QNm. "foo")
(org.basex.util.Token/token "bar")))
; => #<FElem <element-name foo="bar"/>>
Thus, presumably I'm doing something wrong with the loop. Any pointers?
for is not a loop construct in Clojure; it is a list comprehension that produces a lazy sequence. Because the sequence returned here is never consumed, its body never runs.
Use doseq instead when side effects are intended.
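The difference is easy to see with a mutable Java collection standing in for the BaseX element. This is a sketch only; add-all-lazy and add-all-eager are hypothetical names.

```clojure
;; `for` builds a lazy sequence; it is discarded here, so .add never runs.
(defn add-all-lazy [coll]
  (let [out (java.util.ArrayList.)]
    (for [x coll] (.add out x))
    out))

;; `doseq` runs its body eagerly for every element.
(defn add-all-eager [coll]
  (let [out (java.util.ArrayList.)]
    (doseq [x coll] (.add out x))
    out))

(vec (add-all-lazy [:a :b]))  ; => []
(vec (add-all-eager [:a :b])) ; => [:a :b]
```

In basex-elem, replacing for with doseq should have the same effect: the attributes are added before elem is returned.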