Incanter - where is add-derived-column? - clojure

The add-derived-column function seems ideal for what I want, which is to calculate the mean of values from several columns and add the mean to the dataset.
First problem: I (use '(incanter core datasets ...)), but when I run this test:
(add-derived-column :het [:a :b] (fn [:a :b] (+ :a :b)) data1)
the REPL says 'unable to resolve symbol: add-derived-column'. The same happens with the example given in the docs. The Incanter docs indicate the function is part of 'core'. Has this function been moved to another package, or am I doing something silly?
Second problem (assuming the function is not lost): for the vector of columns, can I use this instead?
(def colnames [:a :b :c ..])
(add-deriv... :newcol colnames (fn [colnames] (some calculation)) datavals)
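Separately from the unresolved symbol, note that fn parameters must be symbols, so (fn [:a :b] (+ :a :b)) would not compile; it would have to be (fn [a b] (+ a b)). On the second problem, here is a rough sketch in plain Clojure of deriving a column from a vector of column names. It is not Incanter's implementation: it assumes the rows are plain maps, and the names with-derived-column and mean are hypothetical. The point it illustrates is that a variadic function can accept however many column values the vector names.

```clojure
;; Hypothetical helper, NOT part of Incanter: adds a derived value to each row map.
(defn with-derived-column
  [new-col from-cols f rows]
  (mapv (fn [row]
          ;; apply f to the values of the selected columns, in order
          (assoc row new-col (apply f (map row from-cols))))
        rows))

;; Variadic mean, so it works for any number of columns.
(defn mean [& xs]
  (/ (reduce + xs) (count xs)))

(def colnames [:a :b :c])
(def rows [{:a 1 :b 2 :c 3} {:a 4 :b 5 :c 9}])

(with-derived-column :mean colnames mean rows)
;; => [{:a 1, :b 2, :c 3, :mean 2} {:a 4, :b 5, :c 9, :mean 6}]
```

The function passed in receives one argument per column, which is why (fn [colnames] ...) with a single parameter would only fit a single-column vector.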

Related

Reducing a list of maps to a list by key in Clojure

I started learning functional programming a few weeks ago, and I'm trying to map a list of maps to a list of the values under a specific key in Clojure.
My list of maps looks like: '({:a "a1" :b "b1" :c "c1"} {:a "a2" :b "b2" :c "c2"} {:a "a3" :b "b3" :c "c3"})
And the output I'm trying to get is: '("b1" "b2" "b3").
I've tried the following:
(doseq [m maps]
  (println (list (get m :b))))
And my output is a list of lists (what is expected as I'm creating a list for each iteration). So my question is, how can I reduce this to a single list?
Update
Just tried the following:
(let [x '()]
  (doseq [m map]
    (conj x (get m :b))))
However, it is still not working. I don't understand why, as I was expecting to append the elements to an empty list.
This is a very common pattern in production Clojure code, so it's a good place to learn. In general, check out the docs on sequences at https://clojure.org/reference/sequences and, when faced with a similar task, look to see which pattern best fits and explore the functions in that group. In this case it's "Process each item of a seq to create a new seq", and the first item listed is map.
Your example might look like:
(map :b my-data)
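A small sketch of that suggestion, using the data from the question (the name maps is taken from the question's own code). It works because a keyword is itself a function that looks itself up in a map, so it can be passed straight to map.

```clojure
(def maps
  '({:a "a1" :b "b1" :c "c1"}
    {:a "a2" :b "b2" :c "c2"}
    {:a "a3" :b "b3" :c "c3"}))

;; A keyword is a function of a map, so it can be passed to map directly.
(map :b maps)
;; => ("b1" "b2" "b3")

;; mapv does the same eagerly and returns a vector.
(mapv :b maps)
;; => ["b1" "b2" "b3"]
```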
You have the right idea, but are using the wrong function. doseq is intended only for side effects and always returns nil. The function you are looking for is for, which takes a sequence as input and returns another sequence as output. I generally prefer for over the similar map as for allows you to name the loop variable:
(def data-list
  [{:a "a1" :b "b1" :c "c1"}
   {:a "a2" :b "b2" :c "c2"}
   {:a "a3" :b "b3" :c "c3"}])
(let [result (vec (for [item data-list]
                    (:b item)))]
  (println result) ; print result
  result) ; return result from `let` expression
result => ["b1" "b2" "b3"]
If instead you do this:
(println
  (doseq [item data-list]
    (println (:b item))))
you can see the difference with doseq vs for:
b1 ; loop item #1
b2 ; loop item #2
b3 ; loop item #3
nil ; return value of doseq
Please see https://www.braveclojure.com/ for online details, and buy a good book (or 5) like Getting Clojure, etc.
(doseq [m maps]
  (println (list (get m :b))))
In two short lines, you break several general rules of functional programming:
Pass data into a function as arguments, not as references to global variables.
Don't print the results of computation. Return them as the value of the function.
Avoid mechanisms such as doseq that work by side-effects.
Despite this, you were not too far from a solution. doseq is essentially a version of for that throws away its result. If we replace doseq with for, and get rid of the println and the list, we get
=> (for [m maps] (get m :b))
("b1" "b2" "b3")
But Arthur Ulfeldt's simple use of map is better.
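On the update in the question: conj does not modify x. It returns a new list, which doseq then discards. If explicit accumulation is wanted, one option is reduce, which hands each new accumulator to the next step. This is only a sketch; the name data is hypothetical and stands in for the question's list of maps.

```clojure
(def data
  [{:a "a1" :b "b1" :c "c1"}
   {:a "a2" :b "b2" :c "c2"}
   {:a "a3" :b "b3" :c "c3"}])

;; conj returns a new collection; the original is unchanged.
(def x '())
(conj x "b1") ;; => ("b1")
x             ;; => ()

;; reduce passes each new accumulator to the next step.
;; A vector is used because conj appends to vectors but prepends to lists.
(reduce (fn [acc m] (conj acc (get m :b))) [] data)
;; => ["b1" "b2" "b3"]
```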

What is the idiomatic way of returning the next member in collection?

What is the idiomatic way of returning the next item in collection, given a member in a collection?
For example, given (def coll [:a :b :c :d :e :f]), what should the f be to make (f coll :d) return :e?
Typically this is just not a thing one does very much in Clojure. The only possible implementation requires a linear scan of the input collection, which means that you are using the wrong data structure for this task.
Instead, we usually try to structure our data so that it is convenient for the tasks we need to perform on it. How best to do this will depend on why you want to look up "the element after foo". For example, if you are going through the input one item at a time and want to know the next item as well as the current item, you could write (partition 2 1 input) to get a sequence of pairs of adjacent values.
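To make the (partition 2 1 input) idea concrete, here is a short sketch with a made-up input vector. Each pair holds an item and its successor, so the last item does not start a pair of its own.

```clojure
(def input [:a :b :c :d])

;; Each pair is (current next); the last element has no successor and gets no pair.
(partition 2 1 input)
;; => ((:a :b) (:b :c) (:c :d))

;; Destructure each pair to work with the current and next item together.
(for [[cur nxt] (partition 2 1 input)]
  (str cur "->" nxt))
;; => (":a->:b" ":b->:c" ":c->:d")
```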
That is, you ask for an idiomatic implementation, but there is none: the idiom is to solve the problem differently. Of course it is straightforward to write the loop yourself, if you believe you are in an exceptional case where you are using the right data structure and just need to do this weird thing once or twice.
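For the exceptional case, a hand-written linear scan might look like the sketch below. The name next-after is hypothetical, and the cost is one pass over the collection per call.

```clojure
;; Hypothetical helper: returns the element after the first occurrence of x,
;; or nil if x is absent or is the last element.
(defn next-after [coll x]
  (loop [s (seq coll)]
    (when s
      (if (= (first s) x)
        (second s)
        (recur (next s))))))

(next-after [:a :b :c :d :e :f] :d) ;; => :e
(next-after [:a :b :c :d :e :f] :f) ;; => nil
(next-after [:a :b :c :d :e :f] :z) ;; => nil
```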
As @amalloy said in his answer, this isn't something for which you would want to use the original data structure, because it would require a linear lookup every time. In other words, your (f coll :d) pattern wouldn't be a particularly useful thing due to its performance.
However, what you could do is define a function that, given a collection, builds a data structure that makes this sort of lookup efficient, and use that as your function. It might look something like this:
(defn after [xs]
  (into {} (map vec (partition 2 1 xs))))
Examples:
(-> [:a :b :c :d :e :f] after :d)
;;=> :e
(let [xs [:a :b :c :d :e :f]
      f (after xs)]
  (map f xs))
;;=> (:b :c :d :e :f nil)
If we generalise the problem to finding the thing following the first thing to pass a test, we get something like
(defn following [pred coll]
  (->> coll
       (drop-while (complement pred))
       (second)))
For example,
(following #{6} (range))
=> 7
Or, your example,
(following #{:d} coll)
=> :e
This is no more or less idiomatic than take-while or drop-while.

update-in with regex causing NullPointerException

I have the following functions and reduced sample:
(defn parse-time
  [time-str]
  (->> time-str
       (re-find #"(\d{1,2}):(\d{2}):(\d{2})")
       ...))
(defn coerce-times
  [m & ks]
  (update-in m ks parse-time))
(coerce-times {:depart "05:05:00" :arrive "05:05:00"} :depart :arrive)
This works as expected with only one key, but when I try to use multiple keys (as in the example above), I get an NPE. Line 20 is the re-find line:
java.lang.NullPointerException: null
at java.util.regex.Matcher.getTextLength (Matcher.java:1234)
java.util.regex.Matcher.reset (Matcher.java:308)
java.util.regex.Matcher.<init> (Matcher.java:228)
java.util.regex.Pattern.matcher (Pattern.java:1088)
clojure.core$re_matcher.invoke (core.clj:4460)
clojure.core$re_find.invoke (core.clj:4512)
tempest.core$parse_time.invoke (core.clj:20)
...
Can someone please help me understand what I'm doing wrong and how I can fix this?
The keys vector provided to update-in is not a collection of keys to operate on, but a series of lookups to follow:
user> (update-in {:a {:b {:c 0}}} [:a :b :c] inc)
{:a {:b {:c 1}}}
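In the question's call, the path [:depart :arrive] makes update-in look for :arrive inside the string stored under :depart. That lookup yields nil, and nil is what reaches re-find, hence the NullPointerException. One possible way to apply a function to several top-level keys is to reduce over them, as sketched below. The body of parse-time is elided in the question, so a hypothetical stand-in (parse-hours, which only extracts the hour) is used here, and coerce-keys is likewise a made-up name. The sketch assumes Clojure 1.7 or later for update.

```clojure
;; Hypothetical stand-in for the question's parse-time, whose body is elided.
(defn parse-hours [time-str]
  (let [[_ h] (re-find #"(\d{1,2}):(\d{2}):(\d{2})" time-str)]
    (Long/parseLong h)))

;; Apply f to each of the given top-level keys, one update per key.
(defn coerce-keys [m f & ks]
  (reduce (fn [acc k] (update acc k f)) m ks))

(coerce-keys {:depart "05:05:00" :arrive "17:30:00"} parse-hours :depart :arrive)
;; => {:depart 5, :arrive 17}
```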

clojure filter map by keys

I'm following this example: http://groups.google.com/group/clojure/browse_thread/thread/99b3d792b1d34b56
(see the last reply)
And this is the cryptic error that I get:
Clojure 1.2.1
user=> (def m {:a "x" :b "y" :c "z" :d "w"})
#'user/m
user=> (filter #(some % [:a :b]) m)
java.lang.IllegalArgumentException: Key must be integer
user=>
Also I don't understand why this would even work. Isn't (some ...) going to return the first matching value, "x", every time? I'm a total noob at clojure and just trying to learn.
Please enlighten me.
I guess I just needed to read the docs more:
(select-keys m [:a :b])
Although I'm still not sure what the intention was with the example I found...
If you "iterate" over a map, you'll get key-value pairs rather than keys. For instance,
user=> (map #(str %) {:a 1, :b 2, :c 3})
("[:a 1]" "[:b 2]" "[:c 3]")
Thus your anonymous function tries to evaluate (some [:a "x"] [:a :b]), which does not work: a vector used as a function expects an integer index, hence "Key must be integer".
The idiomatic solution is to use select-keys as mentioned in another answer.
(filter
  (fn [x]
    (some #{(key x)} [:a :b]))
  m)
This does the same using filter and some (but uglier and slower), except that it returns a sequence of map entries rather than a map.
It works by keeping each entry of m for which some element of [:a :b] is in the set #{(key x)} (i.e. using a set as the predicate).
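To compare the two approaches side by side, here is a short sketch using the map from the question. The into {} step is an addition for illustration; it turns the filtered entries back into a map.

```clojure
(def m {:a "x" :b "y" :c "z" :d "w"})

;; Seq-ing a map yields entries, each with a key and a value.
(key (first {:a "x"}))
;; => :a

;; select-keys returns a map directly.
(select-keys m [:a :b])
;; => {:a "x", :b "y"}

;; filter returns a seq of entries; into {} rebuilds a map from them.
(into {} (filter (comp #{:a :b} key) m))
;; => {:a "x", :b "y"}
```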

What is the idiomatic way to obtain a sequence of columns from an incanter dataset?

What's the best way to get a sequence of columns (as vectors or whatever) from an Incanter data set?
I thought of:
(to-vect (trans (to-matrix my-dataset)))
But ideally, I'd like a lazy sequence. Is there a better way?
Use the $ macro.
=> (def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> ($ :a data) ;; :a column
=> ($ 0 :all data) ;; first row
=> (type ($ :a data))
clojure.lang.LazySeq
Looking at the source code for to-vect, it uses map to build up the result, which already provides one degree of laziness. Unfortunately, it looks like the whole data set is first converted via toArray, which probably gives away all the benefits of map's laziness.
If you want more, you probably have to dive into the gory details of the Java object effectively holding the matrix version of the data set and write your own version of to-vect.
You could use the internal structure of the dataset.
user=> (use 'incanter.core)
nil
user=> (def d (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
#'user/d
user=> (:column-names d)
[:a :b]
user=> (:rows d)
[{:a 1, :b 2} {:a 3, :b 4}]
user=> (defn columns-of
         [dataset]
         (for [column (:column-names dataset)]
           (map #(get % column) (:rows dataset))))
#'user/columns-of
user=> (columns-of d)
((1 3) (2 4))
However, I'm not sure how far the internal structure is part of the public API. You should probably check that with the Incanter guys.