How to convert a lazy sequence to a map? - clojure

I've got a lazy sequence that, when sent to println, shows like this:
(((red dog) (purple cat)) ((green mouse) (yellow bird)))
Note that this is the result of reading a csv and trimming all "cell" values, hence (a) the fact that it's lazy at the time I want to print it and (b) in the future innermost lists might be more than 2 strings because more columns are added.
I'm trying to use clojure.pprint/print-table to print this in a two-column table. I'm having a pretty hard time because print-table seems to want a map with data.
Here's a repro:
;; Mock the lazy data retrieved from a csv file
(defn get-lazy-data []
  (lazy-seq '('('("red" "dog") '("purple" "cat")) '('("green" "mouse") '("yellow" "bird")))))

;; print-table comes from clojure.pprint, e.g.
;; (require '[clojure.pprint :refer [print-table]])
(defn -main []
  (let [data (get-lazy-data)]
    (println "Starting...")
    (println data)
    (println "Continuing...")
    (print-table data)
    (println "Finished!")))
This gives an error:
Exception in thread "main" java.lang.ClassCastException: clojure.lang.Symbol cannot be cast to java.util.Map$Entry
I've tried various options:
(print-table (apply hash-map data)) gives the same exception
(print-table (zipmap data)) tells me to provide another argument for the keys, but I want a solution that doesn't rely on specifying the number of columns beforehand
comprehending and adapting the answer to "Clojure printing lazy sequence", which would be a duplicate of my question were it not that both that question and its answer are so much more complex that I don't know how to translate the solution to my own scenario
Basically I know I have an XY-problem but now I want answers to both questions:
X: How do I pretty print a lazy sequence of pairs of pairs of strings as a table on the console?
Y: How can I convert a lazy sequence to a map (where e.g. keys are the indexes)?

How do I pretty print a lazy sequence of pairs of pairs of strings as a table on the console?
The fact that your "rows" seem to be grouped in pairs is odd, assuming you want a two-column table of color/animal, so we can remove the extra grouping with mapcat identity and then zipmap each pair with the desired map keywords:
(def my-list
  '(((red dog) (purple cat)) ((green mouse) (yellow bird))))
(def de-tupled (mapcat identity my-list))
(map #(zipmap [:color :animal] %) de-tupled)
=> ({:color red, :animal dog} {:color purple, :animal cat} {:color green, :animal mouse} {:color yellow, :animal bird})
(clojure.pprint/print-table *1)
| :color | :animal |
|--------+---------|
|    red |     dog |
| purple |     cat |
|  green |   mouse |
| yellow |    bird |
It's not clear from the question, but it seems like you want to support an arbitrary number of "columns" which kinda precludes having fixed names for them. In that case you can do something like this:
(def my-list ;; added third mood "column"
  '(((red dog happy) (purple cat sad)) ((green mouse happy) (yellow bird sad))))
(def de-tupled (apply concat my-list))
(clojure.pprint/print-table (map #(zipmap (range) %) de-tupled))
|      0 |     1 |     2 |
|--------+-------+-------|
|    red |   dog | happy |
| purple |   cat |   sad |
|  green | mouse | happy |
| yellow |  bird |   sad |
How can I convert a lazy sequence to a map (where e.g. keys are the indexes)?
(def my-list
  '(((red dog) (purple cat)) ((green mouse) (yellow bird))))
(zipmap (range) my-list)
=> {0 ((red dog) (purple cat)), 1 ((green mouse) (yellow bird))}
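If what you actually want indexed is the inner pairs rather than the outer groups, the same trick composes with the de-tupling step from the first part (a small sketch reusing my-list from above):
(zipmap (range) (mapcat identity my-list))
=> {0 (red dog), 1 (purple cat), 2 (green mouse), 3 (yellow bird)}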

A related point to your problem is how you print out your data. Clojure has two ways to print:
(dotest
  (println ["hello" "there" "everybody"])  ; #1
  (prn     ["hello" "there" "everybody"])) ; #2
#1 => [hello there everybody]
#2 => ["hello" "there" "everybody"]
For strings the presence of quotes in #2 makes a huge difference in understanding what is happening. The prn function produces output that is machine-readable (like what you type in your source code). You really need that if you have strings involved in your data.
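For example, with data shaped like the one in the question:
(println '(("red" "dog") ("purple" "cat")))  ; => ((red dog) (purple cat))
(prn     '(("red" "dog") ("purple" "cat")))  ; => (("red" "dog") ("purple" "cat"))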
Look at the difference with symbols:
(println ['hello 'there 'everybody])
(prn ['hello 'there 'everybody])
; doesn't matter if you quote the whole form or individual symbols
(println '[hello there everybody])
(prn '[hello there everybody])
all results are the same:
[hello there everybody]
[hello there everybody]
[hello there everybody]
[hello there everybody]
The point is that you need prn to tell the difference between symbols and strings when you print results. Note that the prn output format (with double-quotes) happens automatically if you use pprint:
(def data
  [["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
   ["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
   ["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]])

(clojure.pprint/pprint data) =>
[["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
 ["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
 ["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]]

Related

Clojure: how to get the *value* of a symbol as a keyword

I have some data from a file like this:
A abcdefghi...
B bcdefghij...
I would like to transform this into a bunch of maps, one per row:
A: abcdefghi
B: bcdefghij
But I don't see how to do this. I can get the two parts I want into symbols
part1
and
part2
But if I use
#{(keyword part1) part2}
I get
{:part1 "abcdefghijklmnopqrstuvwxyz"}
But is there a way to get the value of part1 rather than the name itself?
You need to do some reading and clarify your question.
This syntax:
#{ 3 1 4 }
creates a set of values, not a map. You also need to clarify part1 and part2 - what are those?
Also, keyword literals have the colon at the beginning:
{ :a 1 :b 2 } ; some map
There is a good list of docs here. Especially read "Getting Clojure" and the Clojure CheatSheet.
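For example, assuming part1 and part2 are the two strings parsed from one line (hypothetical bindings, since the question doesn't show how they are produced), a map literal with a keyword key gives what you're after:
(let [part1 "A"                          ; hypothetical values
      part2 "abcdefghijklmnopqrstuvwxyz"]
  {(keyword part1) part2})
;; => {:A "abcdefghijklmnopqrstuvwxyz"}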
Using Ctrl-v TAB to type literal tab characters, echo some text into a test file:
$ echo "A abcde
B bcdef" > test.txt
Test correct writing into file:
$ cat test.txt
A abcde
B bcdef
Then, in the clojure REPL, write functions:
(defn file2cells [fpath]
  (map #(clojure.string/split % #"\t")
       (clojure.string/split (slurp fpath) #"\n")))

(defn keywordize-first [vec-list]
  (map (fn [[k v]] [(keyword k) v]) vec-list))

(defn file2maps [fpath]
  (into {} (keywordize-first (file2cells fpath))))
Apply the functions:
(file2maps "test.txt")
;; => {:A "abcde", :B "bcdef"}

Using split on each element of a vector

Basically, I have used slurp to get the contents of a file that is supposed to be a database. I've split the data already once and have a vector that contains all the information correctly. Now I would like to split each element in the vector again. This would give me a vector of vectors. My problem is I can't seem to find the right way to iterate through the vector and make my changes. The changes either don't work or are not stored in the vector.
Using doseq:
(doseq [x tempVector]
  (clojure.string/split x #"|"))
If I add a print statement in the loop it prints everything spaced out with no changes.
What am I doing wrong?
The str/split function returns a new vector of strings, which you need to save. Right now it is being generated and then discarded. You need something like this:
(ns xyz
  (:require
    [clojure.string :as str]))
(def x "hello there to you")
(def y (str/split x #" ")) ; save result in `y`
(def z (str/split x #"e")) ; save result in `z`
y => ["hello" "there" "to" "you"]
z => ["h" "llo th" "r" " to you"]
You can read Clojure basics online here: https://www.braveclojure.com.
I recommend buying the book as it has more stuff than the online version.
If you have several strings in a vector, you can use the map function to split each of them in turn:
(def my-strings
  ["hello is there anybody in there?"
   "just nod if you can hear me"
   "is there anyone at home?"])

(def my-strings-split
  (mapv #(str/split % #" ") my-strings))
my-strings-split =>
[["hello" "is" "there" "anybody" "in" "there?"]
["just" "nod" "if" "you" "can" "hear" "me"]
["is" "there" "anyone" "at" "home?"]]
To restructure your slurped lines of text into a collection of vectors of words you could do something like:
(use '[clojure.string :as str :only [split]])

(defn file-as-words [filename re]
  (let [lines      (line-seq (clojure.java.io/reader filename))
        line-words (mapv #(str/split % re) lines)]
    line-words))
Here we define a function which first uses line-seq to read the file and break it into a collection of lines; we then map an anonymous function over that collection, invoking clojure.string/split on each line to break it into a collection of words delimited by the passed-in regular expression. The collection of vectors-of-words is returned.
For example, let's say we have a file named /usr/data/test.dat which contains
Alice,Eating,001
Kitty,Football,006
May,Football,004
If we invoke file-as-words by using
(file-as-words "/usr/data/test.dat" #",")
we get back
[["Alice" "Eating" "001"] ["Kitty" "Football" "006"] ["May" "Football" "004"]]

How do I add a list to a table in Clojure?

This is a model of my list.
[ [name age salary] [name age salary] [name age salary] ]
Let's say I have a def named "description_list" that contains this list.
How do I iterate through description_list and put it into a table? I tried doing this:
(print-table [:Name :Age :Salary] description_list)
And that prints out 3 empty rows of a table for me. I need it to actually contain the information from the list. How can I accomplish this?
This is expected behaviour. See the doc for print-table.
Prints a collection of maps in a textual table.
So you need to turn your description_list into a list of maps, e.g.
user=> (let [h [:a :b]
             d [[1 2] [3 4]]]
         (clojure.pprint/print-table
           h
           (map (partial zipmap h) d)))

| :a | :b |
|----+----|
|  1 |  2 |
|  3 |  4 |
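Applied to the shape of your data (the rows below are made up, since the contents of description_list aren't shown):

(def description_list
  [["Alice" 30 50000]
   ["Bob"   40 60000]])

(clojure.pprint/print-table
  [:Name :Age :Salary]
  (map (partial zipmap [:Name :Age :Salary]) description_list))

| :Name | :Age | :Salary |
|-------+------+---------|
| Alice |   30 |   50000 |
|   Bob |   40 |   60000 |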

Filtering a hashmap by value in Clojure when values have compound data

I'm trying to teach myself Clojure.
For a work-related project (to state the obvious, I'm not a professional programmer), I'm trying to combine a bunch of spreadsheets. The spreadsheets have comments that relate to financial transactions. Multiple comments (including across spreadsheets) can refer to the same transaction; each transaction has a unique serial number. I am therefore using the following data structure to represent the spreadsheets:
(def ss {:123 '([ "comment 1" "comment 2" ]
                [ "comment 3" "comment 4" ]
                [ "comment 5" ]),
         :456 '([ "happy days" "are here" ]
                [ "again" ])})
This might be created from the following two spreadsheets:
+------------+------------+-----------+
| Trans. No. | Cmt. A | Cmt. B |
+------------+------------+-----------+
| 123 | comment 1 | comment 2 |
| 456 | happy days | are here |
| 123 | comment 3 | comment 4 |
+------------+------------+-----------+
+-----------------+------------+
| Analyst Comment | Trans. No. |
+-----------------+------------+
| comment 5 | 123 |
| again | 456 |
+-----------------+------------+
I have successfully written functions to create this data structure given a directory full of CSVs. I want to write two further functions:
;; FUNCTION 1 ==========================================================
;; Regex Spreadsheet -> Spreadsheet ; "Spreadsheet" is like ss above
;; Produces a Spreadsheet with ALL comments per transaction if ANY
;; value matches the regex
; (defn filter-all [regex my-ss] {}) ; stub
(defn filter-all [regex my-ss] ; template
  (... my-ss))

(deftest filter-all-tests
  (is (= (filter-all #"1" ss)
         { :123 '([ "comment 1" "comment 2" ]
                  [ "comment 3" "comment 4" ]
                  [ "comment 5" ]) })))
;; FUNCTION 2 ==========================================================
;; Regex Spreadsheet -> Spreadsheet ; "Spreadsheet" is like ss above
;; Produces a Spreadsheet with each transaction number that has at least
;; one comment that matches the regex, but ONLY those comments that
;; match the regex
; (defn filter-matches [regex my-ss] {}) ; stub
(defn filter-matches [regex my-ss] ; template
  (... my-ss))

(deftest filter-matches-tests
  (is (= (filter-matches #"1" ss)
         { :123 '([ "comment 1" ]) })))
What I don't understand is the best way to get the regex far enough down into the vals for each key, given that they are strings nested inside vectors nested inside lists. I have tried using filter with nested applys or maps, but I'm confusing myself with the syntax and even if it works I don't know how to hang on to the keys in order to build up a new hashmap.
I have also tried using destructuring within the filter function, but there too I'm confusing myself and I also think I have to "lift" the functions across the nested data (I think that's the term—like applicatives and monads in Haskell).
Can somebody please suggest the best approach to filtering this data structure? As a separate matter, I would be glad to have feedback on whether this is a sensible data structure for my purposes, but I would like to learn how to solve this problem as it currently exists, if only for learning purposes.
Thanks much.
Here is a solution with your data structure.
filter takes a predicate function. Inside that function you can dig into the data structure and test whatever you need. Here, flatten helps remove the nesting of the list of vectors of comments.
(defn filter-all [regex my-ss]
  (into {} (filter (fn [[k v]] ; a map entry can be destructured into a vector
                     ;; flatten turns the nested vectors into one sequence of comments;
                     ;; some returns truthy if any of them matches
                     (some #(re-matches regex %) (flatten v)))
                   my-ss)))
user> (filter-all #".*3.*" ss)
{:123 (["comment 1" "comment 2"] ["comment 3" "comment 4"] ["comment 5"])}
For filter-matches the logic is different: you want to build a new map with only some parts of the values. reduce can help with that:
(defn filter-matches [regex my-ss]
  (reduce (fn [m [k v]] ; m is the result map (accumulator)
            (let [matches (filter #(re-matches regex %) (flatten v))]
              (if (seq matches)
                (assoc m k (vec matches))
                m)))  ; keep the accumulator when nothing matches
          {}
          my-ss))
user> (filter-matches #".*days.*" ss)
{:456 ["happy days"]}
For the data structure itself, if there is no need to keep the nested vectors in the list for each entry, you can simplify to {:123 ["comment 1" "comment 2" ...] ...}, but it won't drastically simplify the above functions.
I think you're sort of on the right track, but perhaps making life a little harder than it needs to be.
Of greatest concern is your use of regular expressions. While regexps are a good tool for some things, they are often used where other solutions would be simpler and a lot faster.
One of the key ideas to adopt in Clojure is the use of small libraries which you assemble together to get a higher level of abstraction. For example, there are various libraries for handling different spreadsheet formats, such as Excel and Google Docs spreadsheets, and there is support for processing CSV files. Therefore, my first step would be to see if you can find a library which will parse your spreadsheet into a standard Clojure data structure.
For example, Clojure's data.csv will process a CSV spreadsheet into a lazy sequence of vectors, where each vector is a line from the spreadsheet and each element in the vector is a column value from that line. Once you have your data in that format, processing it with map, filter et al. is fairly trivial.
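As a rough sketch of that first step (assuming the org.clojure/data.csv dependency is on the classpath; the file name is made up):

(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(with-open [rdr (io/reader "transactions.csv")]  ; hypothetical file name
  ;; doall realizes the lazy rows while the reader is still open
  (doall (csv/read-csv rdr)))
;; => (["123" "comment 1" "comment 2"] ["456" "happy days" "are here"] ...)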
The next step is to think about the type of abstraction which will make your processing as easy as possible. This will depend largely on what you plan to do, but my suggestion with this sort of data would be a nested structure of hash maps: the outer layer indexed by transaction number, with each value itself a hash map holding an entry for each comment column in the spreadsheets.
{:123 {:cmnta      ["comment 1" "comment 3"]
       :cmntb      ["comment 2" "comment 4"]
       :analystcmt ["comment 5"]}
 :456 {:cmnta      ["happy days"]
       :cmntb      ["are here"]
       :analystcmt ["again"]}}
With this structure, you can then use functions like get-in and update-in to access/change the values in your structure, e.g.
(get-in m [:123 :cmnta])   => ["comment 1" "comment 3"]
(get-in m [:123 :cmnta 0]) => "comment 1"
(get-in m [:456 :cmnta 1]) => nil
(get-in m [:456 :cmnta 1] "nothing to see here - move on") => "nothing to see here - move on"
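And since update-in was mentioned, a quick illustration on the same (hypothetical) map m, appending a comment to one transaction:
(update-in m [:123 :cmnta] conj "comment 6")
;; :cmnta under :123 becomes ["comment 1" "comment 3" "comment 6"];
;; the rest of the map is unchanged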

Incanter - How can I use filter with column keywords instead of nth?

(require '[incanter.core :as icore])
;; Assume dataset "data" is already loaded by incanter.core/read-dataset
;; Let's examine the columns (note that Volume is the 5th column)
(icore/col-names data)
==> [:Date :Open :High :Low :Close :Volume]
;; We CAN use the :Volume keyword to look at just that column
(icore/sel data :cols :Volume)
==> (11886469 9367474 12847099 9938230 11446219 12298336 15985045...)
;; But we CANNOT use the :Volume keyword with filters
;; (well, not without looking up the position in col-names first...)
(icore/sel data :filter #(> (#{:Volume} %) 1000000))
Obviously this is because the filter's anon function is looking at a LazySeq, which no longer has the column names as part of its structure, so the above code won't even compile. My question is this: Does Incanter have a way to perform this filtered query, still allowing me to use column keywords? For example, I can get this to work because I know that :Volume is the 5th column
(icore/sel data :filter #(> (nth % 5) 1000000))
Again, though, I'm looking to see if Incanter has a way of preserving the column keyword for this type of filtered query.
Example dataset:
(def data
  (icore/dataset
    [:foo :bar :baz :quux]
    [[0 0 0 0]
     [1 1 1 1]
     [2 2 2 2]]))
Example query with result:
(icore/$where {:baz {:fn #(> % 1)}} data)
| :foo | :bar | :baz | :quux |
|------+------+------+-------|
|    2 |    2 |    2 |     2 |
Actually this could also be written
(icore/$where {:baz {:gt 1}} data)
Several such "predicate keywords" are supported apart from :gt: :lt, :lte, :gte, :eq (corresponding to Clojure's =), :ne (not=), :in, :nin (not in).
:fn is the general "use any function" keyword.
All of these can be prefixed with $ (:$fn etc.) with no change in meaning.
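Presumably, then, the Volume filter from the question could be written against the column keyword directly (assuming data is the dataset from the question, with its :Volume column):
(icore/$where {:Volume {:gt 1000000}} data)
;; or, with an arbitrary predicate function:
(icore/$where {:Volume {:fn #(> % 1000000)}} data)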