Retrieving data in Clojure - clojure

I have three text files, as shown at http://paste.debian.net/plain/1027720. The third file is in the following format:
Third File
salesID | custID | prodID | itemCount
1|1|1|3
2|2|2|3
I want to display the table with custID replaced by the customer name and prodID replaced by the product description,
as follows:
1: ["John" "shoes" "3"]
What I have so far:
(def data (slurp "cust.txt"))
(->> (for [line (clojure.string/split data #"[ ]*[\r\n]+[ ]*")]
(-> line (clojure.string/split #"\|") rest vec))
(map vector (rest (range))))
How can I retrieve and map the values accordingly?
EDIT
"demo_1.txt"
content id|name|address|phone-number
1|John|123 Street|456-4567
2|Smith|123 Here Street|456-4567
"demo_2.txt"
prodID | item | Cost
1|shoes|14.96
2|milk|1.98

The processing of this data is similar to how I process CSV files. I like to split the problem into functions that do line to vector and vector to map, using the first row as the header for each.
(require '[clojure.string :as s]) ; provides the s/ alias used below
(defn line->vec [s]
  (s/split s #"\|"))
(defn vec->map [desc row]
  (into {}
        (map vector desc row))) ; map accepts multiple collections
(defn file->maps [filename]
  ;; Destructure to capture the header row separately from the data rows
  (let [[desc & lines] (->> (slurp filename)
                            (s/split-lines)
                            (map line->vec))
        desc-keys (map keyword desc)]
    (for [line lines]
      (vec->map desc-keys line))))
For your demo files, you can use group-by to generate a map, sort of like an index (I manually fixed the header formatting, but you'd want to do it with a utility fn):
For (group-by :content-id (file->maps "demo_1.txt"))
{"1" [{:address "123 Street",
:phone-number "456-4567",
:name "John",
:content-id "1"}],
"2" [{:address "123 Here Street",
:phone-number "456-4567",
:name "Smith",
:content-id "2"}]}
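The header clean-up that was done by hand could be sketched as a small utility. This is only an illustration: header->keyword is a hypothetical name, and the rule it applies (trim, then turn inner whitespace into hyphens, keeping the original case) is an assumption about what "fixed" means here.

```clojure
(require '[clojure.string :as s])

;; header->keyword is a hypothetical helper, not part of the code above.
;; It trims the column name and replaces inner whitespace with hyphens.
(defn header->keyword [h]
  (-> h
      s/trim
      (s/replace #"\s+" "-")
      keyword))

(map header->keyword (s/split "prodID | item | Cost" #"\|"))
;; => (:prodID :item :Cost)

(header->keyword "content id")
;; => :content-id
```

Case is preserved, so "Cost" becomes :Cost; add s/lower-case to the pipeline if lower-case keys are preferred.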
For (group-by :prodID (file->maps "demo_2.txt"))
{"1" [{:item "shoes", :prodID "1", :cost "14.96"}],
"2" [{:item "milk", :prodID "2", :cost "1.98"}]}
And then replace each column with its index value:
(defn replace-value [index idx-key m k]
  (update m k #(get-in index [% 0 idx-key])))
(defn -main [& args]
  (let [customers (group-by :content-id (file->maps "demo_1.txt"))
        products (group-by :prodID (file->maps "demo_2.txt"))]
    ;; Use customers and products to replace some data
    (->> (file->maps "demo_3.txt")
         (map #(replace-value customers :name % :content-id))
         (map #(replace-value products :item % :prodID)))))
And the result:
({:prodID "shoes", :content-id "John", :salesID "1", :itemCount "3"}
{:prodID "milk", :content-id "Smith", :salesID "2", :itemCount "3"})
Then it should be straightforward to convert those maps back into the format you want.
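As a rough sketch of that last step, assuming the result maps have exactly the keys shown above, something like the following would print each sale in the requested form. format-sale is a hypothetical helper name, and the sample rows are copied from the result above.

```clojure
;; Sample rows in the shape shown in the result above.
(def sales
  [{:prodID "shoes", :content-id "John", :salesID "1", :itemCount "3"}
   {:prodID "milk", :content-id "Smith", :salesID "2", :itemCount "3"}])

;; format-sale is a hypothetical helper: it picks the columns in display
;; order and renders one line such as 1: ["John" "shoes" "3"].
(defn format-sale [m]
  (str (:salesID m) ": "
       (pr-str (mapv m [:content-id :prodID :itemCount]))))

(doseq [line (map format-sale sales)]
  (println line))
```

Using the map itself as a function in mapv keeps the column order explicit, which matters because the maps are unordered.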

Related

How to format string in Clojure by using map-indexed and vector

I am trying to read content from a file which is in the format :
ID|Name|Country-Name|Phone-Number eg:
1|Austin|Germany|34-554567
2|Mary Jane|Australia|45-78647
I am using following code to fetch data from it :
(map-indexed
  #(vector %1 %2)
  (map #(vec (.split #"\|" %1))
       (line-seq (clojure.java.io/reader "test.txt"))))
with this code I am getting this output:
([0 ["1" "Austin" "Germany" "34-554567"]] [1 ["2" "Mary Jane" "Australia" "45-78647"]] [2 ["3" "King Kong" "New-Zealand" "35-467533"]])
I want the output to be like:
ID:["name" "country-name" "phone-number"]
ID:["name" "country-name" "phone-number"]
eg:
1:["Austin" "Germany" "34-554567"]
2:["Mary Jane" "Australia" "45-78647"]
where ID starts at 1 and increments by 1, each result lists the ID followed by the data associated with it, and the output is sorted by ID.
What changes should I make to my code to make this happen?
Maybe:
(into {} (map-indexed
#(vector (inc %1) (rest %2))
(repeat 2 ["1" "Austin" "Germany" "34-554567"])))
It looks like your data already has indexes in it:
(def data
"1|Austin|Germany|34-554567
2|Mary Jane|Australia|45-78647
3|King Kong|New-Zealand|35-467533 ")
(defn fmt [line]
(let [sections (-> line
str/trim
(str/split #"\|")) ]
sections) )
(defn run []
(let [lines (vec (str/split-lines data)) ]
(mapv fmt lines)))
(run)
with result:
sections => ["1" "Austin" "Germany" "34-554567"]
sections => ["2" "Mary Jane" "Australia" "45-78647"]
sections => ["3" "King Kong" "New-Zealand" "35-467533"]
If you wanted to throw away the indexes in the data, you could generate your own like so:
(defn fmt [idx line]
(let [sections (-> line
str/trim
(str/split #"\|"))
sections-keep (rest sections)
result (apply vector idx sections-keep)]
result))
(defn run []
(let [lines (vec (str/split-lines data))]
(mapv fmt (range 1 1e9) lines)))
Update
If you want to use a disk file, do this:
(def data
"1|Austin|Germany|34-554567
2|Mary Jane|Australia|45-78647
3|King Kong|New-Zealand|35-467533 ")
(defn fmt [idx line]
(let [sections (-> line
str/trim
(str/split #"\|"))
sections-keep (rest sections)
result (apply vector idx sections-keep)]
result))
(defn run [filename]
(let [lines (vec (str/split-lines (slurp filename)))]
(mapv fmt (range 1 1e9) lines)))
(let [filename "/tmp/demo.txt"]
(spit filename data)
(run filename))
A guess:
(def data
"1|Austin|Germany|34-554567
2|Mary Jane|Australia|45-78647
3|King Kong|New-Zealand|35-467533")
(->> (for [line (clojure.string/split data #"[ ]*[\r\n]+[ ]*")]
(-> line (clojure.string/split #"\|") rest vec))
(map vector (rest (range))))
; ([1 ["Austin" "Germany" "34-554567"]]
; [2 ["Mary Jane" "Australia" "45-78647"]]
; [3 ["King Kong" "New-Zealand" "35-467533"]])
Although I'm not sure why you want to have explicit auto-generated ID in the result and ignore the serial-number you've got in the original data.
Optionally add (into (sorted-map)) to the end so you get the sequential ids mapped to values, and this retains the id order unlike a hash-map.
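A minimal sketch of that suggestion, using the same inline data rather than a file; by-id is just an illustrative name, and the printing step at the end is an assumption about the desired display format.

```clojure
(require '[clojure.string :as str])

(def data
  "1|Austin|Germany|34-554567
2|Mary Jane|Australia|45-78647
3|King Kong|New-Zealand|35-467533")

;; Pair each row (minus its own ID column) with a generated ID,
;; then pour the pairs into a sorted map so the IDs stay in order.
(def by-id
  (->> (for [line (str/split-lines data)]
         (-> line str/trim (str/split #"\|") rest vec))
       (map vector (rest (range)))
       (into (sorted-map))))

;; Print in the ID:[...] form asked for in the question.
(doseq [[id cols] by-id]
  (println (str id ":" (pr-str cols))))
```

A sorted map keeps the keys ordered no matter how many rows there are, whereas a plain hash map gives no ordering guarantee.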

Converting string to nested map in Clojure

I have a file containing some text like:
1|apple|sweet
2|coffee|bitter
3|gitpush|relief
I want to work with this input using a map. In Java or Python, I would have made a nested map like:
{1: {thing: apple, taste: sweet},
2: {thing: coffee, taste: bitter},
3: {thing: gitpush, taste: relief}}
Or even a list inside the map like:
{1: [apple, sweet],
2: [coffee, bitter],
3: [grape, sour]}
The end goal is to access the last two columns' data efficiently using the first column as the key.
I want to do this in Clojure and I am new to it. So far, I have succeeded in creating a list of maps using the following code:
(def cust_map (map (fn [[id name taste]]
(hash-map :id (Integer/parseInt id)
:name name
:taste taste ))
(map #(str/split % #"\|") (line-seq (clojure.java.io/reader path)))))
And I get this, but it's not what I want.
({1, apple, sweet},
{2, coffee, bitter},
{3, gitpush, relief})
It would be nice if you could show me how to build the most efficient of the two, or both: a nested map and a list inside a map, in Clojure. Thanks!
When you build a map with hash-map, the arguments are alternating keys and values. For example:
(hash-map :a 0 :b 1)
=> {:b 1, :a 0}
From what I understand, you want to have a unique key, the integer, which maps to a compound object, a map:
(hash-map 0 {:thing "apple" :taste "sweet"})
Also, you do not want to call map, which would result in a sequence of maps. You want to have a single hash-map being built.
Try using reduce:
(reduce (fn [map [id name taste]]
(merge map
(hash-map (Integer/parseInt id)
{:name name :taste taste})))
{}
'(("1" "b" "c")
("2" "d" "e")))
--- edit
Here is the full test program:
(import '(java.io BufferedReader StringReader))
(def test-input (line-seq
(BufferedReader.
(StringReader.
"1|John Smith|123 Here Street|456-4567
2|Sue Jones|43 Rose Court Street|345-7867
3|Fan Yuhong|165 Happy Lane|345-4533"))))
(def a-map
(reduce
(fn [map [id name address phone]]
(merge map
(hash-map (Integer/parseInt id)
{:name name :address address :phone phone})))
{}
(map #(clojure.string/split % #"\|") test-input)))
a-map
=> {1 {:name "John Smith", :address "123 Here Street", :phone "456-4567"}, 2 {:name "Sue Jones", :address "43 Rose Court Street", :phone "345-7867"}, 3 {:name "Fan Yuhong", :address "165 Happy Lane", :phone "345-4533"}}
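The question also asked about the "list inside the map" variant. One possible sketch, using the three sample lines from the question as an in-memory vector instead of reading a file (by-id is an illustrative name, and Long/parseLong is used here in place of Integer/parseInt as a matter of preference):

```clojure
(require '[clojure.string :as str])

(def lines ["1|apple|sweet" "2|coffee|bitter" "3|gitpush|relief"])

;; Build a single map of id -> vector of the remaining columns.
(def by-id
  (into {}
        (map (fn [line]
               (let [[id & cols] (str/split line #"\|")]
                 [(Long/parseLong id) (vec cols)])))
        lines))

(get by-id 2)        ;; => ["coffee" "bitter"]
(get-in by-id [3 1]) ;; => "relief"
```

Lookup by ID is a direct map access, and the vector keeps the trailing columns in file order.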
I agree with @coredump that this is not concise, but a quick fix for your code is to use a list (or any other collection) holding the ID and a nested map:
(def cust_map (map (fn [[id name taste]]
(list (Integer/parseInt id)
(hash-map :name name
:taste taste)))
(map #(clojure.string/split % #"\|") (line-seq (clojure.java.io/reader path)))))
This may be a somewhat naive view on my part, as I'm not all that experienced with Clojure, but any time I want to make a map from a collection I immediately think of zipmap:
(require '[clojure.java.io :as io :refer [reader]])
(defn lines-from [fname]
(line-seq (io/reader fname)))
(defn nested-map
  "fname : full path and filename to the input file
  re : regular expression used to split file lines into columns
  keys : sequence of keys for the trailing columns in each line. The first column
  of each line is assumed to be the line ID"
  [fname re keys] ; the docstring must come before the parameter vector
  (let [lines (lines-from fname)
        line-cols (map #(clojure.string/split % re) lines) ; (["1" "apple" "sweet"] ["2" "coffee" "bitter"] ["3" "gitpush" "relief"])
        ids (map #(Integer/parseInt (first %)) line-cols) ; (1 2 3)
        rest-cols (map rest line-cols) ; (("apple" "sweet") ("coffee" "bitter") ("gitpush" "relief"))
        rest-maps (map #(zipmap keys %) rest-cols)] ; ({:thing "apple", :taste "sweet"} {:thing "coffee", :taste "bitter"} {:thing "gitpush", :taste "relief"})
    (zipmap ids rest-maps)))
(nested-map "C:/Users/whatever/q50663848.txt" #"\|" [:thing :taste])
produces
{1 {:thing "apple", :taste "sweet"}, 2 {:thing "coffee", :taste "bitter"}, 3 {:thing "gitpush", :taste "relief"}}
I've shown the intermediate results of each step in the let block as a comment so you can see what's going on. I've also tossed in lines-from, which is just my thin wrapper around line-seq to keep myself from having to type in BufferedReader. and StringReader. all the time. :-)

Untuple a Clojure sequence

I have a function that deduplicates rows with a preference. I thought of implementing it in Clojure using flambo functions as follows:
From the data set, use group-by to group duplicates (i.e. based on a specified :key)
Given a :val as input, use a filter to check whether the value in a given column of each row equals this :val
Use a map to untuple the duplicates and return single vectors (I'm not sure that is the right way, though; I tried flat-map without any luck)
For a sample data-set
(def rdd
(f/parallelize sc [ ["Coke" "16" ""] ["Pepsi" "" "5"] ["Coke" "2" "3"] ["Coke" "" "36"] ["Pepsi" "" "34"] ["Pepsi" "25" "34"]]))
I tried this:
(defn dedup-rows
[rows input]
(let [{:keys [key-col col val]} input
result (-> rows
(f/group-by (f/fn [row]
(get row key-col)))
(f/values)
(f/map (f/fn [rows]
(if (= (count rows) 1)
rows
(filter (fn [row]
(let [col-val (get row col)
equal? (= col-val val)]
(if (not equal?)
true
false))) rows)))))]
result))
if I run this function thus:
(dedup-rows rdd {:key-col 0 :col 1 :val ""})
it produces
;=> [(["Pepsi" 25 34]), (["Coke" 16 ] ["Coke" 2 3])]
I don't know what else to do to handle the result to produce a result of
;=> [["Pepsi" 25 34],["Coke" 16 ],["Coke" 2 3]]
I tried f/map f/untuple as the last form in the -> macro with no luck.
Any suggestions? I would really appreciate it if there's another way to go about this.
Thanks.
PS: when grouped
;=> [[["Pepsi" "" 5], ["Pepsi" "" 34], ["Pepsi" 25 34]], [["Coke" 16 ""], ["Coke" 2 3], ["Coke" "" 36]]]
For each group, rows that have "" are considered duplicates and hence removed from the group.
Looking at the flambo readme, there is a flat-map function. This is slightly unfortunate naming because the Clojure equivalent is called mapcat. These functions take each map result, which must be a sequence, and concatenate the results together. Another way to think about it is that it flattens the final sequence by one level.
I can't test this but I think you should replace your f/map with f/flat-map.
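To illustrate the difference in plain Clojure (no Spark or flambo involved), here is a small sketch on an in-memory collection shaped like the grouped result from the question. That f/flat-map behaves the same way on an RDD is an assumption taken from the description above, not something verified here.

```clojure
;; Each element is one group: a sequence of rows.
(def groups
  [[["Pepsi" "25" "34"]]
   [["Coke" "16" ""] ["Coke" "2" "3"]]])

;; map keeps one result per group, so the rows stay nested.
(map identity groups)
;; => ([["Pepsi" "25" "34"]] [["Coke" "16" ""] ["Coke" "2" "3"]])

;; mapcat concatenates the per-group results, removing one level of nesting.
(mapcat identity groups)
;; => (["Pepsi" "25" "34"] ["Coke" "16" ""] ["Coke" "2" "3"])
```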
Going by @TheQuickBrownFox's suggestion, I tried the following:
(defn dedup-rows
[rows input]
(let [{:keys [key-col col val]} input
result (-> rows
(f/group-by (f/fn [row]
(get row key-col)))
(f/values)
(f/map (f/fn [rows]
(if (= (count rows) 1)
rows
(filter (fn [row]
(let [col-val (get row col)
equal? (= col-val val)]
(if (not equal?)
true
false))) rows)))
(f/flat-map (f/fn [row]
(mapcat vector row)))))]
result))
and it seems to work.

Clojure: not the whole collection after conversion to hash map

I am exploring the exciting world of Clojure, but I am stuck on this...
I have two vectors, different in length, stored in vars.
(def lst1 ["name" "surname" "age"])
(def lst2 ["Jimi" "Hendrix" "28" "Sam" "Cooke" "33" "Buddy" "Holly" "23"])
I want to interleave them and obtain a map, with keys from first list and values from the second, like the following one:
{"name" "Jimi" , "surname" "Hendrix" , "age" "28" ,
"name" "Sam" , "surname" "Cooke" , "age" "33" ... }
even the following solution, with proper keys, would be ok:
{:name "Jimi" , :surname "Hendrix" , :age "28" , ... }
I can use interleave-all function from Medley library and then apply the hash-map fn:
(apply hash-map
(vec (interleave-all (flatten (repeat 3 lst1)) lst2)))
=> {"age" "23", "name" "Buddy", "surname" "Holly"}
but it returns just the last musician. This persistent hash map is not ordered, but that is not the point.
I later tried to pair keys and values, maybe for a possible future use of assoc, who knows...
(map vector
(for [numMusicians (range 0 3) , keys (range 0 3)] (-> lst1 (nth keys) (keyword)))
(for [values (range 0 9)] (-> lst2 (nth values) (str)))
)
Returns a lazy sequence with paired vectors and proper keywords.
=> ([:name "Jimi"] [:surname "Hendrix"] [:age "28"] [:name "Sam"] ...)
Now I want to try into, which should return
a new coll consisting of to-coll with ALL of the items of from-coll
conjoined.
(into {}
(map vector
(for [numMusicians (range 0 3) , keys (range 0 3)] (-> lst1 (nth keys) (keyword)))
(for [values (range 0 9)] (-> lst2 (nth values) (str)))
))
But again:
=> {:name "Buddy", :surname "Holly", :age "23"}
just the last musician, this time in a persistent array map.
I want a map with all my dead musicians. Does anyone know where I went wrong?
Edit:
Thank you guys! Have managed the fn this way:
(use 'clojure.set)
(->> (partition 3 lst2) (map #(zipmap % lst1)) (map map-invert))
=> ({"name" "Jimi", "surname" "Hendrix", "age" "28"} {"name" "Sam", "surname" "Cooke", "age" "33"} {"name" "Buddy", "surname" "Holly", "age" "23"})
Each key can only exist once in a map. So your later values overwrite the earlier ones.
To get a list of maps per artiste you could do something like:
(def lst1 ["name" "surname" "age"])
(def lst2 ["Jimi" "Hendrix" "28" "Sam" "Cooke" "33" "Buddy" "Holly" "23"])
(->> (partition 3 lst2) ; Split out the separate people
(map (fn [artist-seq] (zipmap lst1 artist-seq)))) ; Use zipmap to connect the keys and values.
This should work for any number of people, as long as all the values are there and in the right order.
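Since the question mentions that keyword keys would also be acceptable, here is a small variant along the same lines. It is only a sketch: musicians is an illustrative name, and the partition size is taken from the number of keys rather than hard-coded.

```clojure
(def lst1 ["name" "surname" "age"])
(def lst2 ["Jimi" "Hendrix" "28" "Sam" "Cooke" "33" "Buddy" "Holly" "23"])

;; Convert the key names to keywords once, then zip them with each
;; group of values. One map per musician avoids duplicate keys.
(def musicians
  (let [ks (map keyword lst1)]
    (mapv #(zipmap ks %) (partition (count lst1) lst2))))

(map :name musicians)
;; => ("Jimi" "Sam" "Buddy")
```

With keyword keys, each field can be read by calling the keyword on the map, as in the last line.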
Although you want the following form:
{"name" "Jimi" , "surname" "Hendrix" , "age" "28" ,
"name" "Sam" , "surname" "Cooke" , "age" "33" ... }
This is not allowed because the keys collide. You can't add "name" as a key several times; each key must be unique in a map.
But you can construct a list of maps with the following code:
user=> (->> (map (fn [ks vs] (interleave ks vs)) (repeat 3 lst1) (partition 3 lst2))
(map #(apply hash-map %)))
({"age" "28", "name" "Jimi", "surname" "Hendrix"} {"age" "33", "name" "Sam", "surname" "Cooke"} {"age" "23", "name" "Buddy", "surname" "Holly"})
UPDATE
@status203's solution, which uses zipmap, looks much better.
user=> (->> (partition 3 lst2)
(map #(zipmap lst1 %)))
({"age" "28", "surname" "Hendrix", "name" "Jimi"} {"age" "33", "surname" "Cooke", "name" "Sam"} {"age" "23", "surname" "Holly", "name" "Buddy"})

Summarising (grouping and counting) a sequence of maps

I'm trying to find an idiomatic way in Clojure of grouping a sequence of maps by certain keys and providing counts. Sort of like 'SELECT X, Y, COUNT(*) FROM Z GROUP BY X, Y' in SQL. The data looks like this:
({:status "Academy Sponsor Led",
:pupil-population "",
:locality "Northamptonshire",
:pupil-gender "Mixed",
:county "Northamptonshire",
:pupil-age "11-18",
:school "Wrenn School",
:website ""}
{:status "Academy Sponsor Led",
:pupil-population "915",
:locality "Plymouth",
:pupil-gender "Mixed",
:county "Devon",
:pupil-age "11-19",
:school "The All Saints Church of England Academy",
:website "http://www.asap.org.uk/"}
{:status "Academy Converter",
:pupil-population "735",
:locality "Somerset",
:pupil-gender "Mixed",
:county "Somerset",
:pupil-age "11-16",
:school "Stanchester Academy",
:website "www.Stanchester-Academy.co.uk"}
{:status "Community School",
:pupil-population "",
:locality "Herefordshire",
:pupil-gender "Mixed",
:county "Herefordshire",
:pupil-age "11-18",
:school "Lady Hawkins High School",
:website "http://www.lhs.hereford.sch.uk"}...
and my solution looks like this:
(defn summarise-locality-status
"Return counts of status within locality"
[data]
(let [locality (group-by :locality data)
locality-status (map #(vector (first %) (group-by :status (second %))) locality)
counts-fn (fn [locality-status-item]
(let [statuses (second locality-status-item)]
(map #(vector % (count (get statuses %))) (keys statuses))))]
(map #(vector (first %) (counts-fn %)) locality-status)))
However, it feels a bit clunky. What would be a better way of doing this?
Depending on your needs,
(frequencies (for [r data] (select-keys r [:locality :status])))
is closer to the SQL, in that it is not nested.
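A small sketch of what that yields, using made-up sample rows (the localities, statuses and school names below are invented for illustration and are not taken from the question's data):

```clojure
;; Hypothetical sample data with only the keys relevant here.
(def sample
  [{:locality "Devon" :status "Academy Converter" :school "A"}
   {:locality "Devon" :status "Academy Converter" :school "B"}
   {:locality "Devon" :status "Community School" :school "C"}
   {:locality "Somerset" :status "Academy Converter" :school "D"}])

;; Keys of the result are the {:locality ... :status ...} combinations.
(def counts
  (frequencies (for [r sample] (select-keys r [:locality :status]))))

(get counts {:locality "Devon" :status "Academy Converter"})
;; => 2

;; Flatten into SQL-like rows, one per group, with a :count column.
(def rows
  (for [[k n] counts] (assoc k :count n)))
```

Each element of rows is a flat map, much like one row of the GROUP BY result.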
Another solution, introducing juxt and reduce-kv:
(->> data
(group-by (juxt :locality :status))
(reduce-kv #(assoc-in % %2 (count %3)) {}))
This might be closest to your original SQL and more intuitively understandable.
How about
(reduce #(update-in %1 [(:locality %2) (:status %2)] (fnil inc 0)) {} data)
or
(reduce #(update-in %1 ((juxt :locality :status) %2) (fnil inc 0)) {} data)
The output is a little different (hash maps instead of lists), but that's easy to change. Using a hash map makes group-by superfluous and the code a lot shorter/easier.
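For illustration, here is the second form run against a few made-up rows (the sample data is invented, not from the question). The (fnil inc 0) part treats a missing count as 0 before incrementing it.

```clojure
;; Hypothetical sample data with only the keys relevant here.
(def sample
  [{:locality "Devon" :status "Academy Converter"}
   {:locality "Devon" :status "Academy Converter"}
   {:locality "Devon" :status "Community School"}
   {:locality "Somerset" :status "Academy Converter"}])

;; juxt builds the [locality status] path; update-in creates the nested
;; maps as needed, and fnil supplies 0 the first time a path is seen.
(def nested
  (reduce #(update-in %1 ((juxt :locality :status) %2) (fnil inc 0))
          {}
          sample))

nested
;; => {"Devon" {"Academy Converter" 2, "Community School" 1},
;;     "Somerset" {"Academy Converter" 1}}

(get-in nested ["Devon" "Community School"])
;; => 1
```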
(for [[locality statuses] (group-by :locality data)]
{:locality locality :all_status
(for [[status items] (group-by :status statuses)]
{:status status :count (count items)})})