How to read a file with test data in Clojure?

I am writing a piece of code that needs to read in a text file that has data. The text file is in the format:
name 1 4
name 2 4 5
name 3 1 9
I am trying to create a vector of maps of the form {:name Sarah :weight 1 :cost 4}.
When I try reading the file with line-seq and a reader, each line comes back as a single item, so the partition is not correct. See the REPL session below:
(let [file-text (line-seq (reader "C://Drugs/myproject/src/myproject/data.txt"))
new-test-items (vec (map #(apply struct item %) (partition 3 file-text)))]
(println file-text)
(println new-test-items))
(sarah 1 1 jason 4 5 nila 3 2 jonas 5 6 judy 8 15 denny 9 14 lis 2 2 )
[{:name sarah 1 1, :weight jason 4 5, :value nila 3 2 } {:name jonas 5 6, :weight judy 8 15, :value denny 9 14}]
I then tried to just take 1 partition, but still the structure is not right.
=> (let [file-text (line-seq (reader "C://Drugs/myproject/src/myproject/data.txt"))
new-test-items (vec (map #(apply struct item %) (partition 1 file-text)))]
(println file-text)
(println new-test-items))
(sarah 1 1 jason 4 5 nila 3 2 jonas 5 6 judy 8 15 denny 9 14 lis 2 2 )
[{:name sarah 1 1, :weight nil, :value nil} {:name jason 4 5, :weight nil, :value nil} {:name nila 3 2 , :weight nil, :value nil} {:name jonas 5 6, :weight nil, :value nil} {:name judy 8 15, :weight nil, :value nil} {:name denny 9 14, :weight nil, :value nil} {:name lis 2 2, :weight nil, :value nil} {:name , :weight nil, :value nil}]
nil
Next I tried to slurp the file, but that is worse:
=> (let [slurp-input (slurp "C://Drugs/myproject/src/myproject/data.txt")
part-items (partition 3 slurp-input)
mapping (vec (map #(apply struct item %) part-items))]
(println slurp-input)
(println part-items)
(println mapping))
sarah 1 1
jason 4 5
nila 3 2
jonas 5 6
judy 8 15
denny 9 14
lis 2 2
((s a r) (a h ) (1 1) (
Please help! This seems like such an easy thing to do in Java, but is killing me in Clojure.

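Split it into a sequence of lines:
(require '[clojure.java.io :refer [reader]]
         '[clojure.string :refer [split]]) ; reader and split are not in clojure.core
(line-seq (reader "/tmp/data"))
Split each of them into a sequence of words:
(map #(split % #" ") data) ; data is the line sequence from the previous step
Make a function that takes a vector holding one record and turns it into a map with the correct keys: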
(fn [[name weight cost]]
(hash-map :name name
:weight (Integer/parseInt weight)
:cost (Integer/parseInt cost)))
Then nest them back together:
(map (fn [[name weight cost]]
(hash-map :name name
:weight (Integer/parseInt weight)
:cost (Integer/parseInt cost)))
(map #(split % #" ") (line-seq (reader "/tmp/data"))))
({:weight 1, :name "name", :cost 4}
{:weight 2, :name "name", :cost 4}
{:weight 3, :name "name", :cost 1})
You can also make this more compact by using zipmap.
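A minimal sketch of that more compact variant follows. The helper name parse-line and the sample lines are made up for illustration, and blank lines (such as a trailing newline at the end of the file) are skipped so that they do not turn into half-empty maps.

```clojure
(require '[clojure.string :as str])

(defn parse-line
  "Hypothetical helper: turns \"sarah 1 4\" into {:name \"sarah\", :weight 1, :cost 4}."
  [line]
  (let [[name & numbers] (str/split (str/trim line) #"\s+")]
    ;; Pair the keys with the name followed by the parsed numbers.
    (zipmap [:name :weight :cost]
            (cons name (map #(Integer/parseInt %) numbers)))))

;; Stand-in for (line-seq (reader "/tmp/data")); the contents are assumed.
(def sample-lines ["sarah 1 1" "jason 4 5" "" "nila 3 2"])

(def parsed
  (->> sample-lines
       (remove str/blank?)   ; drop empty lines before parsing
       (mapv parse-line)))
;; => [{:name "sarah", :weight 1, :cost 1}
;;     {:name "jason", :weight 4, :cost 5}
;;     {:name "nila", :weight 3, :cost 2}]
```

Splitting on #"\s+" rather than a single space is a choice made here so that repeated or trailing spaces do not produce empty tokens.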

You are trying to do everything in one place without testing intermediate results. Idiomatic Clojure instead decomposes a task into a number of subtasks, which makes the code much more flexible and testable. Here's the code for your task (I assume the records in the file describe people):
(defn read-lines [filename]
(with-open [rdr (clojure.java.io/reader filename)]
(doall (line-seq rdr))))
(defn make-person [s]
(reduce conj (map hash-map [:name :weight :value] (.split s " "))))
(map make-person (read-lines "/path/to/file"))

Clojure convert vector of maps into map of vector values

There are a few SO posts related to this topic, however I could not find anything that works for what I am looking to accomplish.
I have a vector of maps. I am going to use the example from another related SO post:
(def data
[{:id 1 :first-name "John1" :last-name "Dow1" :age "14"}
{:id 2 :first-name "John2" :last-name "Dow2" :age "54"}
{:id 3 :first-name "John3" :last-name "Dow3" :age "34"}
{:id 4 :first-name "John4" :last-name "Dow4" :age "12"}
{:id 5 :first-name "John5" :last-name "Dow5" :age "24"}])
I would like to convert this into a map in which the value of each entry is a vector of the associated values (maintaining the order of data).
Here is what I would like to have as the output:
{:id [1 2 3 4 5]
:first-name ["John1" "John2" "John3" "John4" "John5"]
:last-name ["Dow1" "Dow2" "Dow3" "Dow4" "Dow5"]
:age ["14" "54" "34" "12" "24"]}
Is there an elegant and efficient way to do this in Clojure?
Can be made more efficient, but this is a nice start:
(def ks (keys (first data)))
(zipmap ks (apply map vector (map (apply juxt ks) data))) ;;=>
{:id [1 2 3 4 5]
:first-name ["John1" "John2" "John3" "John4" "John5"]
:last-name ["Dow1" "Dow2" "Dow3" "Dow4" "Dow5"]
:age ["14" "54" "34" "12" "24"]}
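The one-liner above packs three steps together. The sketch below unpacks them on a smaller, made-up data set (rows, and the intermediate names row-values and columns, are not from the original answer) so that each intermediate value can be inspected at the REPL.

```clojure
;; A smaller, made-up data set so the intermediate values stay readable.
(def rows [{:id 1 :age "14"} {:id 2 :age "54"} {:id 3 :age "34"}])
(def ks [:id :age])

;; (apply juxt ks) builds a function returning [(:id m) (:age m)] for a map m.
(def row-values (map (apply juxt ks) rows))
;; => ([1 "14"] [2 "54"] [3 "34"])

;; (apply map vector ...) transposes the rows into columns.
(def columns (apply map vector row-values))
;; => ([1 2 3] ["14" "54" "34"])

;; zipmap pairs each key with its column.
(def result (zipmap ks columns))
;; => {:id [1 2 3], :age ["14" "54" "34"]}
```

Taking the keys from (keys (first data)), as the answer does, assumes every map has the same keys as the first one; a map missing a key would contribute nil to that column.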
Another one that comes close:
(group-by key (into [] cat data))
;;=>
{:id [[:id 1] [:id 2] [:id 3] [:id 4] [:id 5]],
:first-name [[:first-name "John1"] [:first-name "John2"] [:first-name "John3"] [:first-name "John4"] [:first-name "John5"]],
:last-name [[:last-name "Dow1"] [:last-name "Dow2"] [:last-name "Dow3"] [:last-name "Dow4"] [:last-name "Dow5"]],
:age [[:age "14"] [:age "54"] [:age "34"] [:age "12"] [:age "24"]]}
Well, I worked out a solution and then before I could post, Michiel posted a more concise solution, but I'll go ahead and post it anyway =).
(defn map-coll->key-vector-map
[coll]
(reduce (fn [new-map key]
(assoc new-map key (vec (map key coll))))
{}
(keys (first coll))))
For me, the clearest approach here is the following:
(defn ->coll [x]
(cond-> x (not (coll? x)) vector))
(apply merge-with #(conj (->coll %1) %2) data)
Basically, the task here is to merge all maps (merge-with), and gather up all the values at the same key by conjoining (conj) onto the vector at key – ensuring that values are actually conjoined onto a vector (->coll).
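To make the merge-with behaviour concrete, here is a small trace on made-up maps (small-data is an assumed name). It also shows one caveat worth knowing: the merging function only runs when a key occurs in more than one map, so a single input map comes back with its values unwrapped.

```clojure
(defn ->coll [x]
  (cond-> x (not (coll? x)) vector))

(def small-data [{:id 1 :age "14"} {:id 2 :age "54"} {:id 3 :age "34"}])

;; First collision on :id calls (conj (->coll 1) 2) => [1 2],
;; the next one calls (conj (->coll [1 2]) 3) => [1 2 3].
(def merged (apply merge-with #(conj (->coll %1) %2) small-data))
;; => {:id [1 2 3], :age ["14" "54" "34"]}

;; With only one map there is nothing to merge, so values stay scalars.
(def single (apply merge-with #(conj (->coll %1) %2) [{:id 1 :age "14"}]))
;; => {:id 1, :age "14"}
```

For the same reason, values that are already collections (for example a vector of tags) would be conjoined onto directly instead of being wrapped first.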
If we concatenate the maps into a single sequence of pairs, we have an edge-list representation of a graph. What we have to do is convert it into an adjacency-list (here a vector, not a list) representation.
(defn collect [maps]
(reduce
(fn [acc [k v]] (assoc acc k (conj (get acc k []) v)))
{}
(apply concat maps)))
For example,
=> (collect data)
{:id [1 2 3 4 5]
:first-name ["John1" "John2" "John3" "John4" "John5"]
:last-name ["Dow1" "Dow2" "Dow3" "Dow4" "Dow5"]
:age ["14" "54" "34" "12" "24"]}
The advantage of this method over some of the others is that the maps in the given sequence can be different shapes.
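A short sketch of that point, using made-up maps with differing keys. The function is written here with update and fnil, which should behave the same as the assoc/get version above; the name collect-ragged and the sample data are assumptions.

```clojure
(defn collect-ragged [maps]
  (reduce
    ;; Append v to the vector at k, starting from [] when k is new.
    (fn [acc [k v]] (update acc k (fnil conj []) v))
    {}
    (apply concat maps)))

(def ragged-result
  (collect-ragged [{:id 1 :age "14"}
                   {:id 2}
                   {:id 3 :age "34" :city "Oslo"}]))
;; => {:id [1 2 3], :age ["14" "34"], :city ["Oslo"]}
```

Note that with ragged input the vectors no longer line up by position: the second :age entry belongs to the third map, not the second.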
Please consider the reader when writing code! There are no prizes for playing "code golf". There are, however, considerable costs to others when you force them to decipher code that is overly condensed.
I always try to be explicit about what code is doing. This is easiest if you break down a problem into simple steps and use good names. In particular, it is almost impossible to accomplish this using juxt or any other cryptic function.
Here is how I would implement the solution:
(def data
[{:id 1 :first-name "John1" :last-name "Dow1" :age "14"}
{:id 2 :first-name "John2" :last-name "Dow2" :age "54"}
{:id 3 :first-name "John3" :last-name "Dow3" :age "34"}
{:id 4 :first-name "John4" :last-name "Dow4" :age "12"}
{:id 5 :first-name "John5" :last-name "Dow5" :age "24"}])
(def data-keys (keys (first data)))
(defn create-empty-result
"init result map with an empty vec for each key"
[data]
(zipmap data-keys (repeat [])))
(defn append-map-to-result
[cum-map item-map]
(reduce (fn [result map-entry]
(let [[curr-key curr-val] map-entry]
(update-in result [curr-key] conj curr-val)))
cum-map
item-map))
(defn transform-data
[data]
(reduce
(fn [cum-result curr-map]
(append-map-to-result cum-result curr-map))
(create-empty-result data)
data))
with results:
(dotest
(is= (create-empty-result data)
{:id [], :first-name [], :last-name [], :age []})
(is= (append-map-to-result (create-empty-result data)
{:id 1 :first-name "John1" :last-name "Dow1" :age "14"})
{:id [1], :first-name ["John1"], :last-name ["Dow1"], :age ["14"]})
(is= (transform-data data)
{:id [1 2 3 4 5],
:first-name ["John1" "John2" "John3" "John4" "John5"],
:last-name ["Dow1" "Dow2" "Dow3" "Dow4" "Dow5"],
:age ["14" "54" "34" "12" "24"]}))
Note that I included unit tests for the helper functions as a way of both documenting what they are intended to do as well as demonstrating to the reader that they actually work as advertised.
This template project can be used to run the above code.

Map from list of maps

My problem is the following: I have a list of maps, for example:
({:id 1 :request-count 10 ..<another key-value pair>..}
{:id 2 :request-count 15 ..<another key-value pair>..}
...)
I need to create a map in which each key is the value of 'id' and each value is the value of 'request-count' from the corresponding map in the previous example, like:
{1 10
2 15
...}
I know how to do this. My question is: does the standard library have a function for this? Or can I achieve it with a combination of a few functions, without 'reduce'?
Use the juxt function to generate a sequence of pairs, and then toss them into a map:
(into {} (map (juxt :id :request-count) data))
Example:
user=> (def data [{:id 1 :request-count 10 :abc 1}
#_=> {:id 2 :request-count 15 :def 2}
#_=> {:id 3 :request-count 20 :ghi 3}])
#'user/data
user=> (into {} (map (juxt :id :request-count) data))
{1 10, 2 15, 3 20}
Be aware that if there is more than one map in data with the same :id, then the last one encountered by map will be the one that survives in the output map.
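The following sketch illustrates the duplicate-id caveat on made-up data (dup-data is an assumed name), and shows one possible way to combine duplicates instead of dropping them. Summing the counts is only an example policy, not something the question asks for.

```clojure
(def dup-data [{:id 1 :request-count 10}
               {:id 2 :request-count 15}
               {:id 1 :request-count 99}])

;; The later entry for id 1 overwrites the earlier one.
(def last-wins (into {} (map (juxt :id :request-count) dup-data)))
;; => {1 99, 2 15}

;; Alternative: turn each record into a one-entry map and merge with +.
(def summed
  (apply merge-with +
         (map (fn [{:keys [id request-count]}] {id request-count}) dup-data)))
;; => {1 109, 2 15}
```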
I would do it like so:
(def data
[{:id 1 :request-count 10}
{:id 2 :request-count 15}] )
(defn make-id-req-map [map-seq]
(vec (for [curr-map map-seq]
(let [{:keys [id request-count]} curr-map]
{id request-count}))))
With result:
(make-id-req-map data) => [{1 10} {2 15}]
Note: while you could combine the map destructuring into the for statement, I usually like to label the intermediate values as described in Martin Fowler's refactoring "Introduce Explaining Variable".
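The function above returns a vector of one-entry maps rather than the single map shown in the question. If a single map is wanted, the pieces can be merged afterwards; a minimal sketch, with pairs standing in for the return value of make-id-req-map:

```clojure
;; The shape returned by make-id-req-map above.
(def pairs [{1 10} {2 15}])

;; conj-ing a map onto a map merges its entries, so into {} flattens the vector.
(def merged (into {} pairs))
;; => {1 10, 2 15}
```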

Clojure - update data values in array of hashes

I would like to update values in hashes, but I'm not sure how this can be done efficiently.
I tried using a loop approach, but taking the previous record's value into account as well seems like a big challenge.
This is what I am trying to do.
Consider that the records are sorted by created_at in descending order, for example:
[{:id 1, :created_at "2016-08-30 11:07:00"}{:id 2, :created_at "2016-08-30 11:05:00"}...]
]
; Basically in humanised form.
Could anyone share some ideas to achieve this? Thanks in advance.
Simplified example:
(def data [{:value 10} {:value 8} {:value 3}])
(conj
(mapv
(fn [[m1 m2]] (assoc m1 :difference (- (:value m1) (:value m2))))
(partition 2 1 data))
(last data))
;;=> [{:value 10, :difference 2} {:value 8, :difference 5} {:value 3}]
What you need is to iterate over all the pairs of consecutive records, keeping the first of each pair and adding the difference to it.
First, some utility functions for date handling:
(defn parse-date [date-str]
(.parse (java.text.SimpleDateFormat. "yyyy-MM-dd HH:mm:ss") date-str))
(defn dates-diff [date-str1 date-str2]
(- (.getTime (parse-date date-str1))
(.getTime (parse-date date-str2))))
Then the mapping part:
user> (def data [{:id 1, :created_at "2016-08-30 11:07:00"}
{:id 2, :created_at "2016-08-30 11:05:00"}
{:id 3, :created_at "2016-08-30 10:25:00"}])
user> (map (fn [[rec1 rec2]]
(assoc rec1 :difference
(dates-diff (:created_at rec1)
(:created_at rec2))))
(partition 2 1 data))
({:id 1, :created_at "2016-08-30 11:07:00", :difference 120000}
{:id 2, :created_at "2016-08-30 11:05:00", :difference 2400000})
Notice that the result doesn't contain the last item, since it was never the first item of a pair. So you would have to add it manually:
user> (conj (mapv (fn [[rec1 rec2]]
(assoc rec1 :difference
(dates-diff (:created_at rec1)
(:created_at rec2))))
(partition 2 1 data))
(assoc (last data) :difference ""))
[{:id 1, :created_at "2016-08-30 11:07:00", :difference 120000}
{:id 2, :created_at "2016-08-30 11:05:00", :difference 2400000}
{:id 3, :created_at "2016-08-30 10:25:00", :difference ""}]
Now it's ok. The only difference from your desired variant is that the diff is in milliseconds rather than a formatted string. To change that, you can add the formatting to the dates-diff function.
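As a rough sketch of such formatting: the exact humanised format the question wanted is not shown here, so the "h"/"min" output below is purely an assumption, as is the helper name humanize-millis.

```clojure
(defn humanize-millis
  "Hypothetical formatter: renders a millisecond difference as hours and minutes."
  [ms]
  (let [total-minutes (quot ms 60000)
        hours         (quot total-minutes 60)
        minutes       (rem total-minutes 60)]
    (if (pos? hours)
      (str hours " h " minutes " min")
      (str minutes " min"))))

(humanize-millis 120000)   ;; => "2 min"
(humanize-millis 2400000)  ;; => "40 min"
(humanize-millis 5400000)  ;; => "1 h 30 min"
```

It could be applied by wrapping the call site, for example (humanize-millis (dates-diff a b)), which keeps dates-diff itself returning a plain number.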

How to group-by a collection that is already grouped by in Clojure?

I have a collection of maps
(def a '({:id 9345 :value 3 :type "orange"}
{:id 2945 :value 2 :type "orange"}
{:id 145 :value 3 :type "orange"}
{:id 2745 :value 6 :type "apple"}
{:id 2345 :value 6 :type "apple"}))
I want to group this first by value, followed by type.
My output should look like:
{
:orange [{
:value 3,
:id [9345, 145]
}, {
:value 2,
:id [2945]
}],
:apple [{
:value 6,
:id [2745, 2345]
}]
}
How would I do this in Clojure? Appreciate your answers.
Thanks!
Edit:
Here is what I had so far:
(defn by-type-key [data]
(group-by :type data))
(reduce-kv
(fn [m k v] (assoc m k (reduce-kv
(fn [sm sk sv] (assoc sm sk (into [] (map #(:id %) sv))))
{}
(group-by :value (map #(dissoc % :type) v)))))
{}
(by-type-key a))
Output:
=> {"orange" {3 [9345 145], 2 [2945]}, "apple" {6 [2745 2345]}}
I just couldn't figure out how to proceed next...
Your requirements are a bit inconsistent (or rather irregular): you use :type values as keywords in the result, but the rest of the keywords are carried through. Maybe that's what you must do to satisfy some external format; otherwise you need to either use the same approach as with :type throughout, or add a new keyword to the result, like :group or :rows, and keep the original keywords intact. I will assume the former approach for the moment (but see below, I will get to the shape you want), so the final shape of the data is like
{:orange
{:3 [9345 145],
:2 [2945]},
:apple
{:6 [2745 2345]}
}
There is more than one way of getting there, here's the gist of one:
(group-by (juxt :type :value) a)
The result:
{["orange" 3] [{:id 9345, :value 3, :type "orange"} {:id 145, :value 3, :type "orange"}],
["orange" 2] [{:id 2945, :value 2, :type "orange"}],
["apple" 6] [{:id 2745, :value 6, :type "apple"} {:id 2345, :value 6, :type "apple"}]}
Now all rows in your collection are grouped by the keys you need. From this, you can go and get the shape you want, say to get to the shape above you can do
(reduce
(fn [m [k v]]
(let [ks (map (comp keyword str) k)]
(assoc-in m ks
(map :id v))))
{}
(group-by (juxt :type :value) a))
The basic idea is to get the rows grouped by the key sequence (and that's what group-by and juxt do,) and then combine reduce and assoc-in or update-in to beat the result into place.
To get exactly the shape you described:
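(reduce
  (fn [m [k v]]
    (let [type (keyword (first k))
          value (second k)
          ids (mapv :id v)] ; mapv so the ids come out as a vector
      ;; fnil starts each type with [] so conj appends in order
      (update-in m [type]
                 (fnil conj []) {:value value :id ids})))
  {}
  (group-by (juxt :type :value) a))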
It's a bit noisy, and it might be harder to see the forest for the trees - that's why I simplified the shape, to highlight the main idea. The more regular your shapes are, the shorter and more regular your functions become - so if you have control over it, try to make it simpler for you.
I would do the transform in two stages (using reduce):
the first to collect the values
the second for formatting
The following code solves your problem:
(def a '({:id 9345 :value 3 :type "orange"}
{:id 2945 :value 2 :type "orange"}
{:id 145 :value 3 :type "orange"}
{:id 2745 :value 6 :type "apple"}
{:id 2345 :value 6 :type "apple"}))
(defn standardise [m]
(->> m
;; first stage
(reduce (fn [out {:keys [type value id]}]
(update-in out [type value] (fnil #(conj % id) [])))
{})
;; second stage
(reduce-kv (fn [out k v]
(assoc out (keyword k)
(reduce-kv (fn [out value id]
(conj out {:value value
:id id}))
[]
v)))
{})))
(standardise a)
;; => {:orange [{:value 3, :id [9345 145]}
;; {:value 2, :id [2945]}],
;; :apple [{:value 6, :id [2745 2345]}]}
The output of the first stage is:
(reduce (fn [out {:keys [type value id]}]
(update-in out [type value] (fnil #(conj % id) [])))
{}
a)
;;=> {"orange" {3 [9345 145], 2 [2945]}, "apple" {6 [2745 2345]}}
You may wish to use the built-in function group-by. See http://clojuredocs.org/clojure.core/group-by
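A minimal sketch of how group-by could be nested to reach the shape in the question, assuming the same data as above (repeated here as fruit so the block is self-contained). The function name by-type-then-value is made up.

```clojure
(def fruit '({:id 9345 :value 3 :type "orange"}
             {:id 2945 :value 2 :type "orange"}
             {:id 145  :value 3 :type "orange"}
             {:id 2745 :value 6 :type "apple"}
             {:id 2345 :value 6 :type "apple"}))

(defn by-type-then-value [rows]
  (into {}
        (map (fn [[type type-rows]]
               ;; Within one type, group again by :value and keep only the ids.
               [(keyword type)
                (mapv (fn [[value value-rows]]
                        {:value value :id (mapv :id value-rows)})
                      (group-by :value type-rows))]))
        (group-by :type rows)))

(def grouped (by-type-then-value fruit))
;; :orange holds {:value 3, :id [9345 145]} and {:value 2, :id [2945]};
;; :apple holds {:value 6, :id [2745 2345]}.
```

The ids within each group keep their input order, but the order of the groups themselves comes from map iteration and should not be relied on.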

Building a map from a vector

So I have a vector which sort of looks like this
["John" 23 "5551234" "Sally" 34 "5556667"]
the vector contains a lot more entries like this, what I am trying to do is make a vector of maps like this:
[{:name "John" :age 23 :ph "5551234"} {:name "Sally" :age 34 :ph "5556667"}]
Is there any way to accomplish this?
(def sample ["John" 23 "5551234" "Sally" 34 "5556667" "Harry" 42 "5554242"])
Partition the input vector into records using e.g. (partition 3 sample) (each record has 3 elements), and then map a zipmap over them:
(mapv #(zipmap [:name :age :ph] %) (partition 3 sample))
; => [{:ph "5551234", :age 23, :name "John"}
; {:ph "5556667", :age 34, :name "Sally"}
; {:ph "5554242", :age 42, :name "Harry"}]
Or use for comprehension (returns a lazy sequence rather than a vector):
(for [[name age ph] (partition 3 sample)] {:name name :age age :ph ph})
; => ({:name "John", :age 23, :ph "5551234"}
;     {:name "Sally", :age 34, :ph "5556667"}
;     {:name "Harry", :age 42, :ph "5554242"})
Note that key order is not defined for maps. The for comprehension produces an array-map since the number of key-value pairs is small, and thus the keys appear in order, but this is an implementation detail. You can explicitly use array-maps if order is important, but this carries a performance penalty for look-ups on larger maps.
Another couple of options:
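A brief sketch of what this means in practice (record is a made-up name): maps with the same entries are equal regardless of the order in which they print, and when a fixed field order is needed for output, it can be requested explicitly instead of relying on the map.

```clojure
(def record (zipmap [:name :age :ph] ["John" 23 "5551234"]))

;; Equality ignores key order.
(= record {:ph "5551234" :name "John" :age 23})
;; => true

;; juxt extracts the fields in exactly the order listed.
((juxt :name :age :ph) record)
;; => ["John" 23 "5551234"]
```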
(->> ["John" 23 "5551234" "Sally" 34 "5556667"]
(partition 3)
(map (fn [[name age ph]]
{:name name :age age :ph ph})))
(->> ["John" 23 "5551234" "Sally" 34 "5556667"]
(partition 3)
(map (partial interleave [:name :age :ph]))
(map (partial apply hash-map)))
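One thing to keep in mind with all of the partition-based variants: partition silently drops a trailing group that has fewer than n elements. The sketch below uses a made-up vector (ragged) with an incomplete last record to show the difference from partition-all.

```clojure
;; The last record is missing its phone number.
(def ragged ["John" 23 "5551234" "Sally" 34])

(partition 3 ragged)
;; => (("John" 23 "5551234"))

(partition-all 3 ragged)
;; => (("John" 23 "5551234") ("Sally" 34))

;; With partition-all, zipmap simply leaves out the missing key.
(mapv #(zipmap [:name :age :ph] %) (partition-all 3 ragged))
;; => [{:name "John", :age 23, :ph "5551234"} {:name "Sally", :age 34}]
```

Whether an incomplete record should be dropped, kept, or treated as an error depends on the data; neither behaviour is implied by the original question.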