Clojure read object from a file and extract data

Clojure read object from a file and extract data - clojure

I am very new to closure so I am not fully sure how to do this. If I have a file data.txt with the following:
[
{:name "Steve"}
{:name "Issac"}
{:name "Lucas"}
{...}
]
I want to be able to read the contents of each :name tag and do something with the return value (in this case it will be printing to the console). I looked up online and found there is a method called reader and I understand how to open a file.
The Closure syntax confuses me slightly so I am not sure how to do this.

there should be 2 possiblities:
1) raw clojure by means clojure.core/read-string
(read-string "['q 2 \"test\"]")
;; [(quote q) 2 "test"]
2) via clojure.edn/read-string
(clojure.edn/read-string "['q 2 \"test\"]")
;; ['q 2 "test"]
the 2nd one should be faster and safer (but does not eval and stuff),
but is only for edn format (this is a subset of clojure code)
the string dummy (i.e from your data.txt)
;; a string, just for demo
(def s "[{:name \"Steve\" email: \"foo#bar.com\" }
{:name \"Issac\"}
{:name \"Lucas\"}]")
the rest is plain clojure, if you have trouble with clojure maps here is the doc
(doseq [name (map :name (clojure.edn/read-string s))]
(println name))
;; Steve
;; Issac
;; Lucas
;; nil

Related

How can I filter JSON data by a given date in Clojure?

I have many JSON objects, and I am trying to filter those objects by the date. These objects are being parsed from several JSON files using Cheshire.core, meaning that the JSON objects are in a collection. The date is being passed in in the following format "YYYY-MM-DD" (eg. 2015-01-10). I have tried using the filter and contains? functions to do this, but I am having no luck so far. How can I filter these JSON objects by my chosen date?
Current Clojure code:
(def filter-by-date?
(fn [orders-data date-chosen]
(contains? (get (get orders-data :date) :date) date-chosen)))
(prn (filter (filter-by-date? orders-data "2017-12-25")))
Example JSON object:
{
"id":"05d8d404-b3f6-46d1-a0f9-dbdab7e0261f",
"date":{
"date":"2015-01-10T19:11:41.000Z"
},
"total":{
"GBP":57.45
}
}
JSON after parsing with Cheshire:
[({:id "05d8d404-b3f6-46d1-a0f9-dbdab7e0261f",
:date {:date "2015-01-10T19:11:41.000Z"},
:total {:GBP 57.45}}) ({:id "325bd04-b3f6-46d1-a0f9-dbdab7e0261f",
:date {:date "2015-02-23T10:15:14.000Z"},
:total {:GBP 32.90}})]

First, I'm going to assume you've parsed the JSON first into something like this:
(def parsed-JSON {:id "05d8d404-b3f6-46d1-a0f9-dbdab7e0261f",
:date {:date "2015-01-10T19:11:41.000Z"},
:total {:GBP 57.45}})
The main problem is the fact that the date as stored in the JSON contains time information, so you aren't going to be able to check it directly using equality.
You can get around this by using clojure.string/starts-with? to check for prefixes. I'm using s/ here as an alias for clojure.string:
(defn filter-by-date [date jsons]
(filter #(s/starts-with? (get-in % [:date :date]) date)
jsons))
You were close, but I made a few changes:
You can't use contains? like that. From the docs of contains?: Returns true if key is present in the given collection, otherwise returns false. It can't be used to check for substrings; it's used to test for the presence of a key in a collection.
Use -in postfix versions to access nested structures instead of using multiple calls. I'm using (get-in ...) here instead of (get (get ...)).
You're using (def ... (fn [])) which makes things more complicated than they need to be. This is essentially what defn does, although defn also adds some more stuff as well.
To address the new information, you can just flatten the nested sequences containing the JSONs first:
(->> nested-json-colls ; The data at the bottom of the question
(flatten)
(filter-by-date "2015-01-10"))

#!/usr/bin/env boot
(defn deps [new-deps]
(merge-env! :dependencies new-deps))
(deps '[[org.clojure/clojure "1.9.0"]
[cheshire "5.8.0"]])
(require '[cheshire.core :as json]
'[clojure.string :as str])
(def orders-data-str
"[{
\"id\":\"987654\",
\"date\":{
\"date\":\"2015-01-10T19:11:41.000Z\"
},
\"total\":{
\"GBP\":57.45
}
},
{
\"id\":\"123456\",
\"date\":{
\"date\":\"2016-01-10T19:11:41.000Z\"
},
\"total\":{
\"GBP\":23.15
}
}]")
(def orders (json/parse-string orders-data-str true))
(def ret (filter #(clojure.string/includes? (get-in % [:date :date]) "2015-01-") orders))
(println ret) ; ({:id 987654, :date {:date 2015-01-10T19:11:41.000Z}, :total {:GBP 57.45}})
You can convert the date string to Date object using any DateTime library like joda-time and then do a proper filter if required.

clj-time has functions for parsing strings and comparing date-time objects. So you could do something like:
(ns filter-by-time-example
(:require [clj-time.coerce :as tc]
[clj-time.core :as t]))
(def objs [{"id" nil
"date" {"date" "2015-01-12T19:11:41.000Z"}
"total" nil}
{"id" "05d8d404-b3f6-46d1-a0f9-dbdab7e0261f"
"date" {"date" "2015-01-10T19:11:41.000Z"}
"total" {"GBP" :57.45}}
{"id" nil
"date" {"date" "2015-01-11T19:11:41.000Z"}
"total" nil}])
(defn filter-by-day
[objs y m d]
(let [start (t/date-time y m d)
end (t/plus start (t/days 1))]
(filter #(->> (get-in % ["date" "date"])
tc/from-string
(t/within? start end)) objs)))
(clojure.pprint/pprint (filter-by-day objs 2015 1 10)) ;; Returns second obj
If you're going to repeatedly do this (e.g. for multiple days) you could parse all dates in your collection into date-time objects with
(map #(update-in % ["date" "date"] tc/from-string) objs)
and then just work with that collection to avoid repeating the parsing step.

(ns filter-by-time-example
(:require [clj-time.format :as f]
[clj-time.core :as t]
[cheshire.core :as cheshire]))
(->> json-coll
(map (fn [json] (cheshire/parse-string json true)))
(map (fn [record] (assoc record :dt-date (f/format (get-in record [:date :date])))))
(filter (fn [record] (t/after? (tf/format "2017-12-25") (:dt-date record))))
(map (fn [record] (dissoc record :dt-date))))
Maybe something like this? You might need to change the filter for your usecase but as :dt-time is now a jodo.DateTime you can leverage all the clj-time predicates.

clojure same ":or" value for all keys

I've defined a record with a bunch of fields--some of which are computed, some of which don't map directly to keys in the JSON data I'm ingesting. I'm writing a factory function for it, but I want to have sensible default/not-found values. Is there a better way that tacking on :or [field1 "" field2 "" field3 "" field4 ""...]? I could write a macro but I'd rather not if I don't have to.

There are three common idioms for implementing defaults in constructor functions.
:or destructoring
Example:
(defn make-creature [{:keys [type name], :or {type :human
name (str "unnamed-" (name type))}}]
;; ...
)
This is useful when you want to specify the defaults inline. As a bonus, it allows let style bindings in the :or map where the kvs are ordered according to the :keys vector.
Merging
Example:
(def default-creature-spec {:type :human})
(defn make-creature [spec]
(let [spec (merge default-creature-spec
spec)]
;; ....
))
This is useful when you want to define the defaults externally, generate them at runtime and/or reuse them elsewhere.
Simple or
Example:
(defn make-creature [{:keys [type name]}]
(let [type (or type :human)
name (or name (str "unnamed-" (name type)))]
;; ...
))
This is as useful as :or destructoring but only those defaults are evaluated that are actually needed, i. e. it should be used in cases where computing the default adds unwanted overhead. (I don't know why :or evaluates all defaults (as of Clojure 1.7), so this is a workaround).

If you really want the same default value for all the fields, and they really have to be different than nil, and you don't want to write them down again, then you can get the record fields by calling keys on an empty instance, and then construct a map with the default values merged with the actual values:
(defrecord MyFancyRecord [a b c d])
(def my-fancy-record-fields (keys (map->MyFancyRecord {})))
;=> (:a :b :c :d)
(def default-fancy-fields (zipmap my-fancy-record-fields (repeat "")))
(defn make-fancy-record [fields]
(map->MyFancyRecord (merge default-fancy-fields
fields)))
(make-fancy-record {})
;=> {:a "", :b "", :c "", :d ""}
(make-fancy-record {:a 1})
;=> {:a 1, :b "", :c "", :d ""}
To get the list of record fields you could also use the static method getBasis on your record class:
(def my-fancy-record-fields (map keyword (MyFancyRecord/getBasis)))
(getBasis is not part of the public records api so there are no guarantees it won't be removed in future clojure versions. Right now it's available in both clojure and clojurescript, it's usage is explained in "Clojure programming by Chas Emerick, Brian Carper, Christophe Grand" and it's also mentioned in this thread during a discussion about how to get the keys from a record. So, it's up to you to decide if it's a good idea to use it)

clojure when-let alternative for an empty array?

I'm generating json as literally as I can in clojure. My problem is that certain branches of the json are only present if given parameters are given. Here is a sample of such a condition
(defn message-for
[name uuid & [generated-uuids]]
{:message {:id (generate-uuid)
:details {:name name}
:metadata {:batch (merge {:id uuid}
(when generated-uuids (let [batches (map #(array-map :id %) generated-uuids)]
{:generatedBatches batches})))}}})
Unfortunately the when/let part is quite ugly. This same could be achieved using when-let as following but it doesn't work because my map returns [] instead of a nil.
(defn message-for
[name uuid & [generated-uuids]]
{:message {:id (generate-uuid)
:details {:name name}
:metadata {:batch (merge {:id uuid}
(when-let [batches (map #(array-map :id %) generated-uuids)]
{:generatedBatches batches}))}}})
Any ideas if I could somehow make when-let consider an empty list/array/seq as false so I could clean up my code a bit?

not-empty returns its argument if it is not empty.
When using when-let with a collection, always use not-empty
to retain the collection type
make refactoring easier
expressivenes
(when-let [batches (not-empty (map ...))]
...)
In your case I'd however prefer something like this:
...
:metadata {:batch (cond-> {:id uuid}
(seq generated-uuids)
(assoc :generatedBatches (map ...)))}
...
Notice that all three of the advantages listed above where met, without a nested let.
Also notice a new advantage
easier to extend with more conditions lateron

seq returns nil on an empty input sequence so you could do:
(when-let [batches (seq (map #(array-map :id %) generated-uuids))]
{:generatedBatches batches}))}}})

How to spit a list without double-quotes

I have the following line in my code:
(spit path (prn-str job-data))
It does it's work well execpt for one thing, every item in the list are put between double-quotes...
( ":a" ":b" ":a" )
the expected result that I'd like to have
( :a :b :a )
How to get the expected result?
Thanks in advance!

What's happening
The issue isn't that the items are being put in double quotes per se but that they're strings (as opposed to the keywords you're expecting).
prn-str, which is ultimately based on pr, prints objects "in a way that objects can be read by the reader". This means strings are printed in double-quotes - otherwise the reader wouldn't be able to tell strings from symbols, or read strings with whitespace in them. See here for more information on Clojure's reader.
println and print, on the other hand, are intended to "produce output for human consumption" and do not put strings in double-quotes. This is why you're seeing the difference in output between prn-str and println.
You can verify this with class. If you try (-> job-data first class) the answer will be either java.lang.String or clojure.lang.Keyword.
Here are some examples demonstrating the different behaviors of the printing functions when used with keywords and strings:
(def str-job-data '(":a" ":b" ":c"))
(def key-job-data '(:a :b :c))
;; `println` prints both keywords and strings without quotes
(with-out-str (println str-job-data)) ;=> "(:a :b :c)\n"
(with-out-str (println key-job-data)) ;=> "(:a :b :c)\n"
;; `prn-str` prints the strings in quotes but the keywords without quotes
(prn-str str-job-data) ;=> "(\":a\" \":b\" \":c\")\n"
(prn-str key-job-data) ;=> "(:a :b :c)\n"
How to change it
Now for possible solutions. If you were expecting job-data to contain keywords then the right fix is most likely to modify job-data. However, I can't offer much guidance here without knowing more about how job-data is produced.
If for some reason you can't modify job-data (for instance, if it's produced by code you don't control) and you want to write keywords wherever it contains keyword-like strings then something like #maxthoursie's suggestion is probably your best bet. (You could hypothetically just switch to print or println but that could have undesirable effects on how other objects are printed).
(defn keyword-string->keyword [s]
(keyword (subs s 1)))
(spit path (prn-str (map keyword-string->keyword job-data)))
If job-data might contain objects other than keyword-like strings you could apply the function only when appropriate.
(defn convert-job-data [obj]
(if (and (string? obj)
(= (.charAt obj 0) \:))
(keyword-string->keyword obj)
obj))
(spit path (prn-str (map convert-job-data job-data)))
Of course, if the file you're writing is for human consumption anyway and all this business about the reader is irrelevant you could trivially make your own println-str:
(defn println-str [& more]
(with-out-str (apply println more)))
(spit path (println-str job-data))

I'm guessing job-data is not what you expect it to be.
user=> (prn-str '(:a :b :c))
"(:a :b :c)\n"
If you do have a list with strings that looks like keywords, and you would like to convert it to keywords, you could use something like
(map (comp keyword #(subs % 1)) '(":a" ":b" ":c"))
Which skips the : of each element, and then converts it to a keyword.
user=> (prn-str (map (comp keyword #(subs % 1)) '(":a" ":b" ":c")))
"(:a :b :c)\n"

How to parse URL parameters in Clojure?

If I have the request "size=3&mean=1&sd=3&type=pdf&distr=normal" what's the idiomatic way of writing the function (defn request->map [request] ...) that takes this request and
returns a map {:size 3, :mean 1, :sd 3, :type pdf, :distr normal}
Here is my attempt (using clojure.walk and clojure.string):
(defn request-to-map
[request]
(keywordize-keys
(apply hash-map
(split request #"(&|=)"))))
I am interested in how others would solve this problem.

Using form-decode and keywordize-keys:
(use 'ring.util.codec)
(use 'clojure.walk)
(keywordize-keys (form-decode "hello=world&foo=bar"))
{:foo "bar", :hello "world"}

Assuming you want to parse HTTP request query parameters, why not use ring? ring.middleware.params contains what you want.
The function for parameter extraction goes like this:
(defn- parse-params
"Parse parameters from a string into a map."
[^String param-string encoding]
(reduce
(fn [param-map encoded-param]
(if-let [[_ key val] (re-matches #"([^=]+)=(.*)" encoded-param)]
(assoc-param param-map
(codec/url-decode key encoding)
(codec/url-decode (or val "") encoding))
param-map))
{}
(string/split param-string #"&")))

You can do this easily with a number of Java libraries. I'd be hesitant to try to roll my own parser unless I read the URI specs carefully and made sure I wasn't missing any edge cases (e.g. params appearing in the query twice with different values). This uses jetty-util:
(import '[org.eclipse.jetty.util UrlEncoded MultiMap])
(defn parse-query-string [query]
(let [params (MultiMap.)]
(UrlEncoded/decodeTo query params "UTF-8")
(into {} params)))
user> (parse-query-string "size=3&mean=1&sd=3&type=pdf&distr=normal")
{"sd" "3", "mean" "1", "distr" "normal", "type" "pdf", "size" "3"}

Can also use this library for both clojure and clojurescript: https://github.com/cemerick/url
user=> (-> "a=1&b=2&c=3" cemerick.url/query->map clojure.walk/keywordize-keys)
{:a "1", :b "2", :c "3"}

Yours looks fine. I tend to overuse regexes, so I would have solved it as
(defn request-to-keywords [req]
(into {} (for [[_ k v] (re-seq #"([^&=]+)=([^&]+)" req)]
[(keyword k) v])))
(request-to-keywords "size=1&test=3NA=G")
{:size "1", :test "3NA=G"}
Edit: try to stay away from clojure.walk though. I don't think it's officially deprecated, but it's not very well maintained. (I use it plenty too, though, so don't feel too bad).

I came across this question when constructing my own site and the answer can be a bit different, and easier, if you are passing parameters internally.
Using Secretary to handle routing: https://github.com/gf3/secretary
Parameters are automatically extracted to a map in :query-params when a route match is found. The example given in the documentation:
(defroute "/users/:id" [id query-params]
(js/console.log (str "User: " id))
(js/console.log (pr-str query-params)))
(defroute #"/users/(\d+)" [id {:keys [query-params]}]
(js/console.log (str "User: " id))
(js/console.log (pr-str query-params)))
;; In both instances...
(secretary/dispach! "/users/10?action=delete")
;; ... will log
;; User: 10
;; "{:action \"delete\"}"

You can use ring.middleware.params. Here's an example with aleph:
user=> (require '[aleph.http :as http])
user=> (defn my-handler [req] (println "params:" (:params req)))
user=> (def server (http/start-server (wrap-params my-handler)))
wrap-params creates an entry in the request object called :params. If you want the query parameters as keywords, you can use ring.middleware.keyword-params. Be sure to wrap with wrap-params first:
user=> (require '[ring.middleware.params :refer [wrap-params]])
user=> (require '[ring.middleware.keyword-params :refer [wrap-keyword-params])
user=> (def server
(http/start-server (wrap-keyword-params (wrap-params my-handler))))
However, be mindful that this includes a dependency on ring.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Clojure read object from a file and extract data - clojure

Related

How can I filter JSON data by a given date in Clojure?

clojure same ":or" value for all keys

clojure when-let alternative for an empty array?

How to spit a list without double-quotes

How to parse URL parameters in Clojure?

Categories

Resources