How can I “unparse” a date in a specific time zone?

Using clj-time, I can parse a date and time by doing
(def timestamp (format/parse (formatters :date-time-no-ms)
                             "2013-06-03T23:00:00-0500"))
;=> #<DateTime 2013-06-04T04:00:00.000Z>
I can convert this back into a string by doing
(unparse (formatters :year-month-day) timestamp)
;=> "2013-06-04"
This is the year, month, and day of that moment in the UTC time zone. How can I get an unparsed version of the DateTime relative to another time zone? For the example above, I want to specify the UTC-5 time zone and get the string “2013-06-03”. I have played around with from-time-zone and to-time-zone, but I can't seem to find the right combination of functions and arguments.

You'll want to use clj-time.format/with-zone:
(require '(clj-time [core :as time] [format :as timef]))
(timef/unparse (timef/with-zone (:date-time-no-ms timef/formatters)
                 (time/time-zone-for-id "America/Chicago"))
               (time/now))
;= "2013-06-02T15:20:03-05:00"


Polymorphic Schemas in Clojure

I want to create polymorphic schemas/types, and I'm curious about best practices. The following two examples let me create a Frequency schema which can repeat an event monthly by day of month, or monthly by day of week (e.g., every 15th, or every first Monday, respectively).
The first one uses the experimental abstract map to accomplish this, and its syntax is awkward (IMO). Plus, being in the experimental package concerns me a bit.
The second one uses s/conditional, and it suffers from not being able to easily coerce the value of :type from a string to a keyword, which is useful when dealing with a REST API or JSON (whereas s/eq handles this well).
In the general case, is one of these, or some third option, the best practice for conveying: Type A is one of Types #{B C D ...}?
Two options:
;; OPTION 1
(s/defschema Frequency
  (field (abstract-map/abstract-map-schema :type {})
         {}))
(abstract-map/extend-schema MonthlyByDOM Frequency
  [:monthly-by-dom]
  {:days #{MonthDay}})
(abstract-map/extend-schema MonthlyByDOW Frequency
  [:monthly-by-dow]
  {:days #{WeekDay}
   :weeks #{(s/enum 1 2 3 4 5)}})
;; OPTION 2
(s/defschema MonthlyByDOM
  "monthly by day of month, e.g. every 13th and 21st day"
  {:type (s/eq :monthly-by-dom)
   :days #{MonthDay}})
(s/defschema MonthlyByDOW
  "monthly by day of week, e.g. first and third Friday"
  {:type (s/eq :monthly-by-dow)
   :days #{WeekDay}
   :weeks #{(s/enum 1 2 3 4 5)}})
(s/defschema Frequency
  (field (s/conditional #(= :monthly-by-dom (:type %)) MonthlyByDOM
                        #(= :monthly-by-dow (:type %)) MonthlyByDOW)
         {:default {:type :monthly-by-dom
                    :days #{1 11 21}}}))
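For reference, a quick validation sketch of the corrected s/conditional dispatch (it assumes MonthDay is a schema that accepts day-of-month integers, since its definition isn't shown above):
(s/validate MonthlyByDOM {:type :monthly-by-dom
                          :days #{1 11 21}})
;=> {:type :monthly-by-dom, :days #{1 11 21}}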
Similar questions that don't quite help:
https://groups.google.com/forum/#!topic/prismatic-plumbing/lMvazYXRAQQ
Polymorphic schema validation in Clojure
Validating multiple polymorphic values using Prismatic Schema

clojure pmap - why aren't I using all the cores?

I'm attempting to use the clojure pantomime library to extract/OCR text from a large number of TIF documents (among others).
My plan has been to use pmap to apply the mapping over a sequence of input data (from a postgres database) and then update that same postgres database with the tika/tesseract OCR output. This has been working OK, but I notice in htop that many of the cores are idle at times.
Is there any way to reconcile this, and what steps can I take to determine where this may be blocking? All processing occurs on a single tif file, and each thread is entirely mutually exclusive.
Additional info:
some tika/tesseract processes take 3 seconds, others take up to 90 seconds. Generally speaking, tika is heavily CPU-bound. I have ample memory available according to htop.
postgres has no locking issues in session management, so I don't think that's holding me up.
maybe futures are waiting somewhere to be dereffed? How can I tell where?
Any tips appreciated, thanks. Code added below.
(defn parse-a-path [{:keys [row_id file_path]}]
  (try
    (let [start        (System/currentTimeMillis)
          mime_type    (pm/mime-type-of file_path)
          file_content (-> file_path (extract/parse) :text)
          language     (pl/detect-language file_content)]
      {:mime_type mime_type
       :file_content file_content
       :language language
       :row_id row_id
       ;; divide by 1000 (not 100) to convert milliseconds to seconds
       :parse_time_in_seconds (float (/ (- (System/currentTimeMillis) start) 1000))
       :record_status "doc parsed"})
    ;; the original try had no catch clause, which made it a no-op;
    ;; returning an error-status map here is one (hypothetical) choice
    (catch Exception e
      {:row_id row_id
       :record_status (str "parse failed: " (.getMessage e))})))
(defn fetch-all-batch []
  (t/info "Fetching lazy seq of all rows for batch.")
  (jdbc/query (db-connection)
              ["select row_id, file_path, file_extension
                from the_table"]))
(defn update-a-row [{:keys [row_id file_path file_extension] :as all-keys}]
  (let [parse-out (parse-a-path all-keys)]
    (try
      (jdbc/execute!
        (db-connection)
        ["update the_table
          set record_last_updated = current_timestamp,
              file_content = ?,
              mime_type = ?,
              language = ?,
              parse_time_in_seconds = ?,
              record_status = ?
          where row_id = ?"
         (:file_content parse-out)
         (:mime_type parse-out)
         (:language parse-out)
         (:parse_time_in_seconds parse-out)
         (:record_status parse-out)
         row_id])
      (t/debug (str "updated row_id " (:row_id parse-out) " (" file_extension ")"
                    " in " (:parse_time_in_seconds parse-out) " seconds."))
      ;; the original swallowed exceptions with (catch Exception _);
      ;; logging them makes stalls and failures much easier to diagnose
      (catch Exception e (t/error e)))))
(dorun
  (pmap
    #(try
       (update-a-row %)
       (catch Exception e (t/error (.getNextException e))))
    (fetch-all-batch)))  ; note: call the function; pmap needs the row seq, not the fn
pmap runs the mapping function in parallel, working ahead of consumption in batches of roughly (+ 2 cores) items, but it preserves ordering. This means that if you have 8 cores, a batch of 10 items will be processed, but a new batch is only started once all 10 have finished.
You could create your own code that uses combinations of future, delay and deref, which would be a good academic exercise. After that, you can throw out your code and start using the claypoole library, which has a set of abstractions covering the majority of uses of future.
For this specific case, use its unordered pmap or pfor implementations (upmap and upfor), which do exactly what pmap does but without ordering; a new item is picked up as soon as any item in the batch finishes.
In situations where IO is the main bottleneck, or where processing times vary greatly between items of work, this is the best way to parallelize map or for operations.
Of course, you should take care not to rely on any sort of ordering of the return values.
(require '[com.climate.claypoole :as cp])
;; dorun forces realization for the side effects, as in the original pmap version
(dorun
  (cp/upmap (cp/ncpus)
            #(try
               (update-a-row %)
               (catch Exception e (t/error (.getNextException e))))
            (fetch-all-batch)))
I had a similar problem some time ago. I guess you are making the same assumptions I did:
pmap calls f in parallel, but that doesn't mean the work is shared equally. As you said, some items take 3 seconds whereas others take 90 seconds. A thread that finishes in 3 seconds does NOT take over part of the remaining work; it just waits idle until the last one finishes.
You didn't describe exactly what your data looks like, but I will assume you are using some kind of lazy sequence, which is bad for parallel processing. If your process is CPU-bound and you can hold your entire input in memory, then prefer clojure.core.reducers (map, filter and especially fold) over the lazy map, filter and friends.
In my case, these tips dropped the processing time from 34 seconds to a mere 8. Hope it helps.
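A minimal sketch of that reducers suggestion, assuming the whole batch fits in memory (fold needs a foldable source, such as a vector, to split the work across a fork/join pool):
(require '[clojure.core.reducers :as r])

;; realize the lazy seq into a vector so fold can partition it,
;; then run update-a-row in parallel and collect the results
(->> (vec (fetch-all-batch))
     (r/map update-a-row)
     (r/foldcat))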

What is the difference between the user-specified transaction temp-id and the one returned from a transaction, in Datomic?

I have the following clojure function which transacts to a Datomic database:
(defn demo-tran [term description]
  (d/transact conn
    [{:db/id (d/tempid :db.part/utility -10034)
      :utility.tag/uuid (d/squuid)
      :utility.tag/term term
      :utility.tag/description description}]))
I then run this in the repl:
(demo-tran "Moo" "A bovine beast")
This succeeds and I am given back a 'transaction map':
{:db-before datomic.db.Db@f4c9aa60,
 :db-after datomic.db.Db@908ec69f,
 :tx-data [#datom[13194139534424 50 #inst"2016-04-01T09:16:50.945-00:00" 13194139534424 true]
           #datom[668503069688921 153 #uuid"56fe3c82-8dbd-4a0d-9f62-27b570cbb14c" 13194139534424 true]
           #datom[668503069688921 154 "Moo" 13194139534424 true]
           #datom[668503069688921 155 "A bovine beast" 13194139534424 true]],
 :tempids {-9222699135738586930 668503069688921}}
I specified the tempid for this transaction as -10034, so I would expect to find that negative number in the :tempids map. Instead I find -9222699135738586930. This is confusing. What is going on here?
I was hoping to have the demo-tran function return the new entity id, but (other than guessing its position in the :tempids map) there is no way, given my inputs, to get at this value.
As one commenter mentions (via link), you need to use resolve-tempid, as documented here and demonstrated in the Day of Datomic project here.
In your case this would be something like:
(let [my-tempid (d/tempid :db.part/utility -10034)
      ;; deref the transaction future; the original #(...) wrapped it in a
      ;; function literal that was never invoked
      tx-result @(d/transact conn [{:db/id my-tempid
                                    :your "transaction"}])
      db-after  (:db-after tx-result)
      tempids   (:tempids tx-result)]
  (d/resolve-tempid db-after tempids my-tempid))
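Folded back into the original function, a sketch that returns the new entity id directly:
(defn demo-tran [term description]
  (let [tempid    (d/tempid :db.part/utility -10034)
        tx-result @(d/transact conn
                               [{:db/id tempid
                                 :utility.tag/uuid (d/squuid)
                                 :utility.tag/term term
                                 :utility.tag/description description}])]
    (d/resolve-tempid (:db-after tx-result)
                      (:tempids tx-result)
                      tempid)))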

Different values for MD 5 hash depending on technique

I'm trying to find a good way to hash a string. This method is working fine, but the results are not consistent with this website:
(defn hash-string
  "Use java interop to flexibly hash strings"
  [string algo base]
  (let [hashed (doto (java.security.MessageDigest/getInstance algo)
                 (.reset)
                 (.update (.getBytes string)))]
    (.toString (new java.math.BigInteger 1 (.digest hashed)) base)))

(defn hash-md5
  "Generate a md5 checksum for the given string"
  [string]
  (hash-string string "MD5" 16))
When I use this, I do indeed get hashes. The problem is that I'm trying a programming exercise at Advent of Code, and its examples of string hashes offer a third result, different from the two above!
How can one do an md5 in the "standard" way that is always expected?
Your MD5 operations are correct; you're just not displaying the result properly.
An MD5 hash is 128 bits, i.e. 32 hexadecimal characters, but BigInteger's toString drops leading zeros, so you need to format the string with zero-padding.
In other words, simply change this expression:
(.toString (new java.math.BigInteger 1 (.digest hashed)) base))
to one that uses format:
(format "%032x" (new java.math.BigInteger 1 (.digest hashed)))))

clojure - best way to increase date by x days

I need to read in a date string in yyyyMMdd format and increase it by x days. At the minute I am doing it by converting to millis, adding one day's worth of millis, and converting back to yyyyMMdd.
;; note the pattern must be "yyyyMMdd": in Joda-Time, MM is month and mm is minutes
(.print
  (.withZone (DateTimeFormat/forPattern "yyyyMMdd")
             (DateTimeZone/forID "EST"))
  (+ 86400000 (.parseMillis
                (.withZone (DateTimeFormat/forPattern "yyyyMMdd")
                           (DateTimeZone/forID "EST"))
                "20150401")))
Is there a cleaner way to do this? The clj-time library is not available to me, and I am using Clojure 1.2.
Since you can't use clj-time, which would be the best option in this case, I can't think of anything better than using org.joda.time as you did.
However, I would suggest rewriting your code a little bit:
there is no need for time zones here;
you could create the DateTimeFormat object once and reuse it.
Here is how your function could look:
(defn add [date pattern days]
  (let [fmt (DateTimeFormat/forPattern pattern)
        add (* 86400000 days)]          ; days -> milliseconds
    (->> date
         (.parseMillis fmt)
         (+ add)
         (.print fmt))))

(add "20150401" "yyyyMMdd" 1) ; => "20150402"
If you don't want to work with milliseconds, you could use .parseDateTime instead of .parseMillis and the .plusDays method to add days to the parsed date:
(defn add [date pattern days]
  (let [fmt (DateTimeFormat/forPattern pattern)
        dt  (.parseDateTime fmt date)]
    (.print fmt (.plusDays dt days))))
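Used the same way as before:
(add "20150401" "yyyyMMdd" 1) ; => "20150402"
As a bonus, .plusDays respects calendar rules such as DST transitions, whereas adding a fixed 86400000 ms per day does not.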