I need to read in a date string in yyyyMMdd format and increase it by x days. At the moment I am doing it by converting to millis, adding 1 day in millis, then converting back to yyyyMMdd.
(.print
  (.withZone (DateTimeFormat/forPattern "yyyyMMdd")
             (DateTimeZone/forID "EST"))
  (+ 86400000
     (.parseMillis
       (.withZone (DateTimeFormat/forPattern "yyyyMMdd")
                  (DateTimeZone/forID "EST"))
       "20150401")))
Is there a cleaner way to do this? The clj-time library is not available to me, and I am using Clojure 1.2.
Since you can't use clj-time, which is the best option in this case, I can't think of anything better than using org.joda.time as you did.
However, I would suggest rewriting your code a little bit:
there is no need for time zones here;
you could create the DateTimeFormat object once and reuse it.
Here is how your function could look:
(defn add [date pattern days]
  (let [fmt      (DateTimeFormat/forPattern pattern)
        delta-ms (* 86400000 days)]
    (->> date
         (.parseMillis fmt)
         (+ delta-ms)
         (.print fmt))))
;; Note the pattern: MM is month, mm would be minute-of-hour.
(add "20150401" "yyyyMMdd" 1) ; => "20150402"
If you don't want to work with milliseconds, you could use .parseDateTime instead of .parseMillis and the .plusDays method to add days to the parsed date. Unlike adding a fixed 86400000 ms, .plusDays also stays on the right calendar date across a daylight-saving transition:
(defn add [date pattern days]
  (let [fmt (DateTimeFormat/forPattern pattern)
        dt  (.parseDateTime fmt date)]
    (.print fmt (.plusDays dt days))))
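For comparison, here is a minimal sketch of the same idea using only JDK classes (java.text.SimpleDateFormat and java.util.Calendar) instead of Joda-Time. The name add-days-jdk is made up for this illustration, and the sketch assumes the JVM's default time zone is acceptable for both parsing and printing. It also shows why the pattern letters matter: in SimpleDateFormat, as in Joda-Time, MM means month and mm means minute.

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util Calendar))

(defn add-days-jdk
  "Hypothetical helper: parses date with pattern, adds days calendar days,
  and prints the result with the same pattern (JVM default time zone)."
  [date pattern days]
  (let [fmt (SimpleDateFormat. pattern)
        cal (Calendar/getInstance)]
    (.setTime cal (.parse fmt date))
    ;; Calendar arithmetic moves by calendar days, not by a fixed 24 hours.
    (.add cal Calendar/DAY_OF_MONTH days)
    (.format fmt (.getTime cal))))

(add-days-jdk "20150401" "yyyyMMdd" 1) ; => "20150402"
(add-days-jdk "20150131" "yyyyMMdd" 1) ; => "20150201"
;; With lowercase mm the "01" is read and printed as minutes, so the month is lost:
(add-days-jdk "20150131" "yyyymmdd" 1) ; => "20150101"
```

The lowercase pattern happens to give the right answer for dates in the middle of a month, which is why the mistake is easy to miss.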
I want to create polymorphic schemas/types, and I'm curious about best practices. The following 2 examples allow me to create a Frequency schema which can repeat an event monthly by day of month, or monthly by day of week (e.g., every 15th, or every first Monday, respectively).
The first one uses the experimental abstract map to accomplish this, and its syntax is awkward (IMO). Its being in the experimental package also concerns me a bit.
The second one uses s/conditional, which suffers from not being able to easily coerce the value of :type from a string to a keyword. That coercion is useful when dealing with a REST API or JSON (whereas s/eq is great for this).
In the general case, is one of these, or some third option, the best practice for conveying: Type A is one of Types #{B C D ...}?
Two options:
;;OPTION 1
(s/defschema Frequency
  (field (abstract-map/abstract-map-schema :type {})
         {}))
(abstract-map/extend-schema MonthlyByDOM Frequency
  [:monthly-by-dom]
  {:days #{MonthDay}})
(abstract-map/extend-schema MonthlyByDOW Frequency
  [:monthly-by-dow]
  {:days  #{WeekDay}
   :weeks #{(s/enum 1 2 3 4 5)}})
;;OPTION 2
(s/defschema MonthlyByDOM
  "monthly by day of month, eg every 13th and 21st day"
  {:type (s/eq :monthly-by-dom)
   :days #{MonthDay}})
(s/defschema MonthlyByDOW
  "monthly by day of week, eg first, and third friday"
  {:type  (s/eq :monthly-by-dow)
   :days  #{WeekDay}
   :weeks #{(s/enum 1 2 3 4 5)}})
;; Each predicate compares the :type value itself with a keyword.
(s/defschema Frequency
  (field (s/conditional #(= :monthly-by-dom (:type %)) MonthlyByDOM
                        #(= :monthly-by-dow (:type %)) MonthlyByDOW)
         {:default {:type :monthly-by-dom
                    :days #{1 11 21}}}))
Similar questions that don't quite help:
https://groups.google.com/forum/#!topic/prismatic-plumbing/lMvazYXRAQQ
Polymorphic schema validation in Clojure
Validating multiple polymorphic values using Prismatic Schema
I'm attempting to use the Clojure pantomime library to extract/OCR text from a large number of tif documents (among others).
My plan has been to use pmap to apply the mapping over a sequence of input data (from a postgres database) and then update that same postgres database with the tika/tesseract OCR output. This has been working OK, however I notice in htop that many of the cores are idle at times.
Is there any way to fix this, and what steps can I take to determine where this may be blocking? Each task processes a single tif file, and the tasks are entirely independent of one another.
Additional info:
some tika/tesseract processes take 3 seconds, others take up to 90 seconds. Generally speaking, tika is heavily CPU bound. I have ample memory available according to htop.
postgres has no locking issues in session management, so I don't think that's holding me up.
maybe somewhere futures are waiting to be dereferenced? How can I tell where?
Any tips appreciated, thanks. Code added below.
(defn parse-a-path [{:keys [row_id file_path]}]
  (try
    (let [start        (System/currentTimeMillis)
          mime_type    (pm/mime-type-of file_path)
          file_content (-> file_path (extract/parse) :text)
          language     (pl/detect-language file_content)]
      {:mime_type             mime_type
       :file_content          file_content
       :language              language
       :row_id                row_id
       ;; elapsed milliseconds / 1000 = seconds
       :parse_time_in_seconds (float (/ (- (System/currentTimeMillis) start) 1000))
       :record_status         "doc parsed"})))
(defn fetch-all-batch []
  (t/info (str "Fetching lazy seq. all rows for batch."))
  (jdbc/query (db-connection)
              ["select
                  row_id,
                  file_path,
                  file_extension
                from the_table"]))
(defn update-a-row [{:keys [row_id file_path file_extension] :as all-keys}]
  (let [parse-out (parse-a-path all-keys)]
    (try
      ;; `do`, not `doall`: this only sequences the update and the log call.
      (do
        (jdbc/execute!
          (db-connection)
          ["update the_table
            set
              record_last_updated = current_timestamp,
              file_content = ?,
              mime_type = ?,
              language = ?,
              parse_time_in_seconds = ?,
              record_status = ?
            where row_id = ?"
           (:file_content parse-out)
           (:mime_type parse-out)
           (:language parse-out)
           (:parse_time_in_seconds parse-out)
           (:record_status parse-out)
           row_id])
        (t/debug (str "updated row_id " (:row_id parse-out) " (" file_extension ") "
                      " in " (:parse_time_in_seconds parse-out) " seconds.")))
      (catch Exception _))))
(dorun
  (pmap
    #(try
       (update-a-row %)
       (catch Exception e (t/error (.getNextException e))))
    ;; call the function; pmap needs the rows, not the function itself
    (fetch-all-batch)))
pmap runs the map function in parallel but preserves ordering, and it only stays about (+ 2 cores) items ahead of the consumer. This means that with 8 cores roughly 10 items are in flight at a time, and because results are delivered in order, one slow item at the head of that window holds up new work until it finishes, even if the other items are already done.
You could create your own code that uses combinations of future, delay and deref, which would be a good academic exercise. After that, you can throw out your code and start using the claypoole library, which has a set of abstractions that cover the majority of uses of future.
For this specific case, use their unordered pmap or pfor implementations (upmap and upfor), which do exactly the same thing pmap does but do not have ordering; new items are picked up as soon as any one item in the batch is finished.
In situations where IO is the main bottleneck, or where processing times can greatly vary between items of work, it is the best way to parallelize map or for operations.
Of course you should take care not to rely on any sort of ordering for the return values.
(require '[com.climate.claypoole :as cp])
(cp/upmap (cp/ncpus)
          #(try
             (update-a-row %)
             (catch Exception e (t/error (.getNextException e))))
          ;; call the function; upmap needs the rows, not the function itself
          (fetch-all-batch))
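As a rough sketch of the "write your own" exercise mentioned above, the following uses a plain java.util.concurrent thread pool with an ExecutorCompletionService rather than future/delay/deref. The name unordered-pmap is hypothetical (it is not part of clojure.core or claypoole), and the sketch assumes the whole input fits in memory and that f does not throw. It is only meant to show the idea of handing out results as they complete, not to replace claypoole.

```clojure
(import '(java.util.concurrent Executors ExecutorCompletionService))

(defn unordered-pmap
  "Hypothetical helper: applies f to every item of coll on a fixed pool of
  n-threads threads and returns a vector of results in completion order
  rather than input order."
  [n-threads f coll]
  (let [pool  (Executors/newFixedThreadPool n-threads)
        ecs   (ExecutorCompletionService. pool)
        items (vec coll)]
    (try
      ;; Queue every item up front; an idle thread takes the next queued
      ;; item as soon as it finishes its current one.
      (doseq [x items]
        (.submit ecs (fn [] (f x))))
      ;; .take blocks until some task has finished, whichever one it is.
      (vec (repeatedly (count items) #(.get (.take ecs))))
      (finally
        (.shutdown pool)))))

;; Usage: the three sleeps run concurrently, and each result is collected
;; as soon as its task finishes.
(unordered-pmap 3 (fn [ms] (Thread/sleep (long ms)) ms) [200 20 100])
```

Because no thread waits for a slower neighbour before picking up more work, a 90-second item occupies one thread while the remaining threads keep draining the queue.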
I had a similar problem some time ago. I guess that you are making the same assumptions as me:
pmap calls f in parallel. But that doesn't mean that the work is shared equally. As you said, some items take 3 seconds whereas others take 90 seconds. A thread that finishes in 3 seconds does NOT take over any of the work left to the others, so the finished threads just sit idle until the last one finishes.
you didn't describe exactly what your data looks like, but I will assume that you are using some kind of lazy sequence, which is bad for parallel processing. If your process is CPU-bound and you can hold your entire input in memory, then prefer clojure.core.reducers ('map', 'filter' and especially 'fold') to the lazy map, filter and others.
In my case, these tips cut the processing time from 34 seconds to a mere 8. Hope it helps.
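To illustrate the reducers suggestion, here is a minimal sketch on a toy CPU-bound computation. The function name parallel-sum-of-squares is made up for this example, and it assumes Clojure 1.5 or later, where clojure.core.reducers is available. The input is turned into a vector first, because fold can only split the work across threads for foldable collections such as vectors.

```clojure
(require '[clojure.core.reducers :as r])

(defn parallel-sum-of-squares
  "Hypothetical example: squares every number and sums the results,
  letting r/fold split the vector across the fork/join pool."
  [nums]
  ;; + is used both to reduce each chunk and to combine chunk results;
  ;; calling (+) with no arguments supplies the identity value 0.
  (r/fold + (r/map #(* % %) (vec nums))))

(parallel-sum-of-squares (range 1 11)) ; => 385
```

Whether this helps with the OCR workload depends on the assumptions stated above; database updates and other side effects are a different matter from a pure computation like this one.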
I'm trying to find a good way to hash a string. This method is working fine, but the results are not consistent with this website:
(defn hash-string
  "Use java interop to flexibly hash strings"
  [string algo base]
  (let [hashed
        (doto (java.security.MessageDigest/getInstance algo)
          (.reset)
          (.update (.getBytes string)))]
    (.toString (new java.math.BigInteger 1 (.digest hashed)) base)))
(defn hash-md5
  "Generate a md5 checksum for the given string"
  [string]
  (hash-string string "MD5" 16))
When I use this, I do indeed get hashes. The problem is I'm trying a programming exercise at advent of code and it has its own examples of string hashes which offer a 3rd result different from the above 2!
How can one do an md5 in the "standard" way that is always expected?
Your MD5 computation is correct; you're just not displaying the result properly.
An MD5 digest is always 32 hexadecimal characters long, but BigInteger's toString drops leading zeros, so you need to pad the string out to 32 characters.
In other words, simply change this expression:
(.toString (new java.math.BigInteger 1 (.digest hashed)) base))
to one that uses format (the output is then always hexadecimal, so the base argument is no longer used):
(format "%032x" (new java.math.BigInteger 1 (.digest hashed))))
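A small self-contained sketch of the difference, using only JDK classes. The helper names md5-bytes, md5-unpadded and md5-padded are invented for this illustration, and UTF-8 is assumed as the string encoding. The input "a" is convenient because its MD5 digest begins with a zero.

```clojure
(import '(java.security MessageDigest))

(defn md5-bytes
  "Hypothetical helper: raw 16-byte MD5 digest of s, encoded as UTF-8."
  [^String s]
  (.digest (MessageDigest/getInstance "MD5") (.getBytes s "UTF-8")))

(defn md5-unpadded
  "Hex via BigInteger.toString, which drops leading zeros."
  [s]
  (.toString (BigInteger. 1 (md5-bytes s)) 16))

(defn md5-padded
  "Hex via format, left-padded with zeros to 32 characters."
  [s]
  (format "%032x" (BigInteger. 1 (md5-bytes s))))

(md5-unpadded "a") ; => "cc175b9c0f1b6a831c399e269772661"  (31 characters)
(md5-padded "a")   ; => "0cc175b9c0f1b6a831c399e269772661" (32 characters)
```

The two versions agree whenever the digest does not start with a zero nibble, which is why the unpadded version appears to work for most inputs.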
Using clj-time, I can parse a date and time by doing
(def timestamp (format/parse (formatters :date-time-no-ms)
"2013-06-03T23:00:00-0500"))
;=> #<DateTime 2013-06-04T04:00:00.000Z>
I can convert this back into a string by doing
(unparse (formatters :year-month-day) timestamp)
;=> "2013-06-04"
This is the year, month, and day of that moment within the UTC time zone. How can I get an unparsed version of the DateTime relative to another time zone? For example, for the example above, I want to specify the UTC–5 time zone and get a string of “2013-06-03”. I have played around with from-time-zone and to-time-zone but can’t seem to find the right combination of functions and arguments.
You'll want to use clj-time.format/with-zone:
(require '(clj-time [core :as time] [format :as timef]))
(timef/unparse (timef/with-zone (:date-time-no-ms timef/formatters)
                                (time/time-zone-for-id "America/Chicago"))
               (time/now))
;= "2013-06-02T15:20:03-05:00"
I have defined the following code to allow me to set column values in a java.sql.PreparedStatement. Is this code reasonable/idiomatic? How could it be improved?
(use '[clojure.template :only [do-template]])
; (import all java types not in java.lang)
(defprotocol SetPreparedStatement
  (set-prepared-statement [this prepared-statement index]))
(do-template [type-name set-name]
  (extend-type type-name
    SetPreparedStatement
    (set-prepared-statement [this prepared-statement index]
      (set-name prepared-statement index this)))
  BigDecimal .setBigDecimal
  Boolean    .setBoolean
  Byte       .setByte
  Date       .setDate
  Double     .setDouble
  Float      .setFloat
  Integer    .setInt
  Long       .setLong
  Object     .setObject
  Short      .setShort
  Time       .setTime
  Timestamp  .setTimestamp)
; Sample use
(set-prepared-statement 42 some-prepared-statement 1)
Your example looks as close to idiomatic Clojure as I can tell :)
It could perhaps benefit from abstracting the type mapping out if you expect to create more than one template. If you're creating just this one, it looks like excellent Clojure to me.