What's the easiest way to parse numbers in clojure? - clojure

I've been using java to parse numbers, e.g.
(. Integer parseInt numberString)
Is there a more clojuriffic way that would handle both integers and floats, and return clojure numbers? I'm not especially worried about performance here, I just want to process a bunch of white space delimited numbers in a file and do something with them, in the most straightforward way possible.
So a file might have lines like:
5 10 0.0002
4 12 0.003
And I'd like to be able to transform the lines into vectors of numbers.

You can use the edn reader to parse numbers. This has the benefit of giving you floats or Bignums when needed, too.
user> (require '[clojure.edn :as edn])
nil
user> (edn/read-string "0.002")
0.0020
If you want one huge vector of numbers, you could cheat and do this:
user> (let [input "5 10 0.002\n4 12 0.003"]
(read-string (str "[" input "]")))
[5 10 0.0020 4 12 0.0030]
Kind of hacky though. Or there's re-seq:
user> (let [input "5 10 0.002\n4 12 0.003"]
(map read-string (re-seq #"[\d.]+" input)))
(5 10 0.0020 4 12 0.0030)
Or one vector per line:
user> (let [input "5 10 0.002\n4 12 0.003"]
(for [line (line-seq (java.io.BufferedReader.
(java.io.StringReader. input)))]
(vec (map read-string (re-seq #"[\d.]+" line)))))
([5 10 0.0020] [4 12 0.0030])
I'm sure there are other ways.

If you want to be safer, you can use Float/parseFloat
user=> (map #(Float/parseFloat (% 0)) (re-seq #"\d+(\.\d+)?" "1 2.2 3.5"))
(1.0 2.2 3.5)
user=>

Not sure if this is "the easiest way", but I thought it was kind of fun, so... With a reflection hack, you can access just the number-reading part of Clojure's Reader:
(let [m (.getDeclaredMethod clojure.lang.LispReader
"matchNumber"
(into-array [String]))]
(.setAccessible m true)
(defn parse-number [s]
(.invoke m clojure.lang.LispReader (into-array [s]))))
Then use like so:
user> (parse-number "123")
123
user> (parse-number "123.5")
123.5
user> (parse-number "123/2")
123/2
user> (class (parse-number "123"))
java.lang.Integer
user> (class (parse-number "123.5"))
java.lang.Double
user> (class (parse-number "123/2"))
clojure.lang.Ratio
user> (class (parse-number "123123451451245"))
java.lang.Long
user> (class (parse-number "123123451451245123514236146"))
java.math.BigInteger
user> (parse-number "0x12312345145124")
5120577133367588
user> (parse-number "12312345142as36146") ; note the "as" in the middle
nil
Notice how this does not throw the usual NumberFormatException if something goes wrong; you could add a check for nil and throw it yourself if you want.
As for performance, let's have an unscientific microbenchmark (both functions have been "warmed up"; initial runs were slower as usual):
user> (time (dotimes [_ 10000] (parse-number "1234123512435")))
"Elapsed time: 564.58196 msecs"
nil
user> (time (dotimes [_ 10000] (read-string "1234123512435")))
"Elapsed time: 561.425967 msecs"
nil
The obvious disclaimer: clojure.lang.LispReader.matchNumber is a private static method of clojure.lang.LispReader and may be changed or removed at any time.

In my opinion the best/safest way that works when you want it to for any number and fails when it isn't a number is this:
(defn parse-number
"Reads a number from a string. Returns nil if not a number."
[s]
(if (re-find #"^-?\d+\.?\d*$" s)
(read-string s)))
e.g.
(parse-number "43") ;=> 43
(parse-number "72.02") ;=> 72.02
(parse-number "009.0008") ;=> 9.008
(parse-number "-92837482734982347.00789") ;=> -9.2837482734982352E16
(parse-number "89blah") ;=> nil
(parse-number "z29") ;=> nil
(parse-number "(exploit-me)") ;=> nil
Works for ints, floats/doubles, bignums, etc. If you wanted to add support for reading other notations, simply augment the regex.

Brian Carper's suggested approach (using read-string) works nicely, but only until you try and parse zero-padded numbers like "010". Observe:
user=> (read-string "010")
8
user=> (read-string "090")
java.lang.RuntimeException: java.lang.NumberFormatException: Invalid number: 090 (NO_SOURCE_FILE:0)
This is because clojure tries to parse "090" as an octal, and 090 is not a valid octal!

Brian carper's answer is almost correct. Instead of using read-string directly from clojure's core. Use clojure.edn/read-string. It is safe and it will parse anything that you throw at it.
(ns edn-example.core
(require [clojure.edn :as edn]))
(edn/read-string "2.7"); float 2.7
(edn/read-string "2"); int 2
simple, easy and execution safe ;)

Use bigint and bigdec
(bigint "1")
(bigint "010") ; returns 10N as expected
(bigint "111111111111111111111111111111111111111111111111111")
(bigdec "11111.000000000000000000000000000000000000000000001")
Clojure's bigint will use primitives when possible, while avoiding regexps, the problem with octal literals or the limited size of the other numeric types, causing (Integer. "10000000000") to fail.
(This last thing happened to me and it was quite confusing: I wrapped it into a parse-int function, and afterwards just assumed that parse-int meant "parse a natural integer" not "parse a 32bit integer")

These are the two best and correct approaches:
Using Java interop:
(Long/parseLong "333")
(Float/parseFloat "333.33")
(Double/parseDouble "333.3333333333332")
(Integer/parseInt "-333")
(Integer/parseUnsignedInt "333")
(BigInteger. "3333333333333333333333333332")
(BigDecimal. "3.3333333333333333333333333332")
(Short/parseShort "400")
(Byte/parseByte "120")
This lets you precisely control the type you want to parse the number in, when that matters to your use case.
Using the Clojure EDN reader:
(require '[clojure.edn :as edn])
(edn/read-string "333")
Unlike using read-string from clojure.core which isn't safe to use on untrusted input, edn/read-string is safe to run on untrusted input such as user input.
This is often more convenient then the Java interop if you don't need to have specific control of the types. It can parse any number literal that Clojure can parse such as:
;; Ratios
(edn/read-string "22/7")
;; Hexadecimal
(edn/read-string "0xff")
Full list here: https://www.rubberducking.com/2019/05/clojure-for-non-clojure-programmers.html#numbers

I find solussd's answer work great for my code. Based on it, here's an enhancement with support for Scientific notation. Besides, (.trim s) is added so that extra space can be tolerated.
(defn parse-number
"Reads a number from a string. Returns nil if not a number."
[s]
(if (re-find #"^-?\d+\.?\d*([Ee]\+\d+|[Ee]-\d+|[Ee]\d+)?$" (.trim s))
(read-string s)))
e.g.
(parse-number " 4.841192E-002 ") ;=> 0.04841192
(parse-number " 4.841192e2 ") ;=> 484.1192
(parse-number " 4.841192E+003 ") ;=> 4841.192
(parse-number " 4.841192e.2 ") ;=> nil
(parse-number " 4.841192E ") ;=> nil

(def mystring "5")
(Float/parseFloat mystring)

Related

How do I evaluate an edn/read list?

(def a (edn/read-string "(+ 1 3)"))
; => (+ 1 3)
How do I evaluate this resulting list?
(type (first a))
; => cljs.core/Symbol
(= (first a) '+)
; => true
I suppose more generally how would I get from symbol -> function.
And is this a normal practice in clojure? I can't seem to find anything on it. Perhaps I'm not searching with the correct terms.
You'd normally use eval. But in ClojureScript, you'd need the compiler and standard lib available at runtime. This is only possible if you are making use of self-hosted ClojureScript.
If you are in a self-hosted environment (such as Lumo, Planck, Replete, Klipse, etc.), then eval will just work:
cljs.user=> (require '[clojure.edn :as edn])
nil
cljs.user=> (def a (edn/read-string "(+ 1 3)"))
#'cljs.user/a
cljs.user=> (eval a)
4
Otherwise, you can make use of the facilities in the cljs.js namespace to access self-hosted ClojureScript:
cljs.user=> (require 'cljs.js)
nil
cljs.user=> (cljs.js/eval (cljs.js/empty-state)
a {:eval cljs.js/js-eval :context :expr} prn)
{:value 4}
Note that doing this carries some size considerations: The ClojureScript compiler will be brought with your compiled artifacts into the target environment, and you must also avoid using :advanced, ensuring that the entire cljs.core standard lib and associated metadata is available at runtime.
My answer seems to only work in Clojure, not ClojureScript. See the other answer.
I think you may be looking for resolve.
(defn my-simple-eval [expr]
; Cut the function symbol from the arguments
(let [[f & args] (edn/read-string expr)]
; Resolve f to a function then apply the supplied arguments to it
(apply (resolve f) args)))
(my-simple-eval "(+ 1 3)")
=> 4
The arguments must be bare numbers for this to work though. If you want to allow for sub-expressions, you could make it recursive:
(defn my-simple-eval-rec [expr]
(letfn [(rec [[f & args]]
(->> args
(map (fn [arg]
(if (list? arg)
(rec arg) ; Process the sub-expr
arg)))
(apply (resolve f))))]
(rec (edn/read-string expr))))
(my-simple-eval-rec "(+ 1 (+ 2 5))")
=> 8
If this doesn't suffice though, I don't know of any way other than using eval:
(def a (edn/read-string "(+ 1 3)"))
(eval a)
=> 4
or, if the data is available at the time macros are expanded, you could just wrap the call to read-string to have the data interpreted as normal:
(defmacro my-read-string [expr]
(edn/read-string expr))
(my-read-string "(+ 1 3)")
=> 4

Clojure Print the contents of a vector

In Clojure, how do you print the contents of a vector? (I imagine to the console, and usually for debugging purposes). If the answer can be generalized to any Seq that would be good.
Edit:
I should add that it should be a simple function that gives output that looks reasonable, so prints an item per line - so can be easily used for debugging purposes. I'm sure there are libraries that can do it, but using a library really does seem like overkill.
I usually use println. There are several other printing functions that you might want to try. See the "IO" section of the Clojure cheatsheet.
This isn't Java. Just print it, and it will look OK.
You can also use clojure.pprint/pprint to pretty-print it. This can be helpful with large, complex data structures.
These methods work for all of the basic Clojure data structures.
Exception: Don't print infinitely long lazy structures such as what (range) returns--for obvious reasons. For that you may need to code something special.
This works for me:
(defn pr-seq
([seq msg]
(letfn [(lineify-seq [items]
(apply str (interpose "\n" items)))]
(println (str "\n--------start--------\n"
msg "\nCOUNT: " (count seq) "\n"
(lineify-seq seq) "\n---------end---------"))))
([seq]
(pr-seq seq nil)))
Example usages:
(pr-seq [1 2 3])
(pr-seq (take 20 blobs) (str "First 20 of " (count blobs) " Blobs")))
If you want to just print out the elements of the sequence/vector you could just map println to your sequence/vector, but make sure you force map to evaluate using dorun:
(dorun (map println [1 2 3 4]))
This can be applied to sequences too:
(dorun (map println '(1 2 3 4)))
Another way you can do this with apply is to curry map with println and apply it to the sequence/vector:
(apply (partial map println) [[1 2 3 4]])
(apply (partial map println) ['(1 2 3 4)])
Another way you can do this is with doseq:
(doseq [e [1 2 3 4]]
(println e))
(doseq [e '(1 2 3 4)]
(println e))
This one at least stops the text going out too far to the right:
(defn pp
([n x]
(binding [pp/*print-right-margin* n]
(-> x clojure.pprint/pprint)))
([x]
(pp 100 x)))
It is possible to do partials of this function to alter the width.

Add days to current date

Am new to clojure, can anyone help me to understand how can I get current date in clojure and then adding days to it?
for e.g. adding 3 days to current date?
The idiomatic Clojure way is to use clj-time (see link for Leiningen/Maven install instructions), which wraps Joda time as referenced by the first answer from overthink.
user=> (use '[clj-time.core])
nil
user=> (now)
#<DateTime 2014-11-25T12:03:34.714Z>
user=> (plus (now) (days 3))
#<DateTime 2014-11-28T12:05:40.888Z>
This isn't a Clojure-specific answer, really, but I'd use Joda time.
(import 'org.joda.time.DateTime)
(let [now (DateTime/now)
later (.plusDays now 3)]
[now later])
;; [#<DateTime 2014-11-24T23:26:05.885-05:00> #<DateTime 2014-11-27T23:26:05.885-05:00>]
user> (import '[java.util Calendar])
;=> java.util.Calendar
user> (defn days-later [n]
(let [today (Calendar/getInstance)]
(doto today
(.add Calendar/DATE n)
.toString)))
#'user/days-later
user> (println "Tomorrow: " (days-later 1))
;=> Tomorrow: #inst "2014-11-26T15:36:31.901+09:00"
;=> nil
user> (println "7 Days from now: " (days-later 7))
;=> 7 Days from now: #inst "2014-12-02T15:36:44.785+09:00"
;=> nil

Good Clojure representation for unordered pairs?

I'm creating unordered pairs of data elements. A comment by #Chouser on this question says that hash-sets are implemented with 32 children per node, while sorted-sets are implemented with 2 children per node. Does this mean that my pairs will take up less space if I implement them with sorted-sets rather than hash-sets (assuming that the data elements are Comparable, i.e. can be sorted)? (I doubt it matters for me in practice. I'll only have hundreds of these pairs, and lookup in a two-element data structure, even sequential lookup in a vector or list, should be fast. But I'm curious.)
When comparing explicitly looking at the first two elements of a list, to using Clojure's built in sets I don't see a significant difference when running it ten million times:
user> (defn my-lookup [key pair]
(condp = key
(first pair) true
(second pair) true false))
#'user/my-lookup
user> (time (let [data `(1 2)]
(dotimes [x 10000000] (my-lookup (rand-nth [1 2]) data ))))
"Elapsed time: 906.408176 msecs"
nil
user> (time (let [data #{1 2}]
(dotimes [x 10000000] (contains? data (rand-nth [1 2])))))
"Elapsed time: 1125.992105 msecs"
nil
Of course micro-benchmarks such as this are inherently flawed and difficult to really do well so don't try to use this to show that one is better than the other. I only intend to demonstrate that they are very similar.
If I'm doing something with unordered pairs, I usually like to use a map since that makes it easy to look up the other element. E.g., if my pair is [2 7], then I'll use {2 7, 7 2}, and I can do ({2 7, 7 2} 2), which gives me 7.
As for space, the PersistentArrayMap implementation is actually very space conscious. If you look at the source code (see previous link), you'll see that it allocates an Object[] of the exact size needed to hold all the key/value pairs. I think this is used as the default map type for all maps with no more than 8 key/value pairs.
The only catch here is that you need to be careful about duplicate keys. {2 2, 2 2} will cause an exception. You could get around this problem by doing something like this: (merge {2 2} {2 2}), i.e. (merge {a b} {b a}) where it's possible that a and b have the same value.
Here's a little snippet from my repl:
user=> (def a (array-map 1 2 3 4))
#'user/a
user=> (type a)
clojure.lang.PersistentArrayMap
user=> (.count a) ; count simply returns array.length/2 of the internal Object[]
2
Note that I called array-map explicitly above. This is related to a question I asked a while ago related to map literals and def in the repl: Why does binding affect the type of my map?
This should be a comment, but i'm too short in reputation and too eager to share information.
If you are concerned about performance clj-tuple by Zachary Tellman may be 2-3 times faster than ordinary list/vectors, as claimed here ztellman / clj-tuple.
I wasn't planning to benchmark different pair representations now, but #ArthurUlfeldt's answer and #DaoWen's led me to do so. Here are my results using criterium's bench macro. Source code is below. To summarize, as expected, there are no large differences between the seven representations I tested. However, there is a gap between times for the fastest, array-map and hash-map, and the others. This is consistent with DaoWen's and Arthur Ulfeldt's remarks.
Average execution time in seconds, in order from fastest to slowest (MacBook Pro, 2.3GHz Intel Core i7):
array-map: 5.602099
hash-map: 5.787275
vector: 6.605547
sorted-set: 6.657676
hash-set: 6.746504
list: 6.948222
Edit: I added a run of test-control below, which does only what is common to all of the different other tests. test-control took, on average, 5.571284 seconds. It appears that there is a bigger difference between the -map representations and the others than I had thought: Access to a hash-map or an array-map of two entries is essentially instantaneous (on my computer, OS, Java, etc.), whereas the other representations take about a second for 10 million iterations. Which, given that it's 10M iterations, means that those operations are still almost instantaneous. (My guess is that the fact that test-arraymap was faster than test-control is due to noise from other things happening in the background on the computer. Or it could have to do with idiosyncrasies of compilation.)
(A caveat: I forgot to mention that I'm getting a warning from criterium: "JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active." I believe this means that Leiningen is starting Java with a command line option that is geared toward the -server JIT compiler, but is being run instead with the default -client JIT compiler. So the warning is saying "you think you're running -server, but you're not, so don't expect -server behavior." Running with -server might change the times given above.)
(use 'criterium.core)
;; based on Arthur Ulfedt's answer:
(defn pairlist-contains? [key pair]
(condp = key
(first pair) true
(second pair) true
false))
(defn pairvec-contains? [key pair]
(condp = key
(pair 0) true
(pair 1) true
false))
(def ntimes 10000000)
;; Test how long it takes to do what's common to all of the other tests
(defn test-control []
(print "=============================\ntest-control:\n")
(bench
(dotimes [_ ntimes]
(def _ (rand-nth [:a :b])))))
(defn test-list []
(let [data '(:a :b)]
(print "=============================\ntest-list:\n")
(bench
(dotimes [_ ntimes]
(def _ (pairlist-contains? (rand-nth [:a :b]) data))))))
(defn test-vec []
(let [data [:a :b]]
(print "=============================\ntest-vec:\n")
(bench
(dotimes [_ ntimes]
(def _ (pairvec-contains? (rand-nth [:a :b]) data))))))
(defn test-hashset []
(let [data (hash-set :a :b)]
(print "=============================\ntest-hashset:\n")
(bench
(dotimes [_ ntimes]
(def _ (contains? data (rand-nth [:a :b])))))))
(defn test-sortedset []
(let [data (sorted-set :a :b)]
(print "=============================\ntest-sortedset:\n")
(bench
(dotimes [_ ntimes]
(def _ (contains? data (rand-nth [:a :b])))))))
(defn test-hashmap []
(let [data (hash-map :a :a :b :b)]
(print "=============================\ntest-hashmap:\n")
(bench
(dotimes [_ ntimes]
(def _ (contains? data (rand-nth [:a :b])))))))
(defn test-arraymap []
(let [data (array-map :a :a :b :b)]
(print "=============================\ntest-arraymap:\n")
(bench
(dotimes [_ ntimes]
(def _ (contains? data (rand-nth [:a :b])))))))
(defn test-all []
(test-control)
(test-list)
(test-vec)
(test-hashset)
(test-sortedset)
(test-hashmap)
(test-arraymap))

How to evaluate a sequence of impure functions in Clojure?

How can I evaluate a list of (impure) functions in Clojure? For instance:
[#(println "1") #(println "2") #(println "3")]
The expected output is:
1
2
3
Is there a way to achieve this without using macros? Something like (map evaluate fns-seq), maybe?
(I need this for drawing some graphics using the Clojure.processing API.)
user> (let [fs [#(println "1") #(println "2") #(println "3")]]
(doseq [f fs] (f)))
1
2
3
nil
This will eagerly consume the whole seq, calling all functions for side effects and returning whatever the last one returns:
(reduce #(%2) nil [#(println :foo) #(println :bar)])
; => prints :foo, then :bar, then returns nil
If you want to hold onto the return values, you can use reductions instead:
(reductions #(%2) nil [#(println :foo) #(println :bar)])
; => prints :foo, then :bar, then returns (nil nil)
reductions is found in clojure.contrib.seq-utils in Clojure 1.1 and in clojure.core in current snapshots of 1.2.
Update: Note that reductions returns a lazy seq, so it's no improvement over map (NB. in map you'd want to use #(%) rather than #(%2)). I mentioned it here mostly for completeness. In fact, I posted the whole answer for completeness, because normally I'd go with the doseq approach (see Brian's answer).
(apply pcalls [#(println "1") #(println "2") #(println "3")]) does just that. Just be wary of pcalls' parallelism (therefore lack of sequentiality) and lazyness.
An old question, I know, but there's another option.
You could simply invoke the functions:
(defn generate-fns []
[#(println "1") #(println "2") #(println "3")])
(dorun (pmap (memfn invoke) (generate-fns)))
This allows you to decide in a different context how you would like to execute the functions (say, pmap, or claypoole's upmap for example)