I am using clojure.spec to parse a DSL. Unfortunately, the computation time for testing if something conforms with my spec seems to grow exponentially. I would like to understand why, and how to address it.
This is what the spec looks like:
(spec/def ::settings map?)

(spec/def ::header
  (spec/spec
    (spec/cat :prefix #{:begin-example}
              :label string?
              :settings (spec/? ::settings))))

(def end-block [:end-example])
(spec/def ::not-end (partial not= end-block))
(spec/def ::end #{end-block})

(spec/def ::block
  (spec/cat :header ::header
            :data (spec/* ::not-end)
            :suffix ::end))

(spec/def ::form
  (spec/alt :block ::block
            :form any?))

(spec/def ::forms (spec/* ::form))
In order to exercise the spec, I wrote a small function that produces valid data for the spec, and a size parameter to control the size of the data:
(defn make-sample-data [size]
  (transduce
    (comp (take size)
          cat)
    conj
    []
    (repeat [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9])))
(make-sample-data 1)
;; => [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9]
(make-sample-data 2)
;; => [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9 :a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9]
Now I am executing this code:
(dotimes [i 13]
  (assert (time (spec/valid? ::forms (make-sample-data i)))))
which produces the following output:
"Elapsed time: 0.077095 msecs"
"Elapsed time: 0.333636 msecs"
"Elapsed time: 0.864481 msecs"
"Elapsed time: 2.198994 msecs"
"Elapsed time: 4.432004 msecs"
"Elapsed time: 9.026142 msecs"
"Elapsed time: 17.709151 msecs"
"Elapsed time: 35.201316 msecs"
"Elapsed time: 73.178516 msecs"
"Elapsed time: 138.93966 msecs"
"Elapsed time: 288.349616 msecs"
"Elapsed time: 569.471181 msecs"
"Elapsed time: 1162.944497 msecs"
It appears to me that for every step of the size parameter, the computation time doubles.
My question is: How can I modify my spec so that validation time is linear with the size of my data?
I'm guessing the performance issue comes from a combination of greedy, branching regex specs with any? predicates.
The use of any? in the s/alt :form regex branch stood out to me. I suppose spec might be evaluating each branch of the s/alt greedily/exhaustively and then backtracking, and any? matches everything, including values that match your :block branch. (Notice the spec conforms the same regardless of whether the :form any? branch is defined before or after the :block branch.)
If you can use a more specific predicate than any? in your top-level s/alt :form branch, you should see a big improvement. I inlined the spec definitions for brevity:
(s/def ::forms
  (s/*
    (s/alt :block
           (s/cat :header (s/spec
                            (s/cat :prefix #{:begin-example}
                                   :label string?
                                   :settings (s/? map?)))
                  :data (s/* #(not= % [:end-example]))
                  :suffix #{[:end-example]})
           :form
           (s/nonconforming ;; don't tag results
             (s/or :keyword keyword?
                   :number number?)))))
(time (s/valid? ::forms (make-sample-data 1000)))
"Elapsed time: 84.637513 msecs"
=> true
Note that adding a collection predicate (e.g. coll?, vector?) to that :form branch degrades performance just like any?. I suppose this is because the same values then match both branches of the s/alt.
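As an aside, the s/nonconforming wrapper in the spec above only changes the shape of the conformed value (it drops the branch tag); it does not affect what validates. A quick illustration, assuming (require '[clojure.spec.alpha :as s]):

(s/conform (s/or :keyword keyword? :number number?) :a)
;; => [:keyword :a]   ; tagged

(s/conform (s/nonconforming (s/or :keyword keyword? :number number?)) :a)
;; => :a              ; same validation, no tag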
I was looking at the exercises at the bottom of chapter 9 of Clojure for the Brave and True (in particular the last one: searching multiple engines and returning the first hit from each).
I mocked out the actual search-with-slurp part like this:
(defn search-for
  [query engine]
  (Thread/sleep 2000)
  (format "https://www.%s.com/search?q%%3D%s", engine query))
And implemented the behavior like this:
(defn get-first-hit-from-each
  [query engines]
  (let [futs (map (fn [engine]
                    (future (search-for query engine)))
                  engines)]
    (doall futs)
    (map deref futs)))
(I know the return here is a list and the exercise asks for a vector, but I can just do an into for that...)
but when I run this in the REPL
(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
it seems to take 2 seconds after I added the doall (since map returns a lazy seq, I don't think any of the futures even start unless I consume the seq; (last futs) also seems to work). But when I use the time macro in the REPL, it reports almost no time consumed, even though the call is taking 2 seconds:
(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 0.189609 msecs"
("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")
what is going on with the time macro here?
TL;DR: Lazy seqs don't play nice with the time macro, and your function get-first-hit-from-each returns a lazy seq. To make lazy seqs work with time, wrap them in a doall, as suggested by the documentation. See below for a fuller thought process:
The following is the definition of the time macro in clojure.core (source):
(defmacro time
  "Evaluates expr and prints the time it took. Returns the value of
  expr."
  {:added "1.0"}
  [expr]
  `(let [start# (. System (nanoTime))
         ret# ~expr]
     (prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
     ret#))
Notice how the macro saves the return value of expr in ret#, right after which it prints the elapsed time? Only after that does ret# get returned. The key here is that your function get-first-hit-from-each returns a lazy sequence (since map returns a lazy sequence):
(type (get-first-hit-from-each "gray+cat" '("google" "bing")))
;; => clojure.lang.LazySeq
As such, when you do (time (get-first-hit-from-each "gray+cat" '("google" "bing"))), what gets saved in ret# is a lazy sequence, which doesn't actually get evaluated until we try to use its value...
We can check whether a lazy sequence has been evaluated using the realized? function. So let's tweak the time macro by adding a line to check whether ret# has been evaluated, right after the elapsed time is printed:
(defmacro my-time
  [expr]
  `(let [start# (. System (nanoTime))
         ret# ~expr]
     (prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
     (prn (realized? ret#)) ;; has the lazy sequence been evaluated?
     ret#))
Now testing it:
(my-time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 0.223054 msecs"
false
;; => ("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")
Nope... so that's why time prints an inaccurate result: none of the computationally expensive work actually runs before the elapsed time is printed.
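Here is a minimal illustration of the same effect with no futures involved; because map is lazy, the sleeps inside the mapped fn don't run while time is measuring:

(time (map (fn [x] (Thread/sleep 1000) x) [1 2 3]))
;; prints a near-zero elapsed time; the three sleeps only run later,
;; when the REPL prints (and thereby realizes) the resulting seq

(time (doall (map (fn [x] (Thread/sleep 1000) x) [1 2 3])))
;; prints roughly 3000 msecs, because doall realizes the seq inside time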
To fix that and get an accurate time, we need to force evaluation of the lazy seq. This can be done by strategically placing a doall in a couple of places: either within your function, wrapping the map:
(defn get-first-hit-from-each
  [query engines]
  (let [futs (map (fn [engine]
                    (future (search-for query engine)))
                  engines)]
    (doall futs)
    (doall (map deref futs))))
;; => #'propeller.core/get-first-hit-from-each
(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 2005.478689 msecs"
;; => ("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")
or within time, wrapping the function call:
(time (doall (get-first-hit-from-each "gray+cat" '("google" "bing"))))
Something weird is happening in your setup. It works as expected for me, taking 4 seconds:
(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(defn search-for
  [query engine]
  (Thread/sleep 2000)
  (format "https://www.%s.com/search?q%%3D%s", engine query))

(defn get-first-hit-from-each
  [query engines]
  (let [futs (map (fn [engine]
                    (future (search-for query engine)))
                  engines)]
    ; (doall futs)
    (mapv #(println (deref %)) futs)))

(dotest
  (time
    (get-first-hit-from-each "gray+cat" '("google" "bing"))))
with result
--------------------------------------
Clojure 1.10.2-alpha1 Java 14
--------------------------------------
Testing tst.demo.core
https://www.google.com/search?q%3Dgray+cat
https://www.bing.com/search?q%3Dgray+cat
"Elapsed time: 4001.384795 msecs"
I did not even use the doall.
Update:
I found my mistake. I had accidentally used mapv instead of map on line 15 (the mapv #(println (deref %)) futs line). That mapv forces it to wait for each deref in turn; since the futures are created lazily by map, they run one after the other (twice 2 sec => 4 sec). If you use map on that line instead, you get a lazy seq of a lazy seq and the function returns before the timer expires.
--------------------------------------
Clojure 1.10.2-alpha1 Java 14
--------------------------------------
Testing tst.demo.core
"Elapsed time: 0.182797 msecs"
I always recommend using mapv instead of map. There is also a filterv available. When in doubt, force the output into a nice, eager vector using (vec ...) to rid yourself of the headaches caused by laziness.
Maybe one time in a hundred you will need the features a lazy seq provides. The other times it is a problem since you cannot predict the order of execution of statements.
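A tiny illustration of that unpredictability:

;; With map the printlns are deferred until something walks the seq;
;; with mapv they run immediately, in order.
(def lazy-prints  (map  println [1 2 3]))  ; prints nothing here
(def eager-prints (mapv println [1 2 3]))  ; prints 1, 2, 3 right away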
P.S.
See this list of documentation, including the fabulous Clojure CheatSheet.
P.P.S.
The OP is correct that, ideally, each query should have run in parallel in a separate thread (each future uses a separate thread). The problem is, again, due to lazy map behavior.
With the println at the end, each element in the lazy seq futs is not evaluated until the println requires it. So the futures don't even start until later, one at a time in sequence. This defeats the intended goal of parallel execution. Again, lazy-seq behavior is the cause.
The cure: make everything explicit & eager 99+ percent of the time (i.e. mapv):
(defn search-for
  [query engine]
  (Thread/sleep 2000)
  (format "https://www.%s.com/search?q%%3D%s", engine query))

(defn get-first-hit-from-each
  [query engines]
  (let [futs (mapv (fn [engine]
                     (future (search-for query engine)))
                   engines)]
    (mapv #(println (deref %)) futs)))
with result:
Testing tst.demo.core
https://www.google.com/search?q%3Dgray+cat
https://www.bing.com/search?q%3Dgray+cat
"Elapsed time: 2003.770331 msecs"
I recently discovered the Specter library that provides data-structure navigation and transformation functions and is written in Clojure.
Implementing some of its API as a learning exercise seemed like a good idea. Specter provides an API that takes a function and a nested structure as arguments and returns a vector of elements from the nested structure that satisfy the function, like below:
(select (walker number?) [1 :a {:b 2}]) => [1 2]
Below is my attempt at implementing a function with similar API:
(defn select-walker [afn ds]
  (vec (if (and (coll? ds) (not-empty ds))
         (concat (select-walker afn (first ds))
                 (select-walker afn (rest ds)))
         (if (afn ds) [ds]))))
(select-walker number? [1 :a {:b 2}]) => [1 2]
I have tried implementing select-walker using list comprehension, looping, and cons and conj; in all those cases the return value was a nested list instead of a flat vector of elements. Even so, my implementation above does not seem like idiomatic Clojure, and it has poor time and space complexity.
(time (dotimes [_ 1000] (select (walker number?) (range 100))))
"Elapsed time: 19.445396 msecs"
(time (dotimes [_ 1000] (select-walker number? (range 100))))
"Elapsed time: 237.000334 msecs"
Notice that my implementation is about 12 times slower than Specter's implementation.
I have three questions on the implementation of select-walker.
Is a tail-recursive implementation of select-walker possible?
Can select-walker be written in more idiomatic Clojure?
Any hints to make select-walker execute faster?
There are at least two possibilities to make it tail recursive. The first one is to process the data in a loop, like this:
(defn select-walker-rec [afn ds]
  (loop [res [] ds ds]
    (cond (empty? ds) res
          (coll? (first ds)) (recur res
                                    (doall (concat (first ds)
                                                   (rest ds))))
          (afn (first ds)) (recur (conj res (first ds)) (rest ds))
          :else (recur res (rest ds)))))
in repl:
user> (select-walker-rec number? [1 :a {:b 2}])
[1 2]
user> (time (dotimes [_ 1000] (select-walker-rec number? (range 100))))
"Elapsed time: 19.428887 msecs"
(the simple select-walker takes about 200 ms for me)
The second one (slower, though, and more suitable for more difficult tasks) is to use zippers:
(require '[clojure.zip :as z])
(defn select-walker-z [afn ds]
  (loop [res [] curr (z/zipper coll? seq nil ds)]
    (cond (z/end? curr) res
          (z/branch? curr) (recur res (z/next curr))
          (afn (z/node curr)) (recur (conj res (z/node curr))
                                     (z/next curr))
          :else (recur res (z/next curr)))))
user> (time (dotimes [_ 1000] (select-walker-z number? (range 100))))
"Elapsed time: 219.015153 msecs"
This one is really slow, since the zipper operates on more complex structures. Its great power brings unneeded overhead to this simple task.
The most idiomatic approach, I guess, is to use tree-seq:
(defn select-walker-t [afn ds]
  (filter #(and (not (coll? %)) (afn %))
          (tree-seq coll? seq ds)))
user> (time (dotimes [_ 1000] (select-walker-t number? (range 100))))
"Elapsed time: 1.320209 msecs"
It is incredibly fast, but that is because it produces a lazy sequence of results. For a fair test you should realize its data:
user> (time (dotimes [_ 1000] (doall (select-walker-t number? (range 100)))))
"Elapsed time: 53.641014 msecs"
One more thing to notice about this variant is that it's not tail recursive, so it would fail on really deeply nested structures (maybe I'm mistaken, but I guess it's around a couple of thousand levels of nesting). Still, it's suitable for most cases.
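If you want the eager vector that Specter's select returns (rather than a lazy seq), one option is an into with a filter transducer. This is just a sketch, and the name select-walker-v is mine:

(defn select-walker-v [afn ds]
  (into []
        (filter #(and (not (coll? %)) (afn %)))
        (tree-seq coll? seq ds)))

(select-walker-v number? [1 :a {:b 2}]) ;=> [1 2]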
I'm creating unordered pairs of data elements. A comment by #Chouser on this question says that hash-sets are implemented with 32 children per node, while sorted-sets are implemented with 2 children per node. Does this mean that my pairs will take up less space if I implement them with sorted-sets rather than hash-sets (assuming that the data elements are Comparable, i.e. can be sorted)? (I doubt it matters for me in practice. I'll only have hundreds of these pairs, and lookup in a two-element data structure, even sequential lookup in a vector or list, should be fast. But I'm curious.)
When comparing explicitly looking at the first two elements of a list against using Clojure's built-in sets, I don't see a significant difference when running it ten million times:
user> (defn my-lookup [key pair]
        (condp = key
          (first pair) true
          (second pair) true
          false))
#'user/my-lookup
user> (time (let [data `(1 2)]
              (dotimes [x 10000000] (my-lookup (rand-nth [1 2]) data))))
"Elapsed time: 906.408176 msecs"
nil
user> (time (let [data #{1 2}]
              (dotimes [x 10000000] (contains? data (rand-nth [1 2])))))
"Elapsed time: 1125.992105 msecs"
nil
Of course, micro-benchmarks such as this are inherently flawed and difficult to do well, so don't try to use this to show that one is better than the other. I only intend to demonstrate that they are very similar.
If I'm doing something with unordered pairs, I usually like to use a map since that makes it easy to look up the other element. E.g., if my pair is [2 7], then I'll use {2 7, 7 2}, and I can do ({2 7, 7 2} 2), which gives me 7.
As for space, the PersistentArrayMap implementation is actually very space conscious. If you look at the source code (see previous link), you'll see that it allocates an Object[] of the exact size needed to hold all the key/value pairs. I think this is used as the default map type for all maps with no more than 8 key/value pairs.
The only catch here is that you need to be careful about duplicate keys. {2 2, 2 2} will cause an exception. You could get around this problem by doing something like this: (merge {2 2} {2 2}), i.e. (merge {a b} {b a}) where it's possible that a and b have the same value.
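Putting those two remarks together, here is a small sketch of the pair-as-map idea; the helper name unordered-pair is mine, not something from a library:

(defn unordered-pair [a b]
  (merge {a b} {b a})) ; merge tolerates a = b, unlike a literal {a b, b a}

(def p (unordered-pair 2 7))
(p 2)                ;=> 7  (look up the "other" element)
(p 7)                ;=> 2
(contains? p 2)      ;=> true (membership test)
(unordered-pair 3 3) ;=> {3 3}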
Here's a little snippet from my repl:
user=> (def a (array-map 1 2 3 4))
#'user/a
user=> (type a)
clojure.lang.PersistentArrayMap
user=> (.count a) ; count simply returns array.length/2 of the internal Object[]
2
Note that I called array-map explicitly above. This is related to a question I asked a while ago related to map literals and def in the repl: Why does binding affect the type of my map?
This should be a comment, but I'm too short on reputation and too eager to share information.
If you are concerned about performance, clj-tuple by Zachary Tellman may be 2-3 times faster than ordinary lists/vectors, as claimed here: ztellman/clj-tuple.
I wasn't planning to benchmark different pair representations now, but @ArthurUlfeldt's answer and @DaoWen's led me to do so. Here are my results using criterium's bench macro; source code is below. To summarize: as expected, there are no large differences between the six representations I tested. However, there is a gap between the times for the fastest, array-map and hash-map, and the others. This is consistent with DaoWen's and Arthur Ulfeldt's remarks.
Average execution time in seconds, in order from fastest to slowest (MacBook Pro, 2.3GHz Intel Core i7):
array-map: 5.602099
hash-map: 5.787275
vector: 6.605547
sorted-set: 6.657676
hash-set: 6.746504
list: 6.948222
Edit: I added a run of test-control below, which does only what is common to all of the different other tests. test-control took, on average, 5.571284 seconds. It appears that there is a bigger difference between the -map representations and the others than I had thought: Access to a hash-map or an array-map of two entries is essentially instantaneous (on my computer, OS, Java, etc.), whereas the other representations take about a second for 10 million iterations. Which, given that it's 10M iterations, means that those operations are still almost instantaneous. (My guess is that the fact that test-arraymap was faster than test-control is due to noise from other things happening in the background on the computer. Or it could have to do with idiosyncrasies of compilation.)
(A caveat: I forgot to mention that I'm getting a warning from criterium: "JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active." I believe this means that Leiningen is starting Java with a command line option that is geared toward the -server JIT compiler, but is being run instead with the default -client JIT compiler. So the warning is saying "you think you're running -server, but you're not, so don't expect -server behavior." Running with -server might change the times given above.)
(use 'criterium.core)

;; based on Arthur Ulfeldt's answer:
(defn pairlist-contains? [key pair]
  (condp = key
    (first pair) true
    (second pair) true
    false))

(defn pairvec-contains? [key pair]
  (condp = key
    (pair 0) true
    (pair 1) true
    false))

(def ntimes 10000000)

;; Test how long it takes to do what's common to all of the other tests
(defn test-control []
  (print "=============================\ntest-control:\n")
  (bench
    (dotimes [_ ntimes]
      (def _ (rand-nth [:a :b])))))

(defn test-list []
  (let [data '(:a :b)]
    (print "=============================\ntest-list:\n")
    (bench
      (dotimes [_ ntimes]
        (def _ (pairlist-contains? (rand-nth [:a :b]) data))))))

(defn test-vec []
  (let [data [:a :b]]
    (print "=============================\ntest-vec:\n")
    (bench
      (dotimes [_ ntimes]
        (def _ (pairvec-contains? (rand-nth [:a :b]) data))))))

(defn test-hashset []
  (let [data (hash-set :a :b)]
    (print "=============================\ntest-hashset:\n")
    (bench
      (dotimes [_ ntimes]
        (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-sortedset []
  (let [data (sorted-set :a :b)]
    (print "=============================\ntest-sortedset:\n")
    (bench
      (dotimes [_ ntimes]
        (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-hashmap []
  (let [data (hash-map :a :a :b :b)]
    (print "=============================\ntest-hashmap:\n")
    (bench
      (dotimes [_ ntimes]
        (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-arraymap []
  (let [data (array-map :a :a :b :b)]
    (print "=============================\ntest-arraymap:\n")
    (bench
      (dotimes [_ ntimes]
        (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-all []
  (test-control)
  (test-list)
  (test-vec)
  (test-hashset)
  (test-sortedset)
  (test-hashmap)
  (test-arraymap))
I would like to filter a set, something like:
(filter-set even? #{1 2 3 4 5})
; => #{2 4}
If I use clojure.core/filter I get a seq which is not a set:
(filter even? #{1 2 3 4 5})
; => (2 4)
So the best I came with is:
(set (filter even? #{1 2 3 4 5}))
But I don't like it; it doesn't look optimal to go from a set to a list and back to a set. What would be the Clojurian way to do this?
UPDATE
I did the following to compare @A.Webb's and @Beyamor's approaches. Interestingly, both have almost identical performance, but clojure.set/select is slightly better.
(defn set-bench []
  (let [big-set (set (take 1000000 (iterate (fn [x] (int (rand 1000000000))) 1)))]
    (time (set (filter even? big-set)))        ; "Elapsed time: 422.989 msecs"
    (time (clojure.set/select even? big-set))) ; "Elapsed time: 345.287 msecs"
  nil) ; don't break my REPL !
clojure.set is a handy API for common set operations.
In this case, clojure.set/select is a set-specific filter. It works by removing (disj-ing) the elements which don't satisfy the predicate from the given set.
(require 'clojure.set)
(clojure.set/select even? #{1 2 3 4 5})
; => #{2 4}
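For intuition, select is essentially a reduce that disj-es the elements failing the predicate. Here is a rough sketch of the idea (not necessarily the exact library source):

(defn select-sketch [pred xset]
  (reduce (fn [s k] (if (pred k) s (disj s k)))
          xset
          xset))

(select-sketch even? #{1 2 3 4 5})
; => #{2 4}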
I've been using java to parse numbers, e.g.
(. Integer parseInt numberString)
Is there a more clojuriffic way that would handle both integers and floats, and return clojure numbers? I'm not especially worried about performance here, I just want to process a bunch of white space delimited numbers in a file and do something with them, in the most straightforward way possible.
So a file might have lines like:
5 10 0.0002
4 12 0.003
And I'd like to be able to transform the lines into vectors of numbers.
You can use the edn reader to parse numbers. This has the benefit of giving you floats or Bignums when needed, too.
user> (require '[clojure.edn :as edn])
nil
user> (edn/read-string "0.002")
0.0020
If you want one huge vector of numbers, you could cheat and do this:
user> (let [input "5 10 0.002\n4 12 0.003"]
        (read-string (str "[" input "]")))
[5 10 0.0020 4 12 0.0030]
Kind of hacky though. Or there's re-seq:
user> (let [input "5 10 0.002\n4 12 0.003"]
        (map read-string (re-seq #"[\d.]+" input)))
(5 10 0.0020 4 12 0.0030)
Or one vector per line:
user> (let [input "5 10 0.002\n4 12 0.003"]
        (for [line (line-seq (java.io.BufferedReader.
                               (java.io.StringReader. input)))]
          (vec (map read-string (re-seq #"[\d.]+" line)))))
([5 10 0.0020] [4 12 0.0030])
I'm sure there are other ways.
If you want to be safer, you can use Float/parseFloat
user=> (map #(Float/parseFloat (% 0)) (re-seq #"\d+(\.\d+)?" "1 2.2 3.5"))
(1.0 2.2 3.5)
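The (% 0) is needed because re-seq returns [whole-match group ...] vectors when the pattern contains a capturing group:

user=> (re-seq #"\d+(\.\d+)?" "1 2.2 3.5")
(["1" nil] ["2.2" ".2"] ["3.5" ".5"])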
Not sure if this is "the easiest way", but I thought it was kind of fun, so... With a reflection hack, you can access just the number-reading part of Clojure's Reader:
(let [m (.getDeclaredMethod clojure.lang.LispReader
                            "matchNumber"
                            (into-array [String]))]
  (.setAccessible m true)
  (defn parse-number [s]
    (.invoke m clojure.lang.LispReader (into-array [s]))))
Then use like so:
user> (parse-number "123")
123
user> (parse-number "123.5")
123.5
user> (parse-number "123/2")
123/2
user> (class (parse-number "123"))
java.lang.Integer
user> (class (parse-number "123.5"))
java.lang.Double
user> (class (parse-number "123/2"))
clojure.lang.Ratio
user> (class (parse-number "123123451451245"))
java.lang.Long
user> (class (parse-number "123123451451245123514236146"))
java.math.BigInteger
user> (parse-number "0x12312345145124")
5120577133367588
user> (parse-number "12312345142as36146") ; note the "as" in the middle
nil
Notice how this does not throw the usual NumberFormatException if something goes wrong; you could add a check for nil and throw it yourself if you want.
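For example, a throwing wrapper along those lines might look like this (the name parse-number! is mine):

(defn parse-number! [s]
  (or (parse-number s)
      (throw (NumberFormatException. (str "Invalid number: " s)))))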
As for performance, let's have an unscientific microbenchmark (both functions have been "warmed up"; initial runs were slower as usual):
user> (time (dotimes [_ 10000] (parse-number "1234123512435")))
"Elapsed time: 564.58196 msecs"
nil
user> (time (dotimes [_ 10000] (read-string "1234123512435")))
"Elapsed time: 561.425967 msecs"
nil
The obvious disclaimer: clojure.lang.LispReader.matchNumber is a private static method of clojure.lang.LispReader and may be changed or removed at any time.
In my opinion, the best and safest way, one that works for any number when you want it to and fails when the input isn't a number, is this:
(defn parse-number
  "Reads a number from a string. Returns nil if not a number."
  [s]
  (if (re-find #"^-?\d+\.?\d*$" s)
    (read-string s)))
e.g.
(parse-number "43") ;=> 43
(parse-number "72.02") ;=> 72.02
(parse-number "009.0008") ;=> 9.008
(parse-number "-92837482734982347.00789") ;=> -9.2837482734982352E16
(parse-number "89blah") ;=> nil
(parse-number "z29") ;=> nil
(parse-number "(exploit-me)") ;=> nil
Works for ints, floats/doubles, bignums, etc. If you wanted to add support for reading other notations, simply augment the regex.
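For instance, to also accept ratio literals like "22/7", one possible tweak (the regex extension and the name are mine):

(defn parse-number-or-ratio
  "Reads an integer, decimal, or ratio from a string. Returns nil otherwise."
  [s]
  (if (re-find #"^-?\d+(?:\.\d*|/\d+)?$" s)
    (read-string s)))

(parse-number-or-ratio "22/7")         ;=> 22/7
(parse-number-or-ratio "(exploit-me)") ;=> nil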
Brian Carper's suggested approach (using read-string) works nicely, but only until you try and parse zero-padded numbers like "010". Observe:
user=> (read-string "010")
8
user=> (read-string "090")
java.lang.RuntimeException: java.lang.NumberFormatException: Invalid number: 090 (NO_SOURCE_FILE:0)
This is because Clojure tries to parse "090" as an octal literal, and 090 is not a valid octal number!
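If zero-padded decimal input is a possibility, note that the plain Java parsers do not apply the octal interpretation:

(Long/parseLong "090")     ;=> 90
(Double/parseDouble "090") ;=> 90.0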
Brian Carper's answer is almost correct. Instead of using read-string directly from clojure.core, use clojure.edn/read-string. It is safe and it will parse anything that you throw at it.
(ns edn-example.core
  (:require [clojure.edn :as edn]))

(edn/read-string "2.7") ; float 2.7
(edn/read-string "2")   ; int 2
simple, easy and execution safe ;)
Use bigint and bigdec
(bigint "1")
(bigint "010") ; returns 10N as expected
(bigint "111111111111111111111111111111111111111111111111111")
(bigdec "11111.000000000000000000000000000000000000000000001")
Clojure's bigint will use primitives when possible, while avoiding regexps, the octal-literal problem, and the limited size of the other numeric types, which is what causes (Integer. "10000000000") to fail.
(This last thing happened to me and it was quite confusing: I wrapped it into a parse-int function, and afterwards just assumed that parse-int meant "parse a natural integer", not "parse a 32-bit integer".)
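To make that failure mode concrete:

;; (Integer. "10000000000") ; throws NumberFormatException (too big for an int)
(bigint "10000000000")      ;=> 10000000000N
(class (bigint "10"))       ;=> clojure.lang.BigInt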
These are the two best and correct approaches:
Using Java interop:
(Long/parseLong "333")
(Float/parseFloat "333.33")
(Double/parseDouble "333.3333333333332")
(Integer/parseInt "-333")
(Integer/parseUnsignedInt "333")
(BigInteger. "3333333333333333333333333332")
(BigDecimal. "3.3333333333333333333333333332")
(Short/parseShort "400")
(Byte/parseByte "120")
This lets you precisely control the type you want to parse the number in, when that matters to your use case.
Using the Clojure EDN reader:
(require '[clojure.edn :as edn])
(edn/read-string "333")
Unlike using read-string from clojure.core which isn't safe to use on untrusted input, edn/read-string is safe to run on untrusted input such as user input.
This is often more convenient than the Java interop if you don't need specific control over the types. It can parse any number literal that Clojure can parse, such as:
;; Ratios
(edn/read-string "22/7")
;; Hexadecimal
(edn/read-string "0xff")
Full list here: https://www.rubberducking.com/2019/05/clojure-for-non-clojure-programmers.html#numbers
I find solussd's answer works great for my code. Based on it, here's an enhancement with support for scientific notation. Besides, (.trim s) is added so that extra whitespace is tolerated.
(defn parse-number
  "Reads a number from a string. Returns nil if not a number."
  [s]
  (if (re-find #"^-?\d+\.?\d*([Ee]\+\d+|[Ee]-\d+|[Ee]\d+)?$" (.trim s))
    (read-string s)))
e.g.
(parse-number " 4.841192E-002 ") ;=> 0.04841192
(parse-number " 4.841192e2 ") ;=> 484.1192
(parse-number " 4.841192E+003 ") ;=> 4841.192
(parse-number " 4.841192e.2 ") ;=> nil
(parse-number " 4.841192E ") ;=> nil
(def mystring "5")
(Float/parseFloat mystring)