I am trying to solve a pythonchallenge problem using Clojure:
(java.lang.Math/pow 2 38)
I got
2.74877906944E11
However, I need to turn off this scientific notation. I searched the Clojure docs but still don't know what to do. Is there a general way to toggle scientific notation on or off in Clojure?
Thanks.
You can use format to control how results are printed.
user> (format "%.0f" (Math/pow 2 38))
"274877906944"
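If you want the digits of a double without scientific notation, and without picking a fixed number of decimals yourself, one option is to go through java.math.BigDecimal with bigdec and call its toPlainString method. A sketch (plain-str is a name made up here):

```clojure
(defn plain-str
  "Hypothetical helper: renders a double without an exponent by going
  through java.math.BigDecimal and its toPlainString method."
  [x]
  (.toPlainString (bigdec x)))

(plain-str (Math/pow 2 38)) ;=> "274877906944"
(plain-str 1.5E-7)          ;=> "0.00000015"
```

bigdec converts a double through its shortest decimal representation, so this shows the digits Clojure would print, not the exact binary value of the double.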
Also, if there is no danger of losing data you care about, you can convert to an exact type:
user> 274877906944.0
2.74877906944E11
user> (long 274877906944.0)
274877906944
For larger inputs, there are BigInts:
user> (bigint 27487790694400000000.0)
27487790694400000000N
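If all you need is an exact integer power, a sketch that never goes through a double at all. exact-pow is a hypothetical name; it relies on java.math.BigInteger's pow method via interop, and bit-shift-left is a shortcut for powers of two:

```clojure
(defn exact-pow
  "Hypothetical helper: b^ex as an exact integer, computed with
  java.math.BigInteger.pow and returned as a Clojure BigInt."
  [b ex]
  (bigint (.pow (biginteger b) (int ex))))

(bit-shift-left 1 38) ;=> 274877906944
(exact-pow 2 38)      ;=> 274877906944N
(exact-pow 3 38)      ;=> 1350851717672992089N
```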
Warning
You can often lose precision by using java.lang.Math/pow to compute an integer power of an integer.
For example
(bigint (java.lang.Math/pow 3 38))
; 1350851717672992000N
whereas
(int-pow 3 38)
; 1350851717672992089
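It works with powers of 2 because, in binary, a power of 2 is a single 1 bit followed by 0s. In a double, that 1 bit is the (implicit) leading bit of the significand, so going up one power only increments the binary exponent, and no precision is lost as long as the exponent stays in range. By contrast, a double's significand holds only 53 bits, while 3^38 needs 61 bits, so its low-order bits get rounded away.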
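A quick way to see the 53-bit limit from the REPL (a sketch; two-53 is just a name chosen here): 2^53 + 1 is the smallest positive integer a double cannot hold, every power of two that fits in a long survives the round trip through Math/pow, and 3^38 does not.

```clojure
;; A double's significand has 53 bits, so 2^53 + 1 cannot be stored exactly.
(def two-53 (bit-shift-left 1 53)) ; 9007199254740992

(long (double (inc two-53)))
;=> 9007199254740992 (rounded back to 2^53)

;; Powers of two up to 2^62 come back exactly from Math/pow.
(every? #(= (bit-shift-left 1 %) (long (Math/pow 2 %))) (range 63))
;=> true

;; 3^38 needs 61 bits, so the double result is not the exact integer.
(= 1350851717672992089 (long (Math/pow 3 38)))
;=> false
```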
By the way, int-pow above is just repeated multiplication:
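(defn int-pow [b ^long ex]
  ;; Multiplies acc by b, ex times. Note that * throws on long overflow;
  ;; *' would promote to BigInt instead.
  (loop [acc 1, ex ex]
    (case ex
      0 acc
      (recur (* acc b) (dec ex)))))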
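Repeated multiplication needs ex multiplications. If that ever matters, a standard alternative (swapped in here, not part of the answer above) is exponentiation by squaring, which needs only about log2(ex) steps. A sketch; int-pow-sq is a hypothetical name, and it uses *' so values that overflow a long are promoted to BigInt instead of throwing:

```clojure
(defn int-pow-sq
  "Hypothetical variant of int-pow using exponentiation by squaring.
  ex must be a non-negative integer."
  [b ex]
  (loop [acc 1, base b, ex ex]
    (cond
      (zero? ex) acc
      ;; odd exponent: fold one factor of base into the result
      (odd? ex)  (recur (*' acc base) (*' base base) (quot ex 2))
      ;; even exponent: just square the base and halve the exponent
      :else      (recur acc (*' base base) (quot ex 2)))))

(int-pow-sq 3 38)  ;=> 1350851717672992089
(int-pow-sq 2 100) ;=> 1267650600228229401496703205376N
```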
This is more a comment than a solution, but it is too long to fit in one.
I have two CSV files, with around 22K records in file F1 and 50K records in file F2, both containing company name and address information. I need to do a fuzzy match on name, address, and phone. Each record in F1 needs to be fuzzy-matched against each record in F2. I have made a third file, R3, which is a CSV containing the rules for fuzzy-matching: which column from F1 to match with which column from F2, and a fuzzy tolerance level. I am trying to do it with a for loop, this way:
(for [j f1-rows
      h f2-rows
      r r3-rows
      :while (match-row j h r)]
  (merge j h))
(defn match-row [j h rules]
  (every?
   identity
   (map (fn [rule]
          (<= (fuzzy/jaccard
               ((keyword (first rule)) j)
               ((keyword (second rule)) h))
              ((nth rule 2))))
        rules)))
f1-rows and f2-rows are collections of maps. Rules is a collection of sequences, each containing a column name from F1, a column name from F2, and the tolerance level. The code runs and works as expected, but my problem is that it takes around 2 hours to execute. I read up on how transducers help improve performance by eliminating intermediate chunks, but I am not able to visualize how I would apply them in my case. Any thoughts on how I can make this better/faster?
:while vs :when
Your use of :while in this case doesn't seem to agree with your stated problem. :while stops iterating the binding it follows (here r) at the first false result, and the enclosing bindings then move on to their next element. :when will iterate through all the combinations and include only the ones where match-row is true in the resulting lazy seq. The difference is explained here.
For example:
(for [i (range 10)
      j (range 10)
      :while (= i j)]
  [i j]) ;=> ([0 0])
(for [i (range 10)
      j (range 10)
      :when (= i j)]
  [i j]) ;=> ([0 0] [1 1] [2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9])
In your code, then, :while does not end the whole computation at the first mismatch. For each pair of j and h, the rules are tried until (match-row j h r) first returns false, and then iteration moves on to the next h, so every pair of rows is still visited, which fits the two-hour running time. Because the test sits after r, a pair can also be emitted once for each rule that passes before the first failure, so I would check the result again to see if it really makes sense.
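To see that :while only ends the binding it follows, while the enclosing bindings keep going, here is a small variant where the test first fails part-way through (while-demo is just a name for the example):

```clojure
(def while-demo
  (for [x (range 3)
        y (range 3)
        :while (not= x y)] ; ends the y loop at the first y equal to x...
    [x y]))                ; ...but the x loop carries on with the next x

while-demo ;=> ([1 0] [2 0] [2 1])
```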
How long should it take?
Let's first do some back-of-the-napkin math. If you want to compare every one of 22k records with every one of 50k records, you're going to be doing 22k * 50k comparisons; there's no way around that.
22k * 50k = 1,100,000,000
That's a big number!
What's the cost of a comparison?
From a quick glance at Wikipedia, Jaccard is about sets: it compares the size of the intersection of two sets with the size of their union.
The following will do to get a ballpark estimate of the cost, though it's probably very much on the low end.
(time (clojure.set/difference (set "foo") (set "bar")))
That takes about a tenth of a millisecond on my computer.
(/ (* 22e3 50e3) ; number of comparisons
   10            ; milliseconds (about 0.1 ms each)
   1000          ; seconds
   60            ; minutes
   60)           ; hours
;=> 30.555555555555554
That's about 30 and a half hours. And that's with a low-end estimate of the individual cost, and not counting the fact that you want to compare name, address and phone on each one. So that's about 30 hours if every comparison fails at the first rule, and about 92 hours if they all get to the last rule.
Before any micro-optimizations, you need to work on the algorithm and find some clever way to avoid doing more than a billion comparisons. If you want help with that, you need to at least supply some sample data.
Nitpicks
The indentation of the anonymous fn inside match-row is confusing. I'd use an editor that indents automatically and stick to its indentation 99% of the time, because Lisp programmers read nesting from the indentation, as in Python. There are some slight differences between editors and auto-indenters, but they're all consistent with the nesting.
(defn match-row [j h rules]
  (every?
   identity
   (map (fn [rule]
          (<= (fuzzy/jaccard
               ((keyword (first rule)) j)
               ((keyword (second rule)) h))
              ((nth rule 2))))
        rules)))
Also, match-row needs to be defined before it is used (which it probably is in your actual code, seeing as it compiles).
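A minimal sketch of the loop with :when, and with match-row receiving all the rules at once. It assumes each rule is [f1-column f2-column tolerance] with a numeric tolerance (values read from a CSV would need parsing first) and that a smaller distance means a closer match. jaccard-distance is a hypothetical character-set stand-in for fuzzy/jaccard, whose exact semantics aren't shown in the question:

```clojure
(require '[clojure.set :as cset])

(defn jaccard-distance
  "Hypothetical stand-in for fuzzy/jaccard: 1 - |intersection| / |union|,
  treating each string as a set of characters."
  [a b]
  (let [sa    (set a)
        sb    (set b)
        union (cset/union sa sb)]
    (if (empty? union)
      0.0
      (- 1.0 (/ (double (count (cset/intersection sa sb)))
                (count union))))))

(defn match-row
  "True when every rule [f1-column f2-column tolerance] is within tolerance."
  [j h rules]
  (every? (fn [[col1 col2 tolerance]]
            (<= (jaccard-distance (get j (keyword col1)) (get h (keyword col2)))
                tolerance))
          rules))

(defn fuzzy-join
  "Pairs every F1 row with every F2 row that passes all the rules."
  [f1-rows f2-rows rules]
  (for [j f1-rows
        h f2-rows
        :when (match-row j h rules)] ; :when filters; it never stops the loops early
    (merge j h)))
```

This only fixes the :while/:when issue; it still does every one of the 22k * 50k comparisons.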
22k x 50k is over 1 billion combinations. Multiply by 3 fuzzy rules and you're at over 3 billion calculations, most of which are wasted.
The only way to speed it up is to do some pre-sorting or other pre-trimming of the combinations. For example, only do the fuzzy calculation if the zip codes are identical. If you waste time trying to match records from N.Y. against records from Florida, you are throwing away 99% or more of your work.
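A minimal sketch of that kind of pre-trimming, often called blocking: index F2 once by a key such as the zip code with group-by, and only run the expensive comparison inside each block. blocked-join, block-key and match? are hypothetical names, and the sketch assumes an exact-match key is good enough to rule pairs out:

```clojure
(defn blocked-join
  "Hypothetical sketch: compares only rows whose block-key values are equal.
  match? is the expensive fuzzy test, called once per candidate pair."
  [f1-rows f2-rows block-key match?]
  (let [f2-by-key (group-by block-key f2-rows)] ; built once: key -> rows
    (for [j f1-rows
          h (get f2-by-key (block-key j))        ; nil, i.e. no candidates, if no block
          :when (match? j h)]
      (merge j h))))

;; Zip code as the block key (data made up for illustration):
(blocked-join [{:name "Acme" :zip "10001"} {:name "Beta" :zip "33101"}]
              [{:name2 "Acme Inc" :zip "10001"} {:name2 "Gamma" :zip "94105"}]
              :zip
              (constantly true))
;=> ({:name "Acme", :zip "10001", :name2 "Acme Inc"})
```

The number of comparisons then depends on the block sizes rather than on 22k * 50k.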
I was trying to solve Project Euler problem 1. I noticed a pattern that leads to a quicker solution for every 15th number.
This is the Clojure code:
(defn fifteenator [n]
  ;; sum of the multiples of 3 or 5 below 15(n + 1)
  (* 15 (+ (* (+ 1 n) 3) (* (/ (+ (* n n) n) 2) 7))))
For 15, n is 0; for 30, n is 1; and so on.
So I can calculate the nearest number divisible by 15 and do only a few recursive calculations. But one of the HackerRank test cases still times out. Before I start profiling the code, I would like to make sure my reasoning is correct. Is there a quicker way to calculate it, or should I learn to profile Clojure?
I am not sure about your approach. Clojure has excellent support for ranges and filters. With these, solving Euler 1 is not too difficult:
(defn euler1
  [n]
  (reduce +
          (filter #(or (= (rem % 5) 0) (= (rem % 3) 0)) (range n))))
Testing if we are getting the right result:
user=> (euler1 10)
23
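A closed form avoids both the filter and the per-15 recursion. The multiples of k below n are k, 2k, ..., mk with m = (quot (dec n) k), and they sum to k * m(m + 1)/2; inclusion-exclusion then removes the multiples of 15, which were counted once for 3 and once for 5. A sketch assuming n is a non-negative integer (the function names are illustrative); it does a constant number of arithmetic operations however large n is:

```clojure
(defn sum-multiples-below
  "Sum of the positive multiples of k that are strictly less than n (n >= 0)."
  [k n]
  (let [m (quot (dec n) k)]        ; how many multiples of k lie below n
    (* k (quot (* m (inc m)) 2)))) ; k * (1 + 2 + ... + m)

(defn euler1-closed-form
  "Multiples of 3 plus multiples of 5, minus multiples of 15 (counted twice)."
  [n]
  (- (+ (sum-multiples-below 3 n)
        (sum-multiples-below 5 n))
     (sum-multiples-below 15 n)))

(euler1-closed-form 10)   ;=> 23
(euler1-closed-form 1000) ;=> 233168
```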
I am hoping to generate all the multiples of two less than 10 using the following code:
(filter #(< % 10) (iterate (partial + 2) 2))
Expected output:
(2 4 6 8)
However, for some reason the REPL just doesn't give any output.
But the code below works just fine:
(filter #(< % 10) '(2 4 6 8 10 12 14 16))
I understand that one is a lazy sequence and the other a regular sequence, and that this is the reason. But how can I filter all the numbers less than 10 from a lazy sequence?
(iterate (partial + 2) 2)
is an infinite sequence. filter has no way to know that the number of items for which the predicate is true is finite, so it will keep going forever when you're realizing the sequence (see Mark's answer).
What you want is:
(take-while #(< % 10) (iterate (partial + 2) 2))
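take-while stops at the first element that fails the test, so it returns a finite sequence even from an infinite source; this relies on the sequence being increasing. For this particular case, a fixed step of 2, range with a step argument also works:

```clojure
(take-while #(< % 10) (iterate (partial + 2) 2)) ;=> (2 4 6 8)

;; range takes a start, an exclusive end and a step.
(range 2 10 2) ;=> (2 4 6 8)
```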
I think I should note that Diego Basch's answer is not fully correct in its reasoning:
filter has no way to know that the number of items for which the predicate is true is finite, so it will keep going forever
Why should filter know anything about that? Actually, filter works fine in this case. One can apply filter to a lazy sequence and get another lazy sequence that represents a potentially infinite sequence of filtered numbers:
user> (def my-seq (iterate (partial + 2) 2)) ; REPL won't be able to print this
;; => #'user/my-seq
user> (def filtered (filter #(< % 10) my-seq)) ; filter it without problems
;; => #'user/filtered
user>
The crucial detail here is that one should never try to fully realize a lazy sequence (by printing it, in the OP's case) unless it is known to be finite. Realizing filtered would hang: after 8, filter keeps searching my-seq for another number below 10 and never finds one.
Of course, this example is only for demonstration purposes; you should use take-while here, not filter.
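Realizing part of filtered is still safe, as long as you ask for no more elements than actually exist. A sketch reusing the definitions above:

```clojure
(def my-seq (iterate (partial + 2) 2))
(def filtered (filter #(< % 10) my-seq))

;; Only four elements are requested, and four exist:
(take 4 filtered) ;=> (2 4 6 8)

;; (take 5 filtered) would hang once printed: after 8, filter keeps
;; walking my-seq looking for a fifth element below 10 and never finds one.
```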
I've been playing with Clojure lately and I can't get this algorithm to work:
(defn reverse-number [number reversed]
  (if (= number 0)
    reversed
    (reverse-number (/ number 10)
                    (+ (rem number 10) (* reversed 10)))))
This is how I call it: (reverse-number 123 0), and the result I expect is 321.
When I run this, the REPL just hangs.
Can someone please explain what is happening, what I did wrong, and how to get this function working?
Note: I know I can use string functions to reverse a number. Actually, I already did that, but I'm not interested in that solution. All I want is to make the leap to functional languages. That's why I'm trying multiple approaches.
Using string functions:
(defn reverse-number [n]
  (Integer. (clojure.string/reverse (str n))))
(reverse-number 123) ; --> 321
I don't like this version since it feels like cheating by using the string version of reverse.
You should use quot instead of /.
/ in Clojure gives you a ratio (a fraction) rather than an integer, so number never reaches 0 (unless it is 0 to begin with), while quot performs integer division.
Examples:
user=> (/ 123 10)
123/10
user=> (quot 123 10)
12
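With that change the original function works. A sketch of the corrected version; besides the quot fix it also replaces the self-call with recur, which is the idiomatic way to write a self tail call in Clojure since the JVM does not eliminate tail calls:

```clojure
(defn reverse-number
  "Reverses the decimal digits of a non-negative integer."
  [number reversed]
  (if (zero? number)
    reversed
    (recur (quot number 10)                       ; drop the last digit
           (+ (rem number 10) (* reversed 10))))) ; append it to the result

(reverse-number 123 0) ;=> 321
(reverse-number 120 0) ;=> 21 (the trailing zero becomes a leading zero)
```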
How do I write the modulus operation in the Clojure programming language?
For example, the JavaScript statement money %= money_value.
There are two functions in Clojure you might want to try: mod and rem. They give the same result when both arguments have the same sign, but differ when the signs differ: mod takes the sign of the divisor, rem the sign of the dividend. Here's an example from the docs:
(mod -10 3) ; => 2
(rem -10 3) ; => -1
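For a statement like money %= money_value, note that JavaScript's % keeps the sign of the dividend, so rem is the closer match. Clojure values are immutable, so you compute a new value instead of assigning in place. A sketch (apply-remainder is a hypothetical name):

```clojure
;; JavaScript: money %= money_value
;; JS % keeps the sign of the dividend, which is what rem does.
(defn apply-remainder
  "Hypothetical helper: returns the new value of money instead of mutating it."
  [money money-value]
  (rem money money-value))

(apply-remainder 142 25) ;=> 17

;; The two functions disagree only when the signs differ:
(mod 10 -3) ;=> -2 (sign of the divisor)
(rem 10 -3) ;=> 1  (sign of the dividend)
```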
Update:
If you really want to convert your code to Clojure, you need to realize that an idiomatic Clojure solution probably won't look anything like your JavaScript solution. Here's a nice solution that I think does roughly what you want:
(defn change [amount]
  (zipmap [:quarters :dimes :nickels :pennies]
          (reduce (fn [acc x]
                    (let [amt (peek acc)]
                      (conj (pop acc)
                            (quot amt x)
                            (rem amt x))))
                  [amount] [25 10 5])))
(change 142)
; => {:pennies 2, :nickels 1, :dimes 1, :quarters 5}
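To see what the reduce inside change is doing, the same step function can be run with reductions, which returns every intermediate accumulator. The last element of the vector is always the amount still to be split; each step replaces it with a coin count followed by the remainder (change-steps is just a name for the example):

```clojure
(def change-steps
  (reductions (fn [acc x]
                (let [amt (peek acc)]              ; amount still to split
                  (conj (pop acc) (quot amt x) (rem amt x))))
              [142]
              [25 10 5]))

change-steps ;=> ([142] [5 17] [5 1 7] [5 1 1 2])
```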
You can look up any of the functions you don't recognize on ClojureDocs. If you just don't understand the style, then you probably need some more experience programming with higher-order functions. I think 4Clojure is a good place to start.