Customizing tracing in Clojure

How can I inject custom code into the clojure tracing library? (https://github.com/clojure/tools.trace)
The library produces traces like this (see "Debugging in Clojure?"):
TRACE t4328: (fib 3)
TRACE t4329: | (fib 2)
TRACE t4330: | | (fib 1)
TRACE t4330: | | => 1
TRACE t4331: | | (fib 0)
TRACE t4331: | | => 0
TRACE t4329: | => 1
TRACE t4332: | (fib 1)
TRACE t4332: | => 1
TRACE t4328: => 2
I am particularly interested in:
measuring the time of an invocation
custom rendering of the input/output data
redirecting the output
Examples of what I would like to produce:
Measure time:
TRACE t4328: (fib 3) (100 ms)
TRACE t4329: | (fib 2) (200 ms)
TRACE t4330: | | (fib 1) (150 ms)
.....
Rendering: Custom rendering per arg/return value
TRACE t4328: (fib number) (a small number was given)
TRACE t4329: | (fib number) (an even number was returned)
TRACE t4330: | | (fib number) (attention: number is too big)
Stacktrace:
(fib number) (fib.clj line 1)
| (fib number) (fib.clj line 2)
| | (fib number) (fib.clj line ...)
output to disk:
(fib 3)
| (fib 2)
| | (fib 1)
| | => 1
I am not sure whether the library was designed to allow such customizations. However, since the whole lib is a single file (https://github.com/clojure/tools.trace/blob/master/src/main/clojure/clojure/tools/trace.clj), I don't mind patching it directly.
A question from 2010 (clojure: adding a debug trace to every function in a namespace?) is similar, but the suggested answer uses a custom version of trace-ns.
There the custom code is injected manually:
(clojure.contrib.trace/trace (str "entering: " s))
In short: Is there a more generic way today to inject my custom code?
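One generic option, sketched here without relying on any tools.trace internals, is to wrap the function stored in a var using alter-var-root. The names timed-trace! and calls are hypothetical and not part of tools.trace; the report function is the place where custom rendering or output redirection would go. This assumes the traced function is called through its var, which is the default when direct linking is off.

```clojure
;; Hypothetical helper, NOT part of tools.trace: replaces the function held by
;; var v with a wrapper that reports args, result and elapsed time per call.
(defn timed-trace!
  [v report]
  (alter-var-root v
    (fn [f]
      (fn [& args]
        (let [start  (System/nanoTime)
              result (apply f args)
              ms     (/ (- (System/nanoTime) start) 1e6)]
          ;; report receives plain data, so rendering is up to the caller
          (report {:fn (:name (meta v)) :args args :result result :ms ms})
          result)))))

(defn fib [n]
  (if (< n 2) n (+ (fib (dec n)) (fib (- n 2)))))

;; Collect the records in an atom instead of printing them.
(def calls (atom []))
(timed-trace! #'fib #(swap! calls conj %))
(fib 3)
```

Passing a different report function, for example #(spit "trace.log" (str (pr-str %) \newline) :append true), would send each record to disk instead of the atom.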

Related

How to convert a lazy sequence to a map?

I've got a lazy sequence that, when sent to println, shows like this:
(((red dog) (purple cat)) ((green mouse) (yellow bird)))
Note that this is the result of reading a csv and trimming all "cell" values, hence (a) the fact that it's lazy at the time I want to print it and (b) in the future innermost lists might be more than 2 strings because more columns are added.
I'm trying to use clojure.pprint/print-table to print this in a two-column table. I'm having a pretty hard time because print-table seems to want a map with data.
Here's a repro:
;; Mock the lazy data retrieved from a csv file
(defn get-lazy-data []
(lazy-seq '('('("red" "dog") '("purple" "cat")) '('("green" "mouse") '("yellow" "bird")))))
(defn -main []
(let [data (get-lazy-data)]
(println "Starting...")
(println data)
(println "Continuing...")
(print-table data)
(println "Finished!")))
This gives an error:
Exception in thread "main" java.lang.ClassCastException: clojure.lang.Symbol cannot be cast to java.util.Map$Entry
I've tried various options:
(print-table (apply hash-map data)) gives same exception
(print-table (zipmap data)) tells me to provide another argument for the keys, but I want a solution that doesn't rely on specifying the number of columns beforehand
comprehending and adapting the answer to "Clojure printing lazy sequence", which would be a duplicate of my question were it not that both the question and answer seem so much more complex that I don't know how to translate that solution to my own scenario
Basically I know I have an XY-problem but now I want answers to both questions:
X: How do I pretty print a lazy sequence of pairs of pairs of strings as a table on the console?
Y: How can I convert a lazy sequence to a map (where e.g. keys are the indexes)?
How do I pretty print a lazy sequence of pairs of pairs of strings as a table on the console?
The fact that your "rows" seem to be grouped in pairs is odd, assuming you want a two-column table of color/animal, so we can remove the extra grouping with mapcat identity then zipmap those pairs with the desired map keywords:
(def my-list
'(((red dog) (purple cat)) ((green mouse) (yellow bird))))
(def de-tupled (mapcat identity my-list))
(map #(zipmap [:color :animal] %) de-tupled)
=> ({:color red, :animal dog} {:color purple, :animal cat} {:color green, :animal mouse} {:color yellow, :animal bird})
(clojure.pprint/print-table *1)
| :color | :animal |
|--------+---------|
| red | dog |
| purple | cat |
| green | mouse |
| yellow | bird |
It's not clear from the question, but it seems like you want to support an arbitrary number of "columns" which kinda precludes having fixed names for them. In that case you can do something like this:
(def my-list ;; added third mood "column"
'(((red dog happy) (purple cat sad)) ((green mouse happy) (yellow bird sad))))
(def de-tupled (apply concat my-list))
(clojure.pprint/print-table (map #(zipmap (range) %) de-tupled))
| 0 | 1 | 2 |
|--------+-------+-------|
| red | dog | happy |
| purple | cat | sad |
| green | mouse | happy |
| yellow | bird | sad |
How can I convert a lazy sequence to a map (where e.g. keys are the indexes)?
(def my-list
'(((red dog) (purple cat)) ((green mouse) (yellow bird))))
(zipmap (range) my-list)
=> {0 ((red dog) (purple cat)), 1 ((green mouse) (yellow bird))}
A related point to your problem is how you print out your data. Clojure has two ways to print:
(dotest
(println ["hello" "there" "everybody"]) ; #1
(prn ["hello" "there" "everybody"])) ; #2
#1 => [hello there everybody]
#2 => ["hello" "there" "everybody"]
For strings the presence of quotes in #2 makes a huge difference in understanding what is happening. The prn function produces output that is machine-readable (like what you type in your source code). You really need that if you have strings involved in your data.
Look at the difference with symbols:
(println ['hello 'there 'everybody])
(prn ['hello 'there 'everybody])
; doesn't matter if you quote the whole form or individual symbols
(println '[hello there everybody])
(prn '[hello there everybody])
all results are the same:
[hello there everybody]
[hello there everybody]
[hello there everybody]
[hello there everybody]
The point is that you need prn to tell the difference between symbols and strings when you print results. Note that the prn output format (with double-quotes) happens automatically if you use pprint:
(def data
[["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]])
(clojure.pprint/pprint data) =>
[["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]]

Project Euler 1 too slow with triangular numbers

I was trying to solve Project Euler problem 1. I noticed a sequence that leads to a quicker solution at every 15th number.
This is the Clojure code
(defn fifteenator [n]
(* 15 (+ (* (+ 1 n) 3) (* (/ (+ (* n n) n) 2) 7))))
For 15, n is 0; for 30, n is 1; and so on.
So I can calculate the nearest number divisible by 15 and do only a few recursive calculations. But one of the HackerRank test cases still times out. Before I start profiling the code, I would like to check whether my reasoning is correct. Is there a quicker way to calculate it, or should I learn to profile Clojure?
I am not sure about your approach. Clojure has excellent support for ranges and filters. With these solving Euler 1 is not too difficult:
(defn euler1
[n]
(reduce +
(filter #(or (= (rem % 5) 0) (= (rem % 3) 0)) (range n))))
Testing if we are getting the right result:
user=> (euler1 10)
23
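Note that the filter version does work proportional to n. A constant-time alternative, sketched here with hypothetical names (sum-multiples-below, euler1-closed), uses the arithmetic-series formula plus inclusion-exclusion: add the multiples of 3 and of 5, then subtract the multiples of 15, which were counted twice.

```clojure
(defn sum-multiples-below
  "Sum of the positive multiples of k that are strictly less than n."
  [k n]
  (let [m (quot (dec n) k)]          ; how many multiples of k lie below n
    (* k (quot (* m (inc m)) 2))))   ; k * (1 + 2 + ... + m)

(defn euler1-closed
  "Sum of all naturals below n divisible by 3 or 5, without iterating."
  [n]
  (- (+ (sum-multiples-below 3 n)
        (sum-multiples-below 5 n))
     (sum-multiples-below 15 n)))    ; multiples of 15 were counted twice

(euler1-closed 10)   ;; => 23
(euler1-closed 1000) ;; => 233168
```

For n = 30 this gives 195, the same value the question's fifteenator returns for n = 1.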

cond in Clojure with thousands of clauses

Running the following code in Clojure gives a StackOverflow Error:
(cond
(= 1 2) 1
(= 2 3) 2
(= 3 4) 3
...
(= 1022 1023) 1022
(= 1023 1024) 1023
:else 1024)
I would like to create a function/macro that can handle a huge number of clauses without creating a stack that overflows.
Please advise as to how I might attempt this.
If you look at the full stack trace, you'll see that cond emits a deeply-nested if structure; the exception then occurs when the compiler tries to parse this structure. The problem might have more to do with simply compiling deeply nested Clojure code than the specific use of cond.
I was able to come up with the following macro that takes a list of clauses, wraps them in thunks to provide the deferred evaluation that you get with if, and then uses some to find the first logical true test expression. Its performance probably isn't as good due to the creation of so many anonymous functions, but it gets around the stack overflow exception.
(defmacro cond' [& clauses]
`(:result
(some (fn [[pred-thunk# val-thunk#]]
(if (pred-thunk#) {:result (val-thunk#)}))
(partition 2 (list ~@(map (fn [c] `(fn [] ~c)) clauses))))))
Note the wrapping and unwrapping of the returned value in a map, to ensure that some correctly handles a value clause that evaluates to nil.
A cond with 513 clauses is unlikely to be used in practice.
Here is a functional implementation of your example.
(or (some identity (map #(if (= %1 %2) %1)
(range 1 1024)
(range 2 1025)))
1024)
The requirement is like this:
Given a list of condition and result mappings, e.g.
[ [cond1 r1] [cond2 r2] ...], where
cond1: (= 1 1),
r1: 1
Find the result - rn - where condn evaluates to true
The solution using some fits this problem well, but I think we can avoid using a macro.
(defn match [[condition result]]
(when condition result))
(some match [[(= 1 2) 100] [(= 2 3) 200] [(= 3 3) 300]]) ;; => 300
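One caveat with the vector literal above is that every condition and result is evaluated before some ever runs. If deferred evaluation matters, the clauses can be stored as zero-argument functions instead. This is only a sketch; first-match and clauses are names made up for the example.

```clojure
(defn first-match
  "Returns the result of the first clause whose test thunk returns truthy,
  or default when none does. Each clause is a [test-thunk result-thunk] pair."
  [clauses default]
  ;; Wrap the result in a vector so a nil result still counts as a hit.
  (let [hit (some (fn [[test result]] (when (test) [(result)])) clauses)]
    (if hit (first hit) default)))

;; The 1023 clauses from the question, built as data rather than as nested ifs.
(def clauses
  (vec (for [i (range 1 1024)]
         [#(= i (inc i)) (constantly i)])))

(first-match clauses 1024) ;; => 1024
(first-match [[#(= 1 2) (constantly 100)]
              [#(= 3 3) (constantly 300)]]
             :none)        ;; => 300
```

Because the clauses are ordinary data walked by some, the compiler never sees a deeply nested form.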

Incanter - How can I use filter with column keywords instead of nth?

(require '[incanter.core :as icore])
;; Assume dataset "data" is already loaded by incanter.core/read-dataset
;; Let's examine the columns (note that :Volume is at index 5)
(icore/col-names data)
==> [:Date :Open :High :Low :Close :Volume]
;; We CAN use the :Volume keyword to look at just that column
(icore/sel data :cols :Volume)
==> (11886469 9367474 12847099 9938230 11446219 12298336 15985045...)
;; But we CANNOT use the :Volume keyword with filters
;; (well, not without looking up the position in col-names first...)
(icore/sel data :filter #(> (#{:Volume} %) 1000000))
Obviously this is because the filter's anonymous function is looking at a LazySeq, which no longer has the column names as part of its structure, so the above code fails. My question is this: does Incanter have a way to perform this filtered query while still allowing me to use column keywords? For example, I can get this to work because I know that :Volume is at index 5:
(icore/sel data :filter #(> (nth % 5) 1000000))
Again, though, I'm looking to see if Incanter has a way of preserving the column keyword for this type of filtered query.
Example dataset:
(def data
(icore/dataset
[:foo :bar :baz :quux]
[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]]))
Example query with result:
(icore/$where {:baz {:fn #(> % 1)}} data)
| :foo | :bar | :baz | :quux |
|------+------+------+-------|
| 2 | 2 | 2 | 2 |
Actually this could also be written
(icore/$where {:baz {:gt 1}} data)
Several such "predicate keywords" are supported apart from :gt: :lt, :lte, :gte, :eq (corresponding to Clojure's =), :ne (not=), :in, :nin (not in).
:fn is the general "use any function" keyword.
All of these can be prefixed with $ (:$fn etc.) with no change in meaning.

How to improve text processing performance in Clojure?

I'm writing a simple desktop search engine in Clojure as a way to learn more about the language. So far, the performance during the text processing phase of my program is really bad.
During the text processing I have to:
Clean up unwanted characters;
Convert the string to lowercase;
Split the document to get a list of words;
Build a map which associates each word to its occurrences in the document.
Here is the code:
(ns txt-processing.core
(:require [clojure.java.io :as cjio])
(:require [clojure.string :as cjstr])
(:gen-class))
(defn all-files [path]
(let [entries (file-seq (cjio/file path))]
(filter (memfn isFile) entries)))
(def char-val
(let [value #(Character/getNumericValue %)]
{:a (value \a) :z (value \z)
:A (value \A) :Z (value \Z)
:0 (value \0) :9 (value \9)}))
(defn is-ascii-alpha-num [c]
(let [n (Character/getNumericValue c)]
(or (and (>= n (char-val :a)) (<= n (char-val :z)))
(and (>= n (char-val :A)) (<= n (char-val :Z)))
(and (>= n (char-val :0)) (<= n (char-val :9))))))
(defn is-valid [c]
(or (is-ascii-alpha-num c)
(Character/isSpaceChar c)
(.equals (str \newline) (str c))))
(defn lower-and-replace [c]
(if (.equals (str \newline) (str c)) \space (Character/toLowerCase c)))
(defn tokenize [content]
(let [filtered (filter is-valid content)
lowered (map lower-and-replace filtered)]
(cjstr/split (apply str lowered) #"\s+")))
(defn process-content [content]
(let [words (tokenize content)]
(loop [ws words i 0 hmap (hash-map)]
(if (empty? ws)
hmap
(recur (rest ws) (+ i 1) (update-in hmap [(first ws)] #(conj % i)))))))
(defn -main [& args]
(doseq [file (all-files (first args))]
(let [content (slurp file)
oc-list (process-content content)]
(println "File:" (.getPath file)
"| Words to be indexed:" (count oc-list )))))
As I have another implementation of this problem in Haskell, I compared both as you can see in the following outputs.
Clojure version:
$ lein uberjar
Compiling txt-processing.core
Created /home/luisgabriel/projects/txt-processing/clojure/target/txt-processing-0.1.0-SNAPSHOT.jar
Including txt-processing-0.1.0-SNAPSHOT.jar
Including clojure-1.5.1.jar
Created /home/luisgabriel/projects/txt-processing/clojure/target/txt-processing-0.1.0-SNAPSHOT-standalone.jar
$ time java -jar target/txt-processing-0.1.0-SNAPSHOT-standalone.jar ../data
File: ../data/The.Rat.Racket.by.David.Henry.Keller.txt | Words to be indexed: 2033
File: ../data/Beyond.Pandora.by.Robert.J.Martin.txt | Words to be indexed: 1028
File: ../data/Bat.Wing.by.Sax.Rohmer.txt | Words to be indexed: 7562
File: ../data/Operation.Outer.Space.by.Murray.Leinster.txt | Words to be indexed: 7754
File: ../data/The.Reign.of.Mary.Tudor.by.James.Anthony.Froude.txt | Words to be indexed: 15418
File: ../data/.directory | Words to be indexed: 3
File: ../data/Home.Life.in.Colonial.Days.by.Alice.Morse.Earle.txt | Words to be indexed: 12191
File: ../data/The.Dark.Door.by.Alan.Edward.Nourse.txt | Words to be indexed: 2378
File: ../data/Storm.Over.Warlock.by.Andre.Norton.txt | Words to be indexed: 7451
File: ../data/A.Brief.History.of.the.United.States.by.John.Bach.McMaster.txt | Words to be indexed: 11049
File: ../data/The.Jesuits.in.North.America.in.the.Seventeenth.Century.by.Francis.Parkman.txt | Words to be indexed: 14721
File: ../data/Queen.Victoria.by.Lytton.Strachey.txt | Words to be indexed: 10494
File: ../data/Crime.and.Punishment.by.Fyodor.Dostoyevsky.txt | Words to be indexed: 10642
real 2m2.164s
user 2m3.868s
sys 0m0.978s
Haskell version:
$ ghc -rtsopts --make txt-processing.hs
[1 of 1] Compiling Main ( txt-processing.hs, txt-processing.o )
Linking txt-processing ...
$ time ./txt-processing ../data/ +RTS -K12m
File: ../data/The.Rat.Racket.by.David.Henry.Keller.txt | Words to be indexed: 2033
File: ../data/Beyond.Pandora.by.Robert.J.Martin.txt | Words to be indexed: 1028
File: ../data/Bat.Wing.by.Sax.Rohmer.txt | Words to be indexed: 7562
File: ../data/Operation.Outer.Space.by.Murray.Leinster.txt | Words to be indexed: 7754
File: ../data/The.Reign.of.Mary.Tudor.by.James.Anthony.Froude.txt | Words to be indexed: 15418
File: ../data/.directory | Words to be indexed: 3
File: ../data/Home.Life.in.Colonial.Days.by.Alice.Morse.Earle.txt | Words to be indexed: 12191
File: ../data/The.Dark.Door.by.Alan.Edward.Nourse.txt | Words to be indexed: 2378
File: ../data/Storm.Over.Warlock.by.Andre.Norton.txt | Words to be indexed: 7451
File: ../data/A.Brief.History.of.the.United.States.by.John.Bach.McMaster.txt | Words to be indexed: 11049
File: ../data/The.Jesuits.in.North.America.in.the.Seventeenth.Century.by.Francis.Parkman.txt | Words to be indexed: 14721
File: ../data/Queen.Victoria.by.Lytton.Strachey.txt | Words to be indexed: 10494
File: ../data/Crime.and.Punishment.by.Fyodor.Dostoyevsky.txt | Words to be indexed: 10642
real 0m9.086s
user 0m8.591s
sys 0m0.463s
I think the (string -> lazy sequence) conversion in the Clojure implementation is killing the performance. How can I improve it?
P.S: All the code and data used in these tests can be downloaded here.
Some things you could do that would probably speed this code up:
1) Instead of mapping your chars to char-val, just do direct value comparisons between the characters. This is faster for the same reason it would be faster in Java.
2) You repeatedly use str to convert single-character values to full-fledged strings. Again, consider using the character values directly: object creation is slow, same as in Java.
3) You should replace process-content with clojure.core/frequencies. Perhaps inspect frequencies source to see how it is faster.
4) If you must update a (hash-map) in a loop, use transient. See: http://clojuredocs.org/clojure_core/clojure.core/transient
Also note that (hash-map) returns a PersistentArrayMap, so you are creating new instances with each call to update-in - hence slow and why you should use transients.
5) This is your friend: (set! *warn-on-reflection* true) - You have quite a bit of reflection that could benefit from type hints
Reflection warning, scratch.clj:10:13 - call to isFile can't be resolved.
Reflection warning, scratch.clj:13:16 - call to getNumericValue can't be resolved.
Reflection warning, scratch.clj:19:11 - call to getNumericValue can't be resolved.
Reflection warning, scratch.clj:26:9 - call to isSpaceChar can't be resolved.
Reflection warning, scratch.clj:30:47 - call to toLowerCase can't be resolved.
Reflection warning, scratch.clj:48:24 - reference to field getPath can't be resolved.
Reflection warning, scratch.clj:48:24 - reference to field getPath can't be resolved.
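Points 1, 2 and 4 above might look roughly like this. It is a sketch rather than a drop-in replacement: the function names are made up here, and the range check is strict ASCII, which is narrower than what Character/getNumericValue accepts in the original.

```clojure
(defn ascii-alpha-num?
  "True for ASCII letters and digits, compared by code point (no lookup map)."
  [c]
  (let [n (int c)]
    (or (<= (int \a) n (int \z))
        (<= (int \A) n (int \Z))
        (<= (int \0) n (int \9)))))

(defn valid-char? [c]
  (or (ascii-alpha-num? c)
      (Character/isSpaceChar (char c)) ; (char c) lets the compiler pick the overload
      (= c \newline)))                 ; compare chars directly, no (str ...)

(defn lower-or-space [c]
  (if (= c \newline) \space (Character/toLowerCase (char c))))

(defn index-words
  "Maps each word to the vector of positions at which it occurs,
  building the map as a transient and freezing it once at the end."
  [words]
  (persistent!
    (reduce-kv (fn [m i w] (assoc! m w (conj (get m w []) i)))
               (transient {})
               (vec words))))

(index-words ["a" "b" "a"]) ;; => {"a" [0 2], "b" [1]}
```

valid-char? and lower-or-space are meant to stand in for is-valid and lower-and-replace, and index-words for the loop in process-content.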
Just for comparison's sake, here's a regexp based Clojure version
(defn re-index
"Returns lazy sequence of vectors of regexp matches and their start index"
[^java.util.regex.Pattern re s]
(let [m (re-matcher re s)]
((fn step []
(when (. m (find))
(cons (vector (re-groups m)(.start m)) (lazy-seq (step))))))))
(defn group-by-keep
"Returns a map of the elements of coll keyed by the result of
f on each element. The value at each key will be a vector of the
results of r on the corresponding elements."
[f r coll]
(persistent!
(reduce
(fn [ret x]
(let [k (f x)]
(assoc! ret k (conj (get ret k []) (r x)))))
(transient {}) coll)))
(defn word-indexed
[s]
(group-by-keep
(comp clojure.string/lower-case first)
second
(re-index #"\w+" s)))