I want to be able to format an entire Clojure file to look nice. The nicest thing I have found is Clojure's pprint. It does indentation and line breaks at the correct locations. However, it only works on Clojure data (forms that have already been read), while a Clojure file is read in as a string. read-string will only read the first form of a string, and mapping read-string over the whole sequence has a host of issues I am running into. Does someone know of an automatic way to make a Clojure file look pretty, not just indent it correctly?
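For comparison, the approach the question starts from (read every top-level form, then pretty-print it) can be sketched with clojure.pprint's code-dispatch, which lays out code the way pprint does for defn, let and friends. This is a minimal sketch and pprint-file-str is a hypothetical helper name. Note that comments and the original blank lines are lost, because the reader discards them before pprint ever sees the forms; that is one reason a dedicated source formatter does better.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.pprint :as pp])
(import '(java.io PushbackReader))

(defn pprint-file-str
  "Hypothetical helper: reads every top-level form in the file at `path` and
  returns them pretty-printed with clojure.pprint's code-dispatch, separated
  by blank lines. Comments are dropped by the reader."
  [path]
  (with-open [rdr (PushbackReader. (io/reader path))]
    (let [eof (Object.)]                    ; unique end-of-file marker
      (binding [*read-eval* false]          ; don't run #=(...) forms while reading
        (->> (repeatedly #(read {:eof eof} rdr))
             (take-while #(not (identical? eof %)))
             (map (fn [form]
                    (with-out-str
                      (pp/with-pprint-dispatch pp/code-dispatch
                        (pp/pprint form)))))
             (interpose "\n")
             ;; apply str realizes everything while the reader is still open
             (apply str))))))
```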
You can use lein-zprint which will run the zprint library over your source files. If you are a boot user, you can use boot-fmt to process your files, which also uses zprint.
The zprint library will completely reformat your source from scratch to be as pretty as it knows how to make it. It actually tries a couple of things at each level to see which is "better", and has a number of heuristics built in to try to produce something that fits as much information as it can in the vertical space while still looking "pretty".
It knows a lot about Clojure (and ClojureScript) source code, knows which functions need different kinds of processing, and handles the new clojure.spec (cljs.spec) files well too.
It is almost absurdly configurable, so with a little work you can tune it to output the code the way that you want to see it. But even without any configuration, it generally does a good job making your code look nice.
Using it with lein-zprint is pretty trivial.
Place [lein-zprint "0.1.16"] into the :plugins vector of your project.clj:
:plugins [[lein-zprint "0.1.16"]]
Then, to format a source file, simply invoke lein zprint on that file:
$ lein zprint src/<project>/<file-name>.clj
Unless you tell it otherwise (which is trivial to do in your project.clj), it will rename the existing file to <file-name>.clj.old so that you have them to compare while you are trying it out.
Here is an example (obviously poorly formatted):
(defn apply-style-x
"Given an existing-map and a new-map, if the new-map specifies a
style, apply it if it exists. Otherwise do nothing. Return
[updated-map new-doc-map error-string]"
[doc-string doc-map existing-map new-map] (let [style-name
(get new-map :style :not-specified) ] (if
(= style-name :not-specified) [existing-map doc-map nil]
(let [style-map ( if (= style-name :default)
(get-default-options) (get-in existing-map [:style-map style-name]))]
(cond (nil? style-name)
[existing-map doc-map "Can't specify a style of nil!"]
style-map [(merge-deep existing-map style-map) (when doc-map
(diff-deep-doc (str doc-string " specified :style " style-name)
doc-map existing-map style-map)) nil] :else
[existing-map doc-map (str "Style '" style-name "' not found!")])))))
After formatting with $ lein zprint 70 src/example/apply.clj, which
formats it for 70 columns to make it fit better in this answer:
(defn apply-style-x
  "Given an existing-map and a new-map, if the new-map specifies a
  style, apply it if it exists. Otherwise do nothing. Return
  [updated-map new-doc-map error-string]"
  [doc-string doc-map existing-map new-map]
  (let [style-name (get new-map :style :not-specified)]
    (if (= style-name :not-specified)
      [existing-map doc-map nil]
      (let [style-map (if (= style-name :default)
                        (get-default-options)
                        (get-in existing-map
                                [:style-map style-name]))]
        (cond
          (nil? style-name) [existing-map doc-map
                             "Can't specify a style of nil!"]
          style-map
            [(merge-deep existing-map style-map)
             (when doc-map
               (diff-deep-doc
                 (str doc-string " specified :style " style-name)
                 doc-map
                 existing-map
                 style-map)) nil]
          :else [existing-map doc-map
                 (str "Style '" style-name "' not found!")])))))
You can use weavejester/cljfmt:
$ boot -d cljfmt:0.5.6 repl
boot.user=> (require '[cljfmt.core :as cljfmt])
nil
boot.user=> (println (cljfmt/reformat-string
#_=> "( let [x 3
#_=> y 4]
#_=> (+ (* x x
#_=> )(* y y)
#_=> ))"))
(let [x 3
      y 4]
  (+ (* x x) (* y y)))
nil
Check its README for the supported formatting options.
In Clojure, can one idiomatically obtain a function's name inside of its body, hopefully without introducing a new wrapper for the function's definition? Can one also access the function's name inside the body of the function's :test attribute?
For motivation, this can be helpful for certain logging situations, as well as for keeping the body of :test oblivious to changes to the name of the function which it is supplied for.
A short illustration of the closest that meta gets follows; as far as I know, Clojure has no this notion to supply to meta.
(defn a [] (:name (meta (var a))))
Obviously it is easy to accomplish with a wrapper macro.
Edit: luckily no one so far mentioned lambda combinators.
There are 2 ways to approach your question. However, I suspect that to fully automate what you want to do, you would need to define your own custom defn replacement/wrapper.
The first thing to realize is that all functions are anonymous. When we type:
(defn hello [] (println "hi"))
we are really typing:
(def hello (fn [] (println "hi")))
We are creating a symbol hello that points to a var, which in turn points to an anonymous function. However, we can give the function an "internal name" like so:
(def hello (fn fn-hello [] (println "hi")))
So now we can access the function from the outside via hello, or from the inside using either the hello or the fn-hello symbol (please don't ever use hello in both locations or you create a lot of confusion... even though it is legal).
I frequently use the fn-hello method in (otherwise) anonymous functions since any exceptions thrown will include the fn-hello symbol which makes tracking down the source of the problem much easier (the line number of the error is often missing from the stack trace). For example when using Instaparse we need a map of anonymous transform functions like:
{:identifier fn-identifier
 :string fn-string
 :integer (fn fn-integer [arg] [:integer (java.lang.Integer. arg)])
 :boolean (fn fn-boolean [arg] [:boolean (java.lang.Boolean. arg)])
 :namespace (fn fn-namespace [arg] [:namespace arg])
 :prefix (fn fn-prefix [arg] [:prefix arg])
 :organization (fn fn-organization [arg] [:organization arg])
 :contact (fn fn-contact [arg] [:contact arg])
 :description (fn fn-description [arg] [:description arg])
 :presence (fn fn-presence [arg] [:presence arg])
 :revision (fn fn-revision [& args] (prepend :revision args))
 :iso-date (fn fn-iso-date [& args] [:iso-date (str/join args)])
 :reference (fn fn-reference [arg] [:reference arg])
 :identity (fn fn-identity [& args] (prepend :identity args))
 :typedef (fn fn-typedef [& args] (prepend :typedef args))
 :container (fn fn-container [& args] (prepend :container args))
 :rpc (fn fn-rpc [& args] (prepend :rpc args))
 :input (fn fn-input [& args] (prepend :input args))
 ...<snip>...
 }
and giving each function the "internal name" makes debugging much, much easier. Perhaps this would be unnecessary if Clojure had better error messages, but that is a longstanding (and so far unfulfilled) wish.
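A small sketch of why the internal names help: the compiler puts the munged name of a named fn (fn-integer becomes fn_integer) into the name of the class it generates, so it shows up in stack frames. The two-entry transforms map and the helper functions below are hypothetical, and the full class names (something like user$fn_integer__1234) vary between runs; only the fn_integer part is stable.

```clojure
(require '[clojure.string :as string])

;; Hypothetical two-entry version of the transform map above.
(def transforms
  {:integer (fn fn-integer [^String arg] [:integer (Integer/parseInt arg)])
   :boolean (fn fn-boolean [^String arg] [:boolean (Boolean/parseBoolean arg)])})

(defn frames-mentioning
  "Class names in the stack trace of `e` that contain the substring `s`."
  [^Throwable e s]
  (->> (.getStackTrace e)
       (map #(.getClassName ^StackTraceElement %))
       (filter #(string/includes? % s))))

(defn bad-integer-frames
  "Feeds a non-numeric string to the :integer transform and returns the stack
  frames whose class name contains fn_integer, the munged form of fn-integer."
  []
  (try
    ((:integer transforms) "not-a-number")
    []
    (catch NumberFormatException e
      (vec (frames-mentioning e "fn_integer")))))
```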
You can find more details here: https://clojure.org/reference/special_forms#fn
If you read closely, it claims that (defn foo [x] ...) expands into
(def foo (fn foo [x] ...))
although in current Clojure versions (macroexpand '(defn foo [x] x)) returns (def foo (clojure.core/fn ([x] x))), with no inner name, so you may need to experiment to see whether this already solves the use case you are seeking. Recursion works either way, as seen in this example where we explicitly omit the inner fn-fact name:
(def fact (fn [x] ; fn-fact omitted here
            (if (zero? x)
              1
              (* x (fact (dec x))))))
(fact 4) => 24
This version also works:
(def fact (fn fn-fact [x]
            (if (zero? x)
              1
              (* x (fn-fact (dec x))))))
(fact 4) => 24
(fn-fact 4) => Unable to resolve symbol: fn-fact
So we see that the "internal name" fn-fact is hidden inside the function and is invisible from the outside.
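To check what defn actually expands to in your Clojure version (the experiment suggested above), you can macroexpand it without evaluating anything. This is a sketch; inner-fn-name is a hypothetical helper that only handles the single-arity shape shown here.

```clojure
;; Expand a defn form without evaluating it.
(def expansion (macroexpand '(defn foo [x] (* 2 x))))

(defn inner-fn-name
  "Returns the name symbol given to the fn inside a (def name (fn ...))
  expansion, or nil when the fn is left anonymous."
  [[_def _name [_fn maybe-name & _]]]
  (when (symbol? maybe-name) maybe-name))
```

Printing expansion shows the whole form; inner-fn-name returns nil when defn did not pass the var's name on to the fn.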
A second approach, if you are using a macro, is to use the implicit &form argument to access the line number from the source code. In the Tupelo library this technique is used to improve error messages for the dotest macro:
(defmacro dotest [& body] ; #todo README & tests
  (let [test-name-sym (symbol (str "test-line-" (:line (meta &form))))]
    `(clojure.test/deftest ~test-name-sym ~@body)))
This convenience macro allows the use of unit tests like:
(dotest
  (is (= 3 (inc 2))))
which expands to
(deftest test-line-123 ; assuming this is on line 123 in source file
  (is (= 3 (inc 2))))
instead of manually typing
(deftest t-addition
  (is (= 3 (inc 2))))
You can access (:line (meta &form)) and other information in any macro which can make your error messages and/or Exceptions much more informative to the poor reader trying to debug a problem.
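As a sketch of using (:line (meta &form)) for error messages, here is a hypothetical check macro (not part of Tupelo or clojure.test) that throws an ex-info naming the line of the call site. The :line entry is only present when the form came from a reader that tracks lines, such as when loading a source file, so the macro tolerates nil.

```clojure
(defmacro check
  "Hypothetical helper: evaluates `expr` and returns its value; if the value
  is falsey, throws an ex-info whose message and data include the call
  site's source line (nil when unknown) and the unevaluated form."
  [expr]
  (let [line (:line (meta &form))]   ; captured at macroexpansion time
    `(let [v# ~expr]
       (when-not v#
         (throw (ex-info (str "check failed at line " ~line ": " (pr-str '~expr))
                         {:line ~line :form '~expr})))
       v#)))
```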
Besides the above macro wrapper example, another (more involved) example of the same technique can be seen in the Plumatic Schema library, where they wrap clojure.core/defn with an extended version.
You may also wish to view this question for clarification on how Clojure uses the "anonymous" var as an intermediary between a symbol and a function: When to use a Var instead of a function?
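For the custom defn wrapper mentioned at the start of this answer, a minimal sketch could look like the following. defn-named and the fn-name local are hypothetical names, and the sketch only supports a single arity without a docstring.

```clojure
(defmacro defn-named
  "Hypothetical defn wrapper: defines `name` like defn, but binds the local
  `fn-name` to the fully qualified name (a string) inside the body, e.g. for
  logging. Single arity, no docstring or attr-map."
  [name args & body]
  `(defn ~name ~args
     ;; ~'fn-name yields an unqualified symbol the body can refer to
     (let [~'fn-name ~(str *ns* "/" name)]
       ~@body)))
```

Usage: (defn-named greet [who] (str fn-name " says hi to " who)) makes (greet "bob") return a string starting with the current namespace, e.g. "user/greet says hi to bob".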
Suppose I have a very simple .clj file on disk with the following content:
(def a 2)
(def b 3)
(defn add-two [x y] (+ x y))
(println (add-two a b))
From the context of a separate program, I would like to read the above program as a list of S-expressions, '((def a 2) (def b 3) ... (println (add-two a b))).
I imagine that one way of doing this involves 1. using slurp on (io/file file-name.clj) to produce a string containing the file's contents, 2. passing that string to a parser for Clojure code, and 3. injecting the sequence produced by the parser into a list (i.e., (into '() parsed-code)).
However, this approach seems sort of clumsy and error prone. Does anyone know of a more elegant and/or idiomatic way to read a Clojure file as a list of S-Expressions?
Update: Following up on feedback from the comments section, I've decided to try the approach I mentioned on an actual source file using aphyr's clj-antlr as follows:
tcl.core=> (def file-as-string (slurp (clojure.java.io/file "src/tcl/core.clj")))
tcl.core=> (pprint (antlr/parser "src/grammars/Clojure.g4" file-as-string))
{:parser
 {:local
  #object[java.lang.ThreadLocal 0x5bfcab6 "java.lang.ThreadLocal@5bfcab6"],
  :grammar
  #object[org.antlr.v4.tool.Grammar 0x5b8cfcb9 "org.antlr.v4.tool.Grammar@5b8cfcb9"]},
 :opts
 "(ns tcl.core\n (:gen-class)\n (:require [clj-antlr.core :as antlr]))\n\n(def foo 42)\n\n(defn parse-program\n \"uses antlr grammar to \"\n [program]\n ((antlr/parser \"src/grammars/Clojure.g4\") program))\n\n\n(defn -main\n \"I don't do a whole lot ... yet.\"\n [& args]\n (println \"tlc is tcl\"))\n"}
nil
Does anyone know how to transform this output to a list of S-Expressions as originally intended? That is, how might one go about squeezing valid Clojure code/data from the result of parsing with clj-antlr?
(import '[java.io PushbackReader])
(require '[clojure.java.io :as io])
(require '[clojure.edn :as edn])
;; adapted from: http://stackoverflow.com/a/24922859/6264
(defn read-forms [file]
  ;; with-open closes the reader once every form has been read
  (with-open [rdr (-> file io/file io/reader PushbackReader.)]
    (let [sentinel (Object.)]            ; unique value returned at end of file
      (loop [forms []]
        (let [form (edn/read {:eof sentinel} rdr)]
          (if (identical? sentinel form)
            forms
            (recur (conj forms form))))))))
(comment
  (spit "/tmp/example.clj"
        "(def a 2)
(def b 3)
(defn add-two [x y] (+ x y))
(println (add-two a b))")
  (read-forms "/tmp/example.clj")
  ;;=> [(def a 2) (def b 3) (defn add-two [x y] (+ x y)) (println (add-two a b))]
  )
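If the file contains Clojure-only reader syntax such as #(...), syntax-quote or regex literals, clojure.edn/read may reject or misread it, since EDN covers only a subset of Clojure's syntax. A variant of the same loop using clojure.core/read is sketched below under the hypothetical name read-clojure-forms; binding *read-eval* to false stops #=(...) forms from running code while the file is read.

```clojure
(import '[java.io PushbackReader])
(require '[clojure.java.io :as io])

(defn read-clojure-forms
  "Reads every top-level form in `file` with clojure.core/read and returns
  them as a vector. Throws if the file contains a #=(...) form."
  [file]
  (with-open [rdr (-> file io/file io/reader PushbackReader.)]
    (binding [*read-eval* false]       ; reading must not evaluate anything
      (let [eof (Object.)]             ; unique end-of-file marker
        (loop [forms []]
          (let [form (read {:eof eof} rdr)]
            (if (identical? eof form)
              forms
              (recur (conj forms form)))))))))
```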
Do you need something like this?
(let [exprs (slurp "to_read.clj")]
  ;; wrap the contents in parens to form a proper list; the "\n" keeps a
  ;; trailing line comment in the file from swallowing the closing paren
  (binding [*read-eval* false]          ; don't evaluate #=(...) forms while reading
    (-> (str "(" exprs "\n)")
        ;; read-string only reads the text into data; it does not evaluate the forms
        (read-string)
        (prn))))
I want to parse big log files using Clojure.
The structure of each line record is "UserID,Latitude,Longitude,Timestamp".
My implementation steps are:
----> Read the log file and get the top-n user list
----> Find each top-n user's records and store them in a separate log file (UserID.log).
The implementation source code:
;======================================================
(defn parse-file
  ""
  [file n]
  (with-open [rdr (io/reader file)]
    (println "001 begin with open ")
    (let [lines (line-seq rdr)
          res (parse-recur lines)
          sorted
          (into (sorted-map-by (fn [key1 key2]
                                 (compare [(get res key2) key2]
                                          [(get res key1) key1])))
                res)]
      (println "Statistic result : " res)
      (println "Top-N User List : " sorted)
      (find-write-recur lines sorted n))))
(defn parse-recur
  ""
  [lines]
  (loop [ls lines
         res {}]
    (if ls
      (recur (next ls)
             (update-res res (first ls)))
      res)))
(defn update-res
  ""
  [res line]
  (let [params (string/split line #",")
        id (if (> (count params) 1) (params 0) "0")]
    (if (res id)
      (update-in res [id] inc)
      (assoc res id 1))))
(defn find-write-recur
  "Get each users' records and store into separate log file"
  [lines sorted n]
  (loop [x n
         sd sorted
         id (first (keys sd))]
    (if (and (> x 0) sd)
      (do (create-write-file id
                             (find-recur lines id))
          (recur (dec x)
                 (rest sd)
                 (nth (keys sd) 1))))))
(defn find-recur
  ""
  [lines id]
  (loop [ls lines
         res []]
    (if ls
      (recur (next ls)
             (update-vec res id (first ls)))
      res)))
(defn update-vec
  ""
  [res id line]
  (let [params (string/split line #",")
        id_ (if (> (count params) 1) (params 0) "0")]
    (if (= id id_)
      (conj res line)
      res)))
(defn create-write-file
  "Create a new file and write information into the file."
  ([file info-lines]
   (with-open [wr (io/writer (str MAIN-PATH file))]
     (doseq [line info-lines] (.write wr (str line "\n")))))
  ([file info-lines append?]
   (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
     (doseq [line info-lines] (.write wr (str line "\n"))))))
;======================================================
I tested this in the REPL with the command (parse-file "./DATA/log.log" 3), and got these results:
Records     Size    Time      Result
1,000       42KB    <1s       OK
10,000      420KB   <1s       OK
100,000     4.3MB   3s        OK
1,000,000   43MB    15s       OK
6,000,000   258MB   >20 min   "OutOfMemoryError Java heap space java.lang.String.substring (String.java:1913)"
======================================================
Here is the question:
1. How can I fix the error when I try to parse a big log file (> 200MB)?
2. How can I optimize the functions to run faster?
3. There are logs of more than 1GB; how can the function deal with them?
I am still new to Clojure; any suggestions or solutions will be appreciated.
Thanks
As a direct answer to your questions, from a little Clojure experience:
The quick and dirty fix for running out of memory boils down to giving the JVM more memory. You can try adding this to your project.clj:
:jvm-opts ["-Xmx1G"] ;; or more
That will make Leiningen launch the JVM with a higher memory cap.
This kind of work is going to use a lot of memory no matter how you do it. @Vidya's suggestion to use a library is definitely worth considering. However, there's one optimization that you can make that should help a little.
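Whenever you're dealing with your (line-seq ...) object (a lazy sequence), make sure to keep it lazy and don't hold on to its head. In parse-file, lines is bound in the let and then used a second time by find-write-recur, so every line that parse-recur reads stays reachable (and therefore in memory) until the second pass is done. next by itself is not the culprit: it only realizes one element further ahead than rest does, because it has to check whether anything is left, though rest is the lazier of the two. Take a look at the Clojure site, especially the section on laziness: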
(rest aseq) - returns a possibly empty seq, never nil
[snip]
a (possibly) delayed path to the remaining items, if any
You may even want to traverse the log twice: once to pull just the username from each line as a lazy seq, and again to filter out those users. This will minimize the amount of the file you're holding onto at any one time.
Making sure your function is lazy should reduce the sheer overhead that having the file as a sequence in memory creates. Whether that's enough to parse a 1G file, I don't think I can say.
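The two-pass idea can be sketched as follows, under some assumptions: the function names and the out-dir parameter (standing in for the question's MAIN-PATH) are hypothetical, ties in the top-n list are broken by id, and every line is parsed with the same rule as the question's update-res. Each pass opens the file itself and consumes the line-seq with reduce or doseq, so no reference to the head of the sequence survives, and memory use is bounded by the counts map rather than by the file size.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "UserID column of a \"UserID,Latitude,Longitude,Timestamp\" line; \"0\" for
  lines without a comma, mirroring update-res in the question."
  [line]
  (let [params (string/split line #",")]
    (if (> (count params) 1) (first params) "0")))

(defn count-users
  "Pass 1: returns {user-id record-count}. reduce walks the line-seq without
  keeping its head, so only the counts map has to fit in memory."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [counts line] (update counts (user-id line) (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-users
  "The n ids with the most records, highest count first, ties by id."
  [counts n]
  (->> counts
       (sort-by (fn [[id c]] [(- c) id]))
       (take n)
       (mapv first)))

(defn write-user-files!
  "Pass 2: reopens the file and writes each wanted user's lines to
  <out-dir>/<UserID>.log, again without holding the whole file."
  [file out-dir ids]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        (doseq [line (line-seq rdr)
                :let [w (writers (user-id line))]
                :when w]
          (.write ^java.io.Writer w (str line "\n"))))
      (finally
        (doseq [w (vals writers)] (.close ^java.io.Writer w))))))
```

Usage: (write-user-files! "./DATA/log.log" "./out" (top-n-users (count-users "./DATA/log.log") 3)).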
You definitely don't need Cascalog or Hadoop simply to parse a file which doesn't fit into your Java heap. This SO question provides some working examples of how to process large files lazily. The main point is you need to keep the file open while you traverse the lazy seq. Here is what worked for me in a similar situation:
(defn lazy-file-lines [file]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                ;; the reader is closed only once the whole seq is consumed
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader file))))
You can map, reduce, count, etc. over this lazy sequence:
(count (lazy-file-lines "/tmp/massive-file.txt"))
;=> <a large integer>
The parsing is a separate, simpler problem.
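For the parsing step, here is a sketch of turning one line into a map; parse-record is a hypothetical name, and the timestamp is kept as a string because its format isn't given in the question. It composes with any lazy line source, e.g. (keep parse-record (lazy-file-lines "/tmp/massive-file.txt")), which also drops malformed lines.

```clojure
(require '[clojure.string :as string])

(defn parse-record
  "Parses one \"UserID,Latitude,Longitude,Timestamp\" line into a map.
  Returns nil for lines that don't have exactly four fields or whose
  coordinates aren't numbers."
  [line]
  (let [[id lat lon ts :as fields] (string/split line #",")]
    (when (= 4 (count fields))
      (try
        {:user-id   id
         :latitude  (Double/parseDouble lat)
         :longitude (Double/parseDouble lon)
         :timestamp ts}
        (catch NumberFormatException _ nil)))))
```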
I am also relatively new to Clojure, so there are no obvious optimizations I can see. Hopefully others more experienced can offer some advice. But I feel like this is simply a matter of the data size being too big for the tools at hand.
For that reason, I would suggest using Cascalog, an abstraction over Hadoop or your local machine using Clojure. I think the syntax for querying big log files would be pretty straightforward for you.
I have the following code:
(defn remove-folder [file]
  (do
    (println "Called with file " (.getName file))
    (if (.isDirectory file)
      (do
        (println (.getName file) " is directory")
        (def children (.listFiles file))
        (println "Number of children " (count children))
        (map remove-folder children)
        (delete-file file)))
    (do
      (println (.getName file) " is file")
      (delete-file file))))
My problem is that the line (map remove-folder children) doesn't seem to work. In my output I expect to travel down through the folder structure, but it seems to stay at the first level.
I guess I have made some stupid mistake, but I have spent a few hours on it now and don't seem to be getting closer to a solution.
map is lazy, wrap it in a doall to force evaluation, like this: (doall (map remove-folder children)).
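A minimal sketch that makes the laziness visible by capturing printed output with with-out-str: the map version prints nothing because its lazy result is thrown away unrealized, while doall and doseq both force the side effects. The function names are hypothetical.

```clojure
;; map only builds a lazy seq; here it is discarded without being realized,
;; so print never runs.
(defn effects-with-map []
  (with-out-str
    (map print [1 2 3])
    nil))

;; doall realizes the whole seq, so print runs for every element.
(defn effects-with-doall []
  (with-out-str
    (doall (map print [1 2 3]))))

;; doseq is eager and returns nil: the idiomatic choice for side effects.
(defn effects-with-doseq []
  (with-out-str
    (doseq [x [1 2 3]]
      (print x))))
```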
As @ponzao said, map is lazy.
But anyway I don't think it's a good idea to use map just to get side effects and "drop" the result. Personally I would use doseq instead, because it gives a hint that I'm expecting side effects.
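(require '[clojure.java.io :refer [delete-file]])
(defn remove-folder [file]
  (do
    (println "Called with file " (.getName file))
    (if (.isDirectory file)
      (do
        (println (.getName file) " is directory")
        (def children (.listFiles file))
        (println "Number of children " (count children))
        (doseq [c children]
          (remove-folder c))
        (delete-file file))
      ;; else branch now sits inside the if; in the question it followed
      ;; the if, so it also ran (and deleted again) for directories
      (do
        (println (.getName file) " is file")
        (delete-file file)))))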
Why don't you inline children and get rid of def? It doesn't seem that you want to create a global binding in your current namespace. You are unintentionally introducing a side effect.
The function name remove-folder does not tell us what the function is supposed to do. The argument name file does not describe what kind of argument can be expected. But I understand that it's kind of exploratory code.
Cheers -
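As an alternative that avoids both the def and the explicit recursion, clojure.core/file-seq can do the tree walk. This sketch is a swapped-in technique, not the code from the question: it relies on file-seq listing a directory before its contents, so reversing it yields every file before the directory that holds it, which is the order needed for each directory to be empty when it is deleted.

```clojure
(require '[clojure.java.io :as io])

(defn remove-folder
  "Deletes `dir` and everything below it, children before parents."
  [dir]
  (doseq [^java.io.File f (reverse (file-seq (io/file dir)))]
    (io/delete-file f)))
```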
How can I make Clojure treat '() as nil?
For example:
How to make something like
(if '() :true :false)
;to be
:false
;Or easier
(my-fun/macro/namespace/... (if '() :true :false))
:false
And not just if. In every way.
(= nil '()) or (my-something (= nil '()))
true
And I want all code to be safe under (= '() nil).
(something (+ 1 (if (= nil '()) 1 2)))
2
I was thinking about some kind of regular expression which would scan the code and replace '() with nil, but there are things like (rest '(1)) and many others which evaluate to '(), and I am not sure how to handle them.
I was told that macros allow you to build your own languages. I want to try it by changing Clojure. So this is more about "How does Clojure work and how can I change it?" than "I really need it for my work."
Thank you for your help.
'() just isn't the same thing as nil; why would you want it to be?
What you might be looking for though is the seq function, which returns nil if given an empty collection:
(seq [1 2 3])
=> (1 2 3)
(seq [])
=> nil
(seq '())
=> nil
seq is therefore often used to test for "emptiness", with idioms like:
(if (seq coll)
  (do-something-with coll)
  (get-empty-result))
You say you would like to change Clojure using macros. Presently, as far as I know, this is not something you could do with the "regular" macro system (terminology fix anyone?). What you would really need (I think) is a reader macro. Things I have seen online (here, for example) seem to say that Clojure 1.4 adds something like reader macros (tagged literals, also called data readers), but these only hook into #tag forms, so they cannot change how () itself is read. I have no hands-on familiarity with this because I really like using clooj as my IDE, and it currently is not using Clojure 1.4. Maybe somebody else has better info on this "extensible reader" magic.
Regardless, I don't really like the idea of changing the language in that way, and I think there is a potentially very good alternative: namely, the Clojure function not-empty.
This function takes any collection and either returns that collection as is, or returns nil if that collection is empty. This means that anywhere you want () to count as nil, you should wrap it in not-empty. This answer is very similar to mikera's answer above, except that you don't have to convert your collections to sequences (which can be nice).
Both seq and not-empty are pretty pointless when you have a "hand-written" collection. After all, if you are writing it by hand (or rather, typing it manually), then you know for sure whether or not it is empty. They are useful when you have an expression or a symbol that returns a collection, and you do not know whether the returned collection will be empty or not.
Example:
=> (if-let [c (not-empty (take (rand-int 5) [:a :b :c :d]))]
     (println c)
     (println "Twas empty"))
;//80% of the time, this will print some non-empty sub-list of [:a :b :c :d]
;//The other 20% of the time, this will print...
Twas empty
=> nil
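The difference between seq and not-empty mentioned above shows up in the type of the non-empty result: not-empty returns the original collection, while seq returns a sequence over it. Both return nil for an empty collection. A quick sketch (the examples var is just a name for this illustration):

```clojure
(def examples
  {:seq-of-vector       (seq [1 2 3])        ; a seq, no longer a vector
   :not-empty-of-vector (not-empty [1 2 3])  ; the same vector, type kept
   :seq-of-empty        (seq [])             ; nil
   :not-empty-of-empty  (not-empty [])})     ; nil
```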
What about empty? ? It's the most expressive.
(if (empty? '())
  :true
  :false)
You can override macros and functions. For instance:
(defn classic-lisp [arg]
  (if (seq? arg) (seq arg) arg))
(defn = [& args]
  (apply clojure.core/= (map classic-lisp args)))
(defmacro when [cond & args]
  `(when (classic-lisp ~cond) ~@args))
Unfortunately, you can't override if, as it is a special form and not a macro. You will have to wrap your code with another macro.
Let's make an if* macro that behaves like if in Common Lisp:
(defmacro if* [cond & args]
  `(if (classic-lisp ~cond) ~@args))
With this, we can replace all ifs with if*s:
(use 'clojure.walk)
(defn replace-ifs [code]
  (postwalk-replace '{if if*} (macroexpand-all code)))
(defmacro clojure-the-old-way [& body]
  `(do ~@(map replace-ifs body)))
Now:
=> (clojure-the-old-way (if '() :true :false) )
:false
You should be able to load files and replace ifs in them too:
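(defn read-clj-file [filename]
  ;; loads list of clojure expressions from file *filename*
  ;; (the "\n" keeps a trailing line comment from swallowing the closing paren)
  (read-string (str "(" (slurp filename) "\n)")))
(defn load-clj-file-the-old-way [filename]
  ;; rewrite and evaluate one top-level form at a time, so macros defined
  ;; earlier in the file exist when later forms are expanded
  (doseq [form (read-clj-file filename)]
    (eval (replace-ifs form))))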
Note that I didn't test the code to load files, and it might be incompatible with Leiningen or namespaces. I believe it should work with the overridden = though.
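To see what replace-ifs does to a single form, the two clojure.walk steps can be run by hand. This sketch only inspects the rewritten code, so if* and classic-lisp need not be defined: macroexpand-all turns when into a plain if, and postwalk-replace then swaps that symbol for if*.

```clojure
(require '[clojure.walk :as walk])

;; Step 1: expand all macros; when becomes (if test (do body...)).
(def expanded (walk/macroexpand-all '(when (seq coll) :yes)))

;; Step 2: replace every occurrence of the symbol if with if*.
(def rewritten (walk/postwalk-replace '{if if*} expanded))
```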