Clojure, console: println output not always visible

As in the subject. I have a working program that copies filesystem trees recursively, but println calls inside the recursive function produce no output.
build-album calls traverse-dir; I can see the "10" in the console, but never any "11", and there should be a lot of them. Execution cannot be bypassing (println "11"), since the files really do get copied (the line above it). This matters because the project is a console application that should report each copied file to the user, lest he suspect it has frozen. This is no joke: the app is intended to upload albums to mobile phones.
(defn traverse-dir
"Traverses the (source) directory, preorder"
[src-dir dst-step]
(let [{:keys [options arguments]} *parsed-args*
dst-root (arguments 1)
[dirs files] (list-dir-groomed (fs/list-dir src-dir))
dir-handler (fn [dir-obj]
"Processes the current directory, source side;
creates properly named directory destination side, if necessary"
(let [dir (.getPath dir-obj)
step (str dst-step *nix-sep* (fs/base-name dir-obj))]
(fs/mkdir (str dst-root step))
(traverse-dir dir step)))
file-handler (fn [file-obj]
"Copies the current file, properly named and tagged"
(let [dst-path (str dst-root dst-step *nix-sep* (.getName file-obj))]
(fs/copy file-obj (fs/file dst-path))
(println "11")
dst-path))]
(concat (map dir-handler dirs) (map file-handler files))))
(defn build-album
"Copy source files to destination according
to command line options"
[]
(let [{:keys [options arguments]} *parsed-args*
output (traverse-dir (arguments 0) "")]
(println "10")
output))

This looks like a problem with lazy sequences: map and concat build a lazy seq, and if nothing ever realizes it, the code inside never executes. Try wrapping the result of traverse-dir in doall:
(doall (concat (map dir-handler dirs) (map file-handler files))))
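The effect can be reproduced in isolation. The following is only a sketch: copy-file! is a hypothetical stand-in for the real fs/copy and println calls, and it records each call in an atom so the timing of the side effects is visible.

```clojure
;; Records every "copy" so we can see when the side effect actually runs.
(def copied (atom []))

(defn copy-file!
  "Hypothetical stand-in for the real copy-and-report step."
  [file-name]
  (swap! copied conj file-name)
  file-name)

;; map only builds a lazy sequence; copy-file! has not been called yet.
(def pending (map copy-file! ["a.mp3" "b.mp3"]))
(def calls-before (count @copied))

;; doall walks the whole sequence, so the side effects happen here.
(def results (doall pending))
(def calls-after (count @copied))
```

At the REPL the printer usually realizes a returned sequence, which hides the problem; in a console application whose -main ignores the return value, nothing forces it.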

Related

How to pprint a clojure file?

I want to be able to format an entire Clojure file to look nice. The nicest thing I have found is Clojure's pprint. It puts indentation and line breaks in the correct locations. However, it can only print Clojure literals, and Clojure files are read in as strings. read-string only reads the first form of a string. Mapping read-string over the whole sequence has a host of issues that I keep running into. Does anyone know of an automatic way to make a Clojure file look pretty, and not just indent it correctly?
You can use lein-zprint which will run the zprint library over your source files. If you are a boot user, you can use boot-fmt to process your files, which also uses zprint.
The zprint library will completely reformat your source from scratch to be as pretty as it knows how to make it. It actually tries a couple of things at each level to see which is "better", and has a number of heuristics built in to try to produce something that fits as much information as it can in the vertical space while still looking "pretty".
It knows a lot about Clojure (and ClojureScript) source code, knows which functions need different kinds of processing, and handles the new clojure.spec (cljs.spec) files well too.
It is almost absurdly configurable, so with a little work you can tune it to output the code the way that you want to see it. But even without any configuration, it generally does a good job making your code look nice.
Using it with lein-zprint is pretty trivial.
Place [lein-zprint "0.1.16"] into the :plugins vector of your project.clj:
:plugins [[lein-zprint "0.1.16"]]
Then, to format a source file, simply invoke lein zprint on that file:
$ lein zprint src/<project>/<file-name>.clj
Unless you tell it otherwise (which is trivial to do in your project.clj), it will rename the existing file to <file-name>.clj.old so that you have them to compare while you are trying it out.
Here is an example (obviously poorly formatted):
(defn apply-style-x
"Given an existing-map and a new-map, if the new-map specifies a
style, apply it if it exists. Otherwise do nothing. Return
[updated-map new-doc-map error-string]"
[doc-string doc-map existing-map new-map] (let [style-name
(get new-map :style :not-specified) ] (if
(= style-name :not-specified) [existing-map doc-map nil]
(let [style-map ( if (= style-name :default)
(get-default-options) (get-in existing-map [:style-map style-name]))]
(cond (nil? style-name)
[existing-map doc-map "Can't specify a style of nil!"]
style-map [(merge-deep existing-map style-map) (when doc-map
(diff-deep-doc (str doc-string " specified :style " style-name)
doc-map existing-map style-map)) nil] :else
[existing-map doc-map (str "Style '" style-name "' not found!")])))))
After formatting with $ lein zprint 70 src/example/apply.clj, which formats it for 70 columns to make it fit better in this answer:
(defn apply-style-x
"Given an existing-map and a new-map, if the new-map specifies a
style, apply it if it exists. Otherwise do nothing. Return
[updated-map new-doc-map error-string]"
[doc-string doc-map existing-map new-map]
(let [style-name (get new-map :style :not-specified)]
(if (= style-name :not-specified)
[existing-map doc-map nil]
(let [style-map (if (= style-name :default)
(get-default-options)
(get-in existing-map
[:style-map style-name]))]
(cond
(nil? style-name) [existing-map doc-map
"Can't specify a style of nil!"]
style-map
[(merge-deep existing-map style-map)
(when doc-map
(diff-deep-doc
(str doc-string " specified :style " style-name)
doc-map
existing-map
style-map)) nil]
:else [existing-map doc-map
(str "Style '" style-name "' not found!")])))))
You can use weavejester/cljfmt:
$ boot -d cljfmt:0.5.6 repl
boot.user=> (require '[cljfmt.core :as cljfmt])
nil
boot.user=> (println (cljfmt/reformat-string
#_=> "( let [x 3
#_=> y 4]
#_=> (+ (* x x
#_=> )(* y y)
#_=> ))"))
(let [x 3
y 4]
(+ (* x x) (* y y)))
nil
Check its README for the supported formatting options.
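For completeness, the read-string limitation mentioned in the question can be sidestepped by calling read in a loop until end of input. This is only a sketch: read-all-forms and pprint-source are hypothetical helper names. Because the reader discards comments and the original layout, the dedicated formatters above remain the better choice for real source files.

```clojure
(require '[clojure.pprint :as pp])
(import '(java.io PushbackReader StringReader))

(defn read-all-forms
  "Reads every top-level form from source text, not just the first one."
  [source]
  ;; Disable #= evaluation, since the text may come from an untrusted file.
  (binding [*read-eval* false]
    (let [rdr (PushbackReader. (StringReader. source))]
      (loop [forms []]
        ;; With eof-error? false, read returns ::eof at end of input.
        (let [form (read rdr false ::eof)]
          (if (= form ::eof)
            forms
            (recur (conj forms form))))))))

(defn pprint-source
  "Pretty-prints each form using pprint's code-oriented dispatch."
  [source]
  (pp/with-pprint-dispatch pp/code-dispatch
    (doseq [form (read-all-forms source)]
      (pp/pprint form)
      (println))))
```

A call such as (pprint-source (slurp "src/example/apply.clj")) would print the reformatted forms to standard output.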

Clojure : Get "OutOfMemoryError Java heap space" when parsing big log file

Hi all,
I want to parse big log files using Clojure.
Each line record has the structure "UserID,Latitude,Longitude,Timestamp".
My implementation steps are:
1. Read the log file and get the top-n user list.
2. Find each top-n user's records and store them in a separate log file (UserID.log).
The implementation source code:
;======================================================
(defn parse-file
""
[file n]
(with-open [rdr (io/reader file)]
(println "001 begin with open ")
(let [lines (line-seq rdr)
res (parse-recur lines)
sorted
(into (sorted-map-by (fn [key1 key2]
(compare [(get res key2) key2]
[(get res key1) key1])))
res)]
(println "Statistic result : " res)
(println "Top-N User List : " sorted)
(find-write-recur lines sorted n)
)))
(defn parse-recur
""
[lines]
(loop [ls lines
res {}]
(if ls
(recur (next ls)
(update-res res (first ls)))
res)))
(defn update-res
""
[res line]
(let [params (string/split line #",")
id (if (> (count params) 1) (params 0) "0")]
(if (res id)
(update-in res [id] inc)
(assoc res id 1))))
(defn find-write-recur
"Get each users' records and store into separate log file"
[lines sorted n]
(loop [x n
sd sorted
id (first (keys sd))]
(if (and (> x 0) sd)
(do (create-write-file id
(find-recur lines id))
(recur (dec x)
(rest sd)
(nth (keys sd) 1))))))
(defn find-recur
""
[lines id]
(loop [ls lines
res []]
(if ls
(recur (next ls)
(update-vec res id (first ls)))
res)))
(defn update-vec
""
[res id line]
(let [params (string/split line #",")
id_ (if (> (count params) 1) (params 0) "0")]
(if (= id id_ )
(conj res line)
res)))
(defn create-write-file
"Create a new file and write information into the file."
([file info-lines]
(with-open [wr (io/writer (str MAIN-PATH file))]
(doseq [line info-lines] (.write wr (str line "\n")))
))
([file info-lines append?]
(with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
(doseq [line info-lines] (.write wr (str line "\n"))))
))
;======================================================
I tested this in the REPL with the command (parse-file "./DATA/log.log" 3) and got these results:
Records     Size    Time    Result
1,000       42KB    <1s     OK
10,000      420KB   <1s     OK
100,000     4.3MB   3s      OK
1,000,000   43MB    15s     OK
6,000,000   258MB   >20M    "OutOfMemoryError Java heap space java.lang.String.substring (String.java:1913)"
======================================================
Here are my questions:
1. How can I fix the error when I try to parse a big log file, e.g. > 200MB?
2. How can I optimize the function to run faster?
3. There are logs more than 1GB in size; how can the function deal with them?
I am still new to Clojure; any suggestion or solution would be appreciated.
Thanks
A direct answer to your questions, from someone with a little Clojure experience.
The quick and dirty fix for running out of memory boils down to giving the JVM more memory. You can try adding this to your project.clj:
:jvm-opts ["-Xmx1G"] ;; or more
That will make Leiningen launch the JVM with a higher memory cap.
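This kind of work is going to use a lot of memory no matter how you approach it. @Vidya's suggestion to use a library is definitely worth considering. However, there's one optimization that you can make that should help a little.
Whenever you're dealing with your (line-seq ...) object (a lazy sequence), make sure it stays lazy. next is a little more eager than rest: it has to realize the following element in order to return nil at the end, whereas rest does not. Neither loads the whole file by itself, though; what exhausts the heap is keeping a reference to the head of the sequence (lines in parse-file, which is traversed twice), because every line already read then stays reachable. If you do switch to rest, test the loop with (seq ls), since rest never returns nil. Take a look at the Clojure site, especially the section on laziness: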
(rest aseq) - returns a possibly empty seq, never nil
[snip]
a (possibly) delayed path to the remaining items, if any
You may even want to traverse the log twice--once to pull just the username from each line as a lazy-seq, again to filter out those users. This will minimize the amount of the file you're holding onto at any one time.
Making sure your function is lazy should reduce the sheer overhead that having the file as a sequence in memory creates. Whether that's enough to parse a 1G file, I don't think I can say.
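The two-pass idea above could be sketched roughly as follows. All function names here are hypothetical, the ID extraction is simplified compared to the question's update-res (no "0" fallback), and the output location is passed in rather than taken from MAIN-PATH. Each pass opens the file itself and hands line-seq straight to reduce or doseq, so no reference to the head of the sequence is kept.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "Simplified: the text before the first comma."
  [line]
  (first (string/split line #",")))

(defn count-by-user
  "First pass: only the map of per-user counts is retained."
  [file]
  (with-open [rdr (io/reader file)]
    (reduce (fn [counts line]
              (update counts (user-id line) (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-users
  "IDs of the n users with the most records."
  [counts n]
  (->> counts (sort-by val >) (take n) (map key)))

(defn write-user-logs
  "Second pass: streams the file once, writing each selected user's
  lines to <out-dir>/<id>.log."
  [file out-dir ids]
  (let [writers (into {} (for [id ids]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader file)]
        (doseq [line (line-seq rdr)
                :let [^java.io.Writer w (writers (user-id line))]
                :when w]
          (.write w (str line "\n"))))
      (finally
        ;; Close every per-user writer, even if reading failed.
        (doseq [^java.io.Writer w (vals writers)]
          (.close w))))))
```

With this shape, memory use should depend on the number of distinct users rather than on the number of lines, though that has not been measured against the 258MB file from the question.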
You definitely don't need Cascalog or Hadoop simply to parse a file which doesn't fit into your Java heap. This SO question provides some working examples of how to process large files lazily. The main point is you need to keep the file open while you traverse the lazy seq. Here is what worked for me in a similar situation:
(defn lazy-file-lines [file]
(letfn [(helper [rdr]
(lazy-seq
(if-let [line (.readLine rdr)]
(cons line (helper rdr))
(do (.close rdr) nil))))]
(helper (clojure.java.io/reader file))))
You can map, reduce, count, etc. over this lazy sequence:
(count (lazy-file-lines "/tmp/massive-file.txt"))
;=> <a large integer>
The parsing is a separate, simpler problem.
I am also relatively new to Clojure, so there are no obvious optimizations I can see. Hopefully others more experienced can offer some advice. But I feel like this is simply a matter of the data size being too big for the tools at hand.
For that reason, I would suggest using Cascalog, an abstraction over Hadoop or your local machine using Clojure. I think the syntax for querying big log files would be pretty straightforward for you.

Call to map-function doesn't seem to do anything

I have the following code:
(defn remove-folder [file]
(do
(println "Called with file " (.getName file))
(if (.isDirectory file)
(do
(println (.getName file) " is directory")
(def children (.listFiles file))
(println "Number of children " (count children))
(map remove-folder children)
(delete-file file)))
(do
(println (.getName file) " is file")
(delete-file file)
)))
My problem is that the line (map remove-folder children) doesn't seem to work. In my output I expect to travel down through the folder structure, but it seems to stay at the first level.
I guess I have made some stupid mistake, but I have spent a few hours on it now and don't seem to be getting any closer to a solution.
map is lazy, wrap it in a doall to force evaluation, like this: (doall (map remove-folder children)).
As @ponzao said, map is lazy.
But anyway I don't think it's a good idea to use map just to get side effects and "drop" the result. Personally I would use doseq instead, because it gives a hint that I'm expecting side effects.
(defn remove-folder [file]
(do
(println "Called with file " (.getName file))
(if (.isDirectory file)
(do
(println (.getName file) " is directory")
(def children (.listFiles file))
(println "Number of children " (count children))
(doseq [c children]
(remove-folder c))
(delete-file file)))
(do
(println (.getName file) " is file")
(delete-file file)
)))
Why don't you inline children and get rid of def? It doesn't seem that you want to create a global binding in your current namespace. You are unintentionally introducing a side effect.
The function name remove-folder does not tell us what the function is supposed to do. The argument name file does not describe what kind of argument can be expected. But I understand that it's kind of exploratory code.
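Putting these suggestions together (doseq for the side effects and no def inside the function), a hedged sketch might look like this. delete-recursively is a hypothetical name, and clojure.java.io/delete-file is assumed to be the delete-file used in the question, whose origin is not shown. As written above, the if has no else branch, so the "is file" block also runs for directories; the sketch avoids that by deleting exactly once at the end.

```clojure
(require '[clojure.java.io :as io])

(defn delete-recursively
  "Deletes file-or-dir; for a directory, its contents are deleted first."
  [^java.io.File file-or-dir]
  (when (.isDirectory file-or-dir)
    ;; doseq is eager, so every child is handled before the parent is deleted.
    (doseq [child (.listFiles file-or-dir)]
      (delete-recursively child)))
  ;; Throws if the file cannot be deleted.
  (io/delete-file file-or-dir))
```

Usage would be along the lines of (delete-recursively (io/file "some/folder")).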
Cheers -

Clojure create directory hierarchy - but not in a procedural way

Let's say I need to create the following directory structure in Clojure:
a
\--b
| \--b1
| \--b2
\--c
\-c1
Instead of doing procedural things like the following:
(def a (File. "a"))
(.mkdir a)
(def b (File. a "b"))
(.mkdir b)
;; ...
... is there a clever way to somehow represent the above actions as data, declaratively, and then create the hierarchy in one fell swoop?
A quick and simple approach would be to make a vector of the dirs to create and map mkdir over it:
user> (map #(.mkdir (java.io.File. %)) ["a", "a/b" "a/b/c"])
(true true true)
Or you can specify your dir structure as a tree and use zippers to walk it, making the dirs along the way:
(require '[clojure.zip :as zip])
(def dirs ["a" ["b" ["b1" "b2"]] ["c" ["c1"]]])
(defn make-dir-tree [original]
(loop [loc (zip/vector-zip original)]
(if (zip/end? loc)
(zip/root loc)
(recur (zip/next
(do (if (not (vector? (zip/node loc)))
(let [path (apply str (interpose "/" (butlast (map first (zip/path loc)))))
name (zip/node loc)]
(if (empty? path)
(.mkdir (java.io.File. name))
(.mkdir (java.io.File. (str path "/" name))))))
loc))))))
(make-dir-tree dirs)
arthur@a:~/hello$ find a
a
a/c
a/c/c1
a/b
a/b/c
a/b/b2
a/b/b1
If you are doing a lot of general systems administration then something heavier may be in order. The pallet project is a library for doing system administration of all sorts on physical and cloud hosted systems (though it tends to lean towards the cloudy stuff). Specifically the directory
Another option, if you want to create nested directories in one call, is to use .mkdirs:
user> (require '[clojure.java.io :as io])
user> (.mkdirs (io/file "a/b/c/d"))
You can use an absolute path, e.g. /a/b/c/d; otherwise the directories are created relative to the directory you started the REPL from.
It is also handy to check whether or not a given path is an existing directory:
user> (.isDirectory (io/file "a/b/c/d"))
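The declarative idea from the question can also be combined with .mkdirs. In this sketch the hierarchy is a plain nested map (an empty map marks a leaf directory); tree, leaf-paths and make-dir-tree! are hypothetical names. Since .mkdirs creates missing parents, only the leaf paths need to be visited.

```clojure
(require '[clojure.java.io :as io])

;; The hierarchy from the question, expressed as data.
(def tree {"a" {"b" {"b1" {} "b2" {}}
                "c" {"c1" {}}}})

(defn leaf-paths
  "Returns the path, as a vector of names, to every leaf directory."
  [tree]
  (mapcat (fn [[dir-name children]]
            (if (empty? children)
              [[dir-name]]
              (map #(into [dir-name] %) (leaf-paths children))))
          tree))

(defn make-dir-tree!
  "Creates the whole hierarchy under root."
  [root tree]
  ;; doseq rather than map, because only the side effect matters.
  (doseq [path (leaf-paths tree)]
    (.mkdirs (apply io/file root path))))
```

For example, (make-dir-tree! "." tree) would create the structure under the current directory.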

how to load resources from a specific .jar file using clojure.java.io

In clojure.java.io there is an io/resource function, but I think it only loads resources from the jar that is currently running. Is there a way to specify the .jar file that the resource is in?
For example:
I have a jar file: /path/to/abc.jar
abc.jar when unzipped contains some/text/output.txt in the root of the unzipped directory
output.txt contains the string "The required text that I want."
I need functions that can do these operations:
(list-jar "/path/to/abc.jar" "some/text/")
;; => "output.txt"
(read-from-jar "/path/to/abc.jar" "some/text/output.txt")
;; => "The required text that I want"
Thanks in advance!
From Ankur's comments, I managed to piece together the functions that I needed:
The java.util.jar.JarFile object does the job.
You can call (.entries (JarFile. a-path)) to list the files, but instead of returning a tree structure,
i.e.:
/dir-1
/file-1
/file-2
/dir-2
/file-3
/dir-3
/file-4
it returns an enumeration of filenames:
/dir-1/file-1, /dir-1/file-2, /dir-1/dir-2/file-3, /dir-1/dir-3/file-4
The functions I needed are defined below:
(import java.util.jar.JarFile)
(defn list-jar [jar-path inner-dir]
(if-let [jar (JarFile. jar-path)]
;; (last inner-dir) yields a character, so compare against \/ rather than the string "/"
(let [inner-dir (if (and (not= "" inner-dir) (not= \/ (last inner-dir)))
(str inner-dir "/")
inner-dir)
entries (enumeration-seq (.entries jar))
names (map (fn [x] (.getName x)) entries)
snames (filter (fn [x] (= 0 (.indexOf x inner-dir))) names)
fsnames (map #(subs % (count inner-dir)) snames)]
fsnames)))
(defn read-from-jar [jar-path inner-path]
(if-let [jar (JarFile. jar-path)]
(if-let [entry (.getJarEntry jar inner-path)]
(slurp (.getInputStream jar entry)))))
Usage:
(read-from-jar "/Users/Chris/.m2/repository/lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.jar"
"leiningen/new.clj")
;=> "The list of built-in templates can be shown with `lein help new`....."
(list-jar "/Users/Chris/.m2/repository/lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.jar" "leiningen")
;; => (new/app/core.clj new/app/project.clj .....)
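One thing the functions above leave open is closing the JarFile. A possible variant, offered only as a sketch, wraps it in with-open; list-jar-entries and read-jar-entry are hypothetical names chosen to avoid clashing with the originals. Because with-open closes the jar on exit, the entry names are collected eagerly with mapv instead of being returned as a lazy sequence. This variant also skips the entry for the directory itself, which is a small change in behavior.

```clojure
(import '(java.util.jar JarFile))

(defn list-jar-entries
  "Names of the entries under inner-dir, relative to inner-dir."
  [jar-path inner-dir]
  (with-open [jar (JarFile. jar-path)]
    (let [prefix (if (or (= "" inner-dir) (.endsWith inner-dir "/"))
                   inner-dir
                   (str inner-dir "/"))]
      (->> (enumeration-seq (.entries jar))
           (map #(.getName %))
           ;; Keep entries below the prefix, but not the directory entry itself.
           (filter #(and (.startsWith % prefix) (not= % prefix)))
           ;; mapv is eager, so everything is read before the jar is closed.
           (mapv #(subs % (count prefix)))))))

(defn read-jar-entry
  "Contents of the entry as a string, or nil when there is no such entry."
  [jar-path inner-path]
  (with-open [jar (JarFile. jar-path)]
    (when-let [entry (.getJarEntry jar inner-path)]
      (slurp (.getInputStream jar entry)))))
```

slurp reads the entry completely before with-open closes the jar, so the returned string stays valid afterwards.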