I have the following code:
(defn remove-folder [file]
  (do
    (println "Called with file " (.getName file))
    (if (.isDirectory file)
      (do
        (println (.getName file) " is directory")
        (def children (.listFiles file))
        (println "Number of children " (count children))
        (map remove-folder children)
        (delete-file file)))
    (do
      (println (.getName file) " is file")
      (delete-file file)
      )))
My problem is that the line (map remove-folder children) doesn't seem to work. In my output I expect to travel down through the folder structure, but it seems to stay at the first level.
I guess I have made some stupid mistake, but I have spent a few hours on it now and don't seem to be getting any closer to a solution.
map is lazy, wrap it in a doall to force evaluation, like this: (doall (map remove-folder children)).
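To see the laziness in isolation, here is a minimal sketch. The names calls and visit! are made up for the illustration and are not part of the question's code; the counter only exists to show when the mapped function actually runs.

```clojure
(def calls (atom 0))

(defn visit!
  "Hypothetical side-effecting function: counts how often it runs."
  [x]
  (swap! calls inc)
  x)

;; map only builds a lazy sequence; visit! has not run yet.
(def pending (map visit! [1 2 3]))
(def calls-before @calls)   ; 0

;; doall walks the whole sequence, so every call happens now.
(doall pending)
(def calls-after @calls)    ; 3
```

At the REPL the effect is easy to miss, because printing a lazy sequence realizes it. Inside a function whose result is thrown away, nothing ever forces it.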
As @ponzao said, map is lazy.
But anyway I don't think it's a good idea to use map just to get side effects and "drop" the result. Personally I would use doseq instead, because it gives a hint that I'm expecting side effects.
(defn remove-folder [file]
  (do
    (println "Called with file " (.getName file))
    (if (.isDirectory file)
      (do
        (println (.getName file) " is directory")
        (def children (.listFiles file))
        (println "Number of children " (count children))
        (doseq [c children]
          (remove-folder c))
        (delete-file file))
      (do
        (println (.getName file) " is file")
        (delete-file file)))))
Why don't you inline children and get rid of def? It doesn't seem that you want to create a global binding in your current namespace. You are unintentionally introducing a side effect.
The function name remove-folder does not tell us what the function is supposed to do. The argument name file does not describe what kind of argument can be expected. But I understand that it's kind of exploratory code.
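For what it's worth, a version without the def might look roughly like this. It is only a sketch: the name delete-recursively! is my own suggestion, and I'm assuming delete-file is the one from clojure.java.io.

```clojure
(require '[clojure.java.io :as io])

(defn delete-recursively!
  "Deletes f. If f is a directory, its contents are deleted first.
  Hypothetical name; assumes f is a java.io.File."
  [f]
  (when (.isDirectory f)
    ;; doseq is eager, so the recursive calls really happen,
    ;; and no global var is created for the children.
    (doseq [child (.listFiles f)]
      (delete-recursively! child)))
  (io/delete-file f))
```

Calling (delete-recursively! (io/file "some/dir")) removes the directory and everything below it, so try it on a scratch directory first.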
Cheers -
I made a function to take questions like this.
(defn ask-ques [ques pred]
  (print ques)
  (let [user-input (read-line)]
    (if #(pred user-input) user-input (recur ques pred))))
And I wrote main like this.
(defn -main []
  (loop []
    (let [user-input (ask-ques "CHOOSE ONE. (C)ontinue OR (E)xit : " #(contains? #{"C" "E"} %))]
      (when (= user-input "C") (apply body (rand-nth (seq voc-map))) (recur)))))
But Clojure reads the input first and only then prints "CHOOSE ONE. (C)ontinue OR (E)xit : ", and pred does not seem to work.
What's the problem? Why does it behave like this? And what should I do?
#(pred user-input) is a function of zero arguments and, since it has a non-nil value, the if will treat it as truth, so you will always get user-input and it will never recur. I suspect you want (pred user-input) instead.
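As for the prompt showing up late: print does not flush the output stream, so the text can sit in the buffer until after read-line has returned. A sketch of ask-ques with both changes, calling pred directly and adding an explicit (flush), could look like this:

```clojure
(defn ask-ques [ques pred]
  (print ques)
  (flush)                       ; push the prompt out before blocking on input
  (let [user-input (read-line)]
    (if (pred user-input)       ; call pred instead of wrapping it in #(...)
      user-input
      (recur ques pred))))
```

One caveat: read-line returns nil at end of input, so with a predicate that rejects nil this version would keep asking forever if stdin is closed.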
I want to be able to format an entire Clojure file to look nice. The nicest thing I have found is Clojure's pprint. It does indentation and line breaks at the correct locations. However, it can only read Clojure literals. Clojure files are read in as strings. read-string will only read the first form in a string. Mapping read-string over the whole sequence has a host of issues I am running into. Does someone know of an automatic way to make a Clojure file look pretty, not just indent it correctly?
You can use lein-zprint which will run the zprint library over your source files. If you are a boot user, you can use boot-fmt to process your files, which also uses zprint.
The zprint library will completely reformat your source from scratch to be as pretty as it knows how to make it. It actually tries a couple of things at each level to see which is "better", and has a number of heuristics built in to try to produce something that fits as much information as it can in the vertical space while still looking "pretty".
It knows a lot about Clojure (and Clojurescript) source code, and knows which functions need different kinds of processing as well as handling the new clojure.spec (cljs.spec) files well too.
It is almost absurdly configurable, so with a little work you can tune it to output the code the way that you want to see it. But even without any configuration, it generally does a good job making your code look nice.
Using it with lein-zprint is pretty trivial.
Place [lein-zprint "0.1.16"] into the :plugins vector of your project.clj:
:plugins [[lein-zprint "0.1.16"]]
Then, to format a source file, simply invoke lein zprint on that file:
$ lein zprint src/<project>/<file-name>.clj
Unless you tell it otherwise (which is trivial to do in your project.clj), it will rename the existing file to <file-name>.clj.old so that you have them to compare while you are trying it out.
Here is an example (obviously poorly formatted):
(defn apply-style-x
"Given an existing-map and a new-map, if the new-map specifies a
style, apply it if it exists. Otherwise do nothing. Return
[updated-map new-doc-map error-string]"
[doc-string doc-map existing-map new-map] (let [style-name
(get new-map :style :not-specified) ] (if
(= style-name :not-specified) [existing-map doc-map nil]
(let [style-map ( if (= style-name :default)
(get-default-options) (get-in existing-map [:style-map style-name]))]
(cond (nil? style-name)
[existing-map doc-map "Can't specify a style of nil!"]
style-map [(merge-deep existing-map style-map) (when doc-map
(diff-deep-doc (str doc-string " specified :style " style-name)
doc-map existing-map style-map)) nil] :else
[existing-map doc-map (str "Style '" style-name "' not found!")])))))
After formatting with lein zprint 70 src/example/apply.clj, which formats it for 70 columns to make it fit better in this answer:
(defn apply-style-x
  "Given an existing-map and a new-map, if the new-map specifies a
  style, apply it if it exists. Otherwise do nothing. Return
  [updated-map new-doc-map error-string]"
  [doc-string doc-map existing-map new-map]
  (let [style-name (get new-map :style :not-specified)]
    (if (= style-name :not-specified)
      [existing-map doc-map nil]
      (let [style-map (if (= style-name :default)
                        (get-default-options)
                        (get-in existing-map
                                [:style-map style-name]))]
        (cond
          (nil? style-name) [existing-map doc-map
                             "Can't specify a style of nil!"]
          style-map
            [(merge-deep existing-map style-map)
             (when doc-map
               (diff-deep-doc
                 (str doc-string " specified :style " style-name)
                 doc-map
                 existing-map
                 style-map)) nil]
          :else [existing-map doc-map
                 (str "Style '" style-name "' not found!")])))))
You can use weavejester/cljfmt:
$ boot -d cljfmt:0.5.6 repl
boot.user=> (require '[cljfmt.core :as cljfmt])
nil
boot.user=> (println (cljfmt/reformat-string
#_=> "( let [x 3
#_=> y 4]
#_=> (+ (* x x
#_=> )(* y y)
#_=> ))"))
(let [x 3
      y 4]
  (+ (* x x) (* y y)))
nil
Check its README for the supported formatting options.
Subj. There's a working program, which basically copies filesystem trees recursively. Somehow println from inside the recursive function won't show any output.
build-album calls traverse-dir; I can see the "10" in the console, but never any "11"s, and there should be a lot of them. (println "11") can't possibly be off the path of execution, since the files really do get copied (the line above it). This is a problem, because the project is meant to be a console application that reports each copied file to the user, so that they don't suspect it has frozen. This is no joke, because the app is intended to upload albums to mobile phones.
(defn traverse-dir
  "Traverses the (source) directory, preorder"
  [src-dir dst-step]
  (let [{:keys [options arguments]} *parsed-args*
        dst-root (arguments 1)
        [dirs files] (list-dir-groomed (fs/list-dir src-dir))
        dir-handler (fn [dir-obj]
                      "Processes the current directory, source side;
                      creates properly named directory destination side, if necessary"
                      (let [dir (.getPath dir-obj)
                            step (str dst-step *nix-sep* (fs/base-name dir-obj))]
                        (fs/mkdir (str dst-root step))
                        (traverse-dir dir step)))
        file-handler (fn [file-obj]
                       "Copies the current file, properly named and tagged"
                       (let [dst-path (str dst-root dst-step *nix-sep* (.getName file-obj))]
                         (fs/copy file-obj (fs/file dst-path))
                         (println "11")
                         dst-path))]
    (concat (map dir-handler dirs) (map file-handler files))))
(defn build-album
  "Copy source files to destination according
  to command line options"
  []
  (let [{:keys [options arguments]} *parsed-args*
        output (traverse-dir (arguments 0) "")]
    (println "10")
    output))
Might be the problem with lazy sequences: you build a lazy seq which is never realized and thus the code never executes. Try calling doall on the result of traverse-dir:
(doall (concat (map dir-handler dirs) (map file-handler files))))
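Here is a stripped-down sketch of the same shape, so you can see the difference without touching the file system. handle, process-lazy and process-eager are made-up stand-ins for your handlers and traverse-dir, not code from your project.

```clojure
(defn handle
  "Hypothetical stand-in for dir-handler / file-handler."
  [item]
  (println "copied" item)
  item)

(defn process-lazy
  "Same shape as the original: concat and map only describe the work."
  [dirs files]
  (concat (map handle dirs) (map handle files)))

(defn process-eager
  "Forces the whole sequence, so every println runs before returning."
  [dirs files]
  (doall (process-lazy dirs files)))

;; Capturing the output shows the difference:
(def lazy-output  (with-out-str (process-lazy  [:a] [:b :c])))  ; empty string
(def eager-output (with-out-str (process-eager [:a] [:b :c])))  ; three "copied ..." lines
```

In the lazy case the work only happens when some caller later walks the returned sequence, which would also explain why the printlns do not show up at the point in the run where you expect them.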
I am trying to iterate over a list of files in a given directory, and add an incrementing variable i = {1,2,3.....} to their names.
Here is the code I have for iterating through the files and changing each file's name:
(defn addCounterToExtIn [d]
  (def i 0)
  (doseq [f (.listFiles (file d)) ] ; make a sequence of all files in d
    (if (and (not (.isDirectory f)) ; if file is not a directory and
             (= '(\. \i \n) (take-last 3 (.getName f))) ) ; if it ends with .in
      (fs/rename f (str d '/ i (.getName f)))))) ; add i to start of its name
I don't know how to increment i as doseq iterates through each file. Alternatively, is there a better loop to use to achieve the desired result?
Use file-seq and map-indexed:
(require '[clojure.java.io :as io])
(dorun
  (->>
    (file-seq (io/file "/home/eduard/Downloads"))
    (filter #(re-find #".+\.pdf$" (.getName %)))
    (map-indexed (fn [i v] [i v]))))
Replace the function passed to map-indexed with one that renames the file and you're done.
The sample output for pdf files:
([0 #<File /home/eduard/Downloads/some.pdf>] ...)
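A sketch of that last step, under a few assumptions of mine: it uses .listFiles instead of file-seq so that only the given directory is touched, sorts by name so the numbering is repeatable, starts counting at 1 as in the question, and renames with plain java.io.File.renameTo. The name add-index-prefix! is made up.

```clojure
(require '[clojure.java.io :as io])

(defn add-index-prefix!
  "Hypothetical helper: prefixes every plain file in dir whose name ends
  with ext with a running number, starting at 1. Returns the new files."
  [dir ext]
  (->> (.listFiles (io/file dir))
       (filter #(and (.isFile %) (.endsWith (.getName %) ext)))
       (sort-by #(.getName %))
       (map-indexed (fn [i f]
                      (let [target (io/file (.getParentFile f)
                                            (str (inc i) (.getName f)))]
                        (.renameTo f target)
                        target)))
       doall))   ; force the lazy sequence so the renames actually happen
```

For example, (add-index-prefix! "/some/dir" ".in") would turn a.in and b.in into 1a.in and 2b.in. Note that renameTo reports failure by returning false rather than throwing, and this sketch ignores that return value.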
This is the first approach off the top of my head. It's not ideal, but certainly more idiomatic than what the question proposes.
(defn rename-one-file! [f counter]
  ;; Only plain files whose name ends in ".in" are renamed.
  (when (and (not (.isDirectory f))
             (.endsWith (.getName f) ".in"))
    (fs/rename f (file (.getParentFile f)
                       (str counter (.getName f))))))
(defn iterate-files-with-counter [func dir]
  (loop [counter 0
         remaining-files (seq (.listFiles (file dir)))]
    ;; Stop once there are no files left.
    (when-let [current-file (first remaining-files)]
      (func current-file counter)
      (recur (inc counter) (next remaining-files)))))
(def add-counter-to-ext-in-dir
  (partial iterate-files-with-counter rename-one-file!))
Note that the work of actually performing the rename was split off from the work of iterating over the files. Having a large number of small functions is generally better than a small number of large functions, and making those functions reusable and independent, unless you choose to use them together, is even better than that.
Hi all,
I want to parse big log files using Clojure.
The structure of each line record is "UserID,Latitude,Longitude,Timestamp".
The steps I implemented are:
----> Read the log file and get the top-n user list
----> Find each top-n user's records and store them in a separate log file (UserID.log).
The source code of my implementation:
;======================================================
(defn parse-file
  ""
  [file n]
  (with-open [rdr (io/reader file)]
    (println "001 begin with open ")
    (let [lines (line-seq rdr)
          res (parse-recur lines)
          sorted
          (into (sorted-map-by (fn [key1 key2]
                                 (compare [(get res key2) key2]
                                          [(get res key1) key1])))
                res)]
      (println "Statistic result : " res)
      (println "Top-N User List : " sorted)
      (find-write-recur lines sorted n)
      )))
(defn parse-recur
  ""
  [lines]
  (loop [ls lines
         res {}]
    (if ls
      (recur (next ls)
             (update-res res (first ls)))
      res)))
(defn update-res
  ""
  [res line]
  (let [params (string/split line #",")
        id (if (> (count params) 1) (params 0) "0")]
    (if (res id)
      (update-in res [id] inc)
      (assoc res id 1))))
(defn find-write-recur
  "Get each users' records and store into separate log file"
  [lines sorted n]
  (loop [x n
         sd sorted
         id (first (keys sd))]
    (if (and (> x 0) sd)
      (do (create-write-file id
                             (find-recur lines id))
          (recur (dec x)
                 (rest sd)
                 (nth (keys sd) 1))))))
(defn find-recur
  ""
  [lines id]
  (loop [ls lines
         res []]
    (if ls
      (recur (next ls)
             (update-vec res id (first ls)))
      res)))
(defn update-vec
  ""
  [res id line]
  (let [params (string/split line #",")
        id_ (if (> (count params) 1) (params 0) "0")]
    (if (= id id_ )
      (conj res line)
      res)))
(defn create-write-file
  "Create a new file and write information into the file."
  ([file info-lines]
   (with-open [wr (io/writer (str MAIN-PATH file))]
     (doseq [line info-lines] (.write wr (str line "\n")))
     ))
  ([file info-lines append?]
   (with-open [wr (io/writer (str MAIN-PATH file) :append append?)]
     (doseq [line info-lines] (.write wr (str line "\n"))))
   ))
;======================================================
I tested this code in the REPL with the command (parse-file "./DATA/log.log" 3) and got these results:
Records      Size     Time       Result
1,000        42KB     <1s        OK
10,000       420KB    <1s        OK
100,000      4.3MB    3s         OK
1,000,000    43MB     15s        OK
6,000,000    258MB    >20 min    "OutOfMemoryError Java heap space java.lang.String.substring (String.java:1913)"
======================================================
Here are my questions:
1. How can I fix the error when I try to parse a big log file, e.g. larger than 200MB?
2. How can I optimize the function to run faster?
3. There are logs larger than 1GB; how can the function deal with them?
I am still new to Clojure, so any suggestion or solution would be appreciated.
Thanks
As a direct answer to your questions; from a little Clojure experience.
The quick and dirty fix for running out of memory boils down to giving the JVM more memory. You can try adding this to your project.clj:
:jvm-opts ["-Xmx1G"] ;; or more
That will make Leiningen launch the JVM with a higher memory cap.
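This kind of work is going to use a lot of memory no matter how you work it. @Vidya's suggestion to use a library is definitely worth considering. However, there's one optimization that you can make that should help a little.
Whenever you're dealing with your (line-seq ...) object (a lazy sequence) you should make sure to keep it lazy. next is slightly more eager than rest: it has to realize the following element to decide whether to return nil, whereas rest does not. More importantly, avoid holding on to the head of the sequence (as lines is held in parse-file) while you walk it, because then every line read so far stays reachable and therefore in memory. If you switch to rest, test for the end with (seq ls) rather than relying on nil. Take a look at the clojure site, especially the section on laziness: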
(rest aseq) - returns a possibly empty seq, never nil
[snip]
a (possibly) delayed path to the remaining items, if any
You may even want to traverse the log twice--once to pull just the username from each line as a lazy-seq, again to filter out those users. This will minimize the amount of the file you're holding onto at any one time.
Making sure your function is lazy should reduce the sheer overhead that having the file as a sequence in memory creates. Whether that's enough to parse a 1G file, I don't think I can say.
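To make the two-pass idea concrete, here is a rough sketch. All function names are my own, it assumes the user ID is simply the first comma-separated field, and it needs Clojure 1.7 or later for update. I have not run it against a file of your size, so treat it as a starting point rather than a measured solution.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn user-id
  "First comma-separated field of a log line (assumed to be the user ID)."
  [line]
  (first (string/split line #"," 2)))

(defn count-by-user
  "Pass 1: stream the file once and count records per user.
  Only the counts map is kept, never the lines themselves."
  [path]
  (with-open [rdr (io/reader path)]
    (reduce (fn [counts line]
              (update counts (user-id line) (fnil inc 0)))
            {}
            (line-seq rdr))))

(defn top-n-users
  "IDs of the n users with the most records."
  [counts n]
  (->> counts (sort-by val >) (take n) (map key)))

(defn split-top-users!
  "Pass 2: stream the file again and write each top user's lines
  to <out-dir>/<id>.log. Returns the top user IDs."
  [path out-dir n]
  (let [top     (top-n-users (count-by-user path) n)
        writers (into {} (for [id top]
                           [id (io/writer (io/file out-dir (str id ".log")))]))]
    (try
      (with-open [rdr (io/reader path)]
        (doseq [line (line-seq rdr)]
          (when-let [^java.io.Writer w (writers (user-id line))]
            (.write w (str line "\n")))))
      (finally
        ;; Close every per-user writer, even if reading failed.
        (doseq [^java.io.Writer w (vals writers)]
          (.close w))))
    top))
```

With this shape, (split-top-users! "./DATA/log.log" "./DATA" 3) reads the file twice but memory use grows with the number of distinct users rather than with the number of lines, because neither pass keeps a reference to the head of the line sequence.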
You definitely don't need Cascalog or Hadoop simply to parse a file which doesn't fit into your Java heap. This SO question provides some working examples of how to process large files lazily. The main point is you need to keep the file open while you traverse the lazy seq. Here is what worked for me in a similar situation:
(defn lazy-file-lines [file]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader file))))
You can map, reduce, count, etc. over this lazy sequence:
(count (lazy-file-lines "/tmp/massive-file.txt"))
;=> <a large integer>
The parsing is a separate, simpler problem.
I am also relatively new to Clojure, so there are no obvious optimizations I can see. Hopefully others more experienced can offer some advice. But I feel like this is simply a matter of the data size being too big for the tools at hand.
For that reason, I would suggest using Cascalog, an abstraction over Hadoop or your local machine using Clojure. I think the syntax for querying big log files would be pretty straightforward for you.