Efficient serialization: spit, with-out-str, pr - clojure

I have a piece of code which looks as follows:
(defn dump [path blob]
(spit path
(with-out-str (pr blob))))
This is dumping out GBs of data. Is there a more efficient way of doing this, without creating the intermediate string that with-out-str creates?

The built-in serialization functions use the dynamically bound var *out* to decide where they write to:
user> (def data [1 2 3 4 5])
#'user/data
user> (with-open [output (clojure.java.io/writer "/tmp/data.edn")]
        (binding [*out* output]
          (prn data)))
nil
user> (slurp "/tmp/data.edn")
"[1 2 3 4 5]\n"
So if you bind *out* to a file writer (remember to close it, and beware of lazy evaluation and closing file descriptors), all the output will go straight to that file. pr and prn write in a format that makes sure it can be read back; the other print functions write in a way that's easier for humans to read and is not guaranteed to be readable by computers.
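A minimal sketch of the dump function from the question rewritten along these lines. The writer is opened with with-open, so it is flushed and closed even if printing throws, and pr streams straight into it instead of into a StringWriter. pr fully realizes any lazy sequences inside blob while printing, so they are consumed before the file is closed. pr also honors *print-length* and *print-level*; if those are bound in your environment, the output could be truncated.

```clojure
(require '[clojure.java.io :as io])

(defn dump
  "Writes blob to path in readable form, streaming directly to the file
  instead of building an intermediate string with with-out-str."
  [path blob]
  ;; io/writer returns a BufferedWriter, so many small writes are batched.
  (with-open [w (io/writer path)]
    ;; pr writes to whatever *out* is bound to, so point it at the file.
    (binding [*out* w]
      (pr blob))))
```

The data can be read back with clojure.edn/read-string (or clojure.edn/read on a PushbackReader for very large files).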


What is the idiomatic way of iterating over a lazy sequence in Clojure?

I have the following functions to process large files with constant memory usage.
(require '[clojure.java.io :as io])
(defn lazy-helper
  "Processes a java.io.Reader lazily"
  [^java.io.BufferedReader reader]
  (lazy-seq
    (if-let [line (.readLine reader)]
      (cons line (lazy-helper reader))
      (do (.close reader) nil))))
(defn lazy-lines
  "Return a lazy sequence with the lines of the file"
  [^String file]
  (lazy-helper (io/reader file)))
This works well when the processing is a filter, map, or reduce operation, all of which work well with lazy sequences.
The problem starts when I have to process the file and, for example, send every line over a channel to worker processes.
(thread
  (doseq [line lines]
    (blocking-producer work-chan line)))
The obvious downside is that this processes the file eagerly, causing a heap overflow.
I was wondering what the best way is to iterate over each line in a file and do some IO with the lines.
It seems this might be unrelated to how the file IO is handled; doseq should not hold onto the head of the sequence.
As @joost-diepenmaat pointed out, this might not be related to the file IO, and he is right.
It seems the way I am working with JSON serialization and deserialization is the root cause here.
You can use (line-seq rdr) which "returns the lines of text from rdr as a lazy sequence of strings".
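A minimal sketch of that suggestion applied to the channel use case, assuming a java.util.concurrent.BlockingQueue stands in for the core.async work channel (send-lines! is a hypothetical name). doseq walks the line-seq without retaining its head, and with-open closes the reader when the loop finishes or throws. With a bounded queue, .put blocks while the queue is full, so the producer cannot run far ahead of the workers.

```clojure
(require '[clojure.java.io :as io])

(defn send-lines!
  "Hypothetical producer: puts each line of file onto queue, one at a time.
  The queue is an assumed stand-in for the work channel in the question."
  [file ^java.util.concurrent.BlockingQueue queue]
  (with-open [rdr (io/reader file)]
    ;; doseq does not hold onto the head of the lazy line-seq.
    (doseq [line (line-seq rdr)]
      ;; .put blocks when a bounded queue is full (back-pressure).
      (.put queue line))))
```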
This turned out to be a problem with the JSON handling of the code and not the file IO. Explanation in the original post.

How to run clojure program from terminal

I have just begun learning Clojure and I'm using the TextMate editor for writing the scripts. However, I am not able to figure out how to run them from the terminal. For example, I type the clj filename.clj command but nothing happens. Do I need to include the function name somewhere, since I have a function that takes a number as an argument?
Here is my code that I want to run from the terminal:
(defn next-collatz-num [n]
  (if (even? n)
    (quot n 2)
    (inc (* n 3))))
(defn collatz [n]
  (take-while #(< 1 %) (iterate next-collatz-num n)))
(defn max-count-collatz [n]
  (when (> n 0)
    (first
      (reduce
        #(if (> (last %1) (last %2)) %1 %2)
        [1 1]
        (map #(list % (count (collatz %))) (range 1 (inc n)))))))
(max-count-collatz 999999)
Clojure has a much more interactive environment than just running a whole script at the terminal command prompt.
TL;DR: install Leiningen, create a project.clj, then run lein repl.
If you don't want to create a project.clj, or if you're curious how to do it the hard way, read on...
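You can start a Clojure read-eval-print loop (REPL) interactive prompt with
java -cp clojure-1.6.0.jar clojure.main
(download the latest Clojure jar from the official site; note that Clojure 1.9 and later also need the spec.alpha and core.specs.alpha jars on the classpath).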
Once you're in the REPL, load the code file:
(load-file "my-script.clj")
Now, you can call the function directly:
(max-count-collatz 5)
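If it doesn't work as you'd expect, change the code, save it, and reload it in the REPL by calling load-file again. If the file instead declares (ns my-script), is named my_script.clj (hyphens in namespace names map to underscores in file names), and is on the classpath, you can reload it with:
(require 'my-script :reload-all)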
While it is possible to run individual Clojure files using clojure.jar, one of the best things about Clojure is Leiningen, the dependency manager and build tool. Creating a project is easy, and for anything more than a single file with no external dependencies, it is a huge improvement over using java and clojure.jar directly.
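One likely reason "nothing happens" in the question: when a file is run as a script (for example java -cp clojure-1.6.0.jar clojure.main filename.clj), every top-level form is evaluated, but its value is only echoed at a REPL, never when running a script. With 999999 the computation also takes a while. A minimal sketch, assuming you want the script itself to print the answer and to take n from the first command-line argument (defaulting to 10 here purely for illustration):

```clojure
;; Same functions as in the question.
(defn next-collatz-num [n]
  (if (even? n)
    (quot n 2)
    (inc (* n 3))))

(defn collatz [n]
  (take-while #(< 1 %) (iterate next-collatz-num n)))

(defn max-count-collatz [n]
  (when (> n 0)
    (first
     (reduce #(if (> (last %1) (last %2)) %1 %2)
             [1 1]
             (map #(list % (count (collatz %))) (range 1 (inc n)))))))

;; Scripts do not print the values of top-level forms, so print explicitly.
;; Assumption: n arrives as the first command-line argument.
(println (max-count-collatz
          (Long/parseLong (or (first *command-line-args*) "10"))))
```

Usage: java -cp clojure-1.6.0.jar clojure.main filename.clj 999999 (arguments after the script path end up in *command-line-args*).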

In Clojure, passing an open file pointer into functions

In Clojure, I need to read in a long file, too long to slurp in, and I wish to pass the open file pointer into a method, which I can call recursively, reading until it is empty. I have found examples using with-open, but is there a way to open a file and then read from it inside of a function? Pointers to examples or docs would be helpful.
Is this along the lines of what you have in mind?
(defn process-file [f reader]
  (loop [lines (line-seq reader) acc []]
    (if (empty? lines)
      acc
      (recur (rest lines) (conj acc (f (first lines)))))))
(let [filename "/path/to/input-file"
      reader (java.io.BufferedReader. (java.io.FileReader. filename))]
  (process-file pr-str reader))
Note that if you (require '[clojure.java.io :as io]) you can use io/reader as a shortcut for invoking BufferedReader and FileReader directly. However, using with-open would still be preferable - it will ensure the file is closed properly, even in the event of an exception - and you can absolutely pass the open reader to other functions from within a with-open block.
Here's how you could use with-open in the scenario from the answer you posted, passing the reader and writer objects to a function (a single with-open can bind several resources and closes them in reverse order):
(with-open [rdr (io/reader "/path/to/input-file")
            wtr (io/writer "/path/to/output-file")]
  (transfer rdr wtr))
I should also note that in my example scenario it would be preferable to map or reduce over the line-seq, but I used loop/recur since you asked about recursion.
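For comparison, a sketch of that preferred version: process-file written with mapv over line-seq, plus a hypothetical process-path wrapper that owns the with-open. mapv is eager, which matters here. A lazy map would be returned unrealized from with-open, and consuming it later would fail because the reader has already been closed.

```clojure
(require '[clojure.java.io :as io])

(defn process-file
  "Applies f to every line read from reader and returns a vector of the
  results. The caller owns (and closes) the reader."
  [f reader]
  ;; mapv realizes everything before returning, while the reader is open.
  (mapv f (line-seq reader)))

(defn process-path
  "Hypothetical wrapper: opens path, passes the open reader to
  process-file, and closes it afterwards, even on exceptions."
  [f path]
  (with-open [rdr (io/reader path)]
    (process-file f rdr)))
```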
Here's the ClojureDocs page on the clojure.java.io namespace.
Playing around, I discovered the answer, so for anyone else looking, here is a version of my solution.
(defn transfer
  [inFile outFile]
  (.write outFile (.read inFile))
  ...
  ...
(transfer (clojure.java.io/reader "fileIn.txt")
          (clojure.java.io/writer "out.txt"))
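The body of transfer above is elided. As a hypothetical sketch of the "read until it is empty" loop the question describes (named copy-chars! here so it is not mistaken for the poster's code): Reader.read() returns the next character as an int, or -1 at end of stream, so the loop recurs until it sees -1. For real files, clojure.java.io/copy does the same job through a buffer and is much faster than going character by character.

```clojure
(defn copy-chars!
  "Copies characters from rdr to wtr one at a time until rdr reports end
  of stream (-1). Does not close either stream; the caller owns them."
  [^java.io.Reader rdr ^java.io.Writer wtr]
  (loop []
    (let [c (.read rdr)]
      (when-not (== c -1)
        ;; Writer.write(int) writes a single character.
        (.write wtr (int c))
        (recur)))))
```

Usage, with both streams closed by with-open: (with-open [rdr (clojure.java.io/reader "fileIn.txt") wtr (clojure.java.io/writer "out.txt")] (copy-chars! rdr wtr)).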

Downloading image in Clojure

I'm having trouble downloading images using Clojure; there seems to be an issue with the way the following code works:
(defn download-image [url filename]
  (->> (slurp url) (spit filename)))
This will 'download' the file to the location I specify, but the file is unreadable by any image application I try to open it with (for example, opening it in a web browser just returns a blank page, and opening it in Preview (OS X) says it's a corrupted file).
I think this might be because slurp should only really be used for text files, not binary files.
Could anyone point me in the right direction for my code to work properly? Any help would be greatly appreciated!
slurp uses a java.io.Reader underneath, which decodes the bytes into characters (UTF-8 by default) to build a string, and that conversion typically corrupts binary data. Look for examples that use input-stream instead. In some ways this can be better, because you can transfer the image from the input buffer to the output buffer without having to read the entire thing into memory.
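A sketch of that streaming approach, reusing the download-image name from the question. io/input-stream accepts a URL string (falling back to a file path if the string is not a valid URL), and io/copy moves the bytes through a buffer, so nothing is decoded as text and the image is never held in memory all at once. Error handling (HTTP status codes, timeouts) is deliberately left out.

```clojure
(require '[clojure.java.io :as io])

(defn download-image
  "Streams the raw bytes from url (anything io/input-stream accepts) into
  the file at filename, without converting them to a string."
  [url filename]
  (with-open [in  (io/input-stream url)
              out (io/output-stream filename)]
    ;; Byte-for-byte copy through an internal buffer.
    (io/copy in out)))
```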
edit
Since people seem to find this question once in a while, and I needed to rewrite this code again, I thought I'd add an example. Note that this does not stream the data; it collects it into memory and returns it as an array of bytes.
(require '[clojure.java.io :as io])
(defn blurp [f]
  (let [dest (java.io.ByteArrayOutputStream.)]
    (with-open [src (io/input-stream f)]
      (io/copy src dest))
    (.toByteArray dest)))
Test...
(use 'clojure.test)
(deftest blurp-test
  (testing "basic operation"
    (let [src (java.io.ByteArrayInputStream. (.getBytes "foo" "utf-8"))]
      (is (= "foo" (-> (blurp src) (String. "utf-8")))))))
Example...
user=> (blurp "http://www.lisperati.com/lisplogo_256.png")
#<byte[] [B@15671adf>

How can I capture the standard output of clojure?

I have some printlns I need to capture from a Clojure program, and I was wondering how I could capture the output.
I have tried:
(binding [a *out*]
  (println "h")
  a)
but this doesn't work.
(with-out-str (println "this should return as a string"))
Just to expand a little on Michiel's answer, when you want to capture output to a file you can combine with-out-str with spit.
When you don't want to build up a huge string in memory before writing it out, you can use with-out-writer from the clojure.contrib.io library.
with-out-writer is a macro that nicely encapsulates the correct opening and closing of the file resource and the binding of a writer on that file to *out* while executing the code in its body.
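clojure.contrib has since been retired, and with-out-writer is not part of clojure.java.io, so on current Clojure the same pattern is usually written by hand with with-open and binding. A hypothetical macro, with-out-file, sketching that:

```clojure
(require '[clojure.java.io :as io])

(defmacro with-out-file
  "Hypothetical helper: evaluates body with *out* bound to a writer on
  path, and closes (and flushes) the writer afterwards, even on exceptions."
  [path & body]
  `(with-open [w# (io/writer ~path)]
     (binding [*out* w#]
       ~@body)))
```

Everything printed inside the body goes straight to the file, so no large intermediate string is built.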
Michiel's exactly right. Since I can't add code in a comment on his answer, here's what with-out-str does under the covers, so you can compare it with your attempt:
user=> (macroexpand-1 '(with-out-str (println "output")))
(clojure.core/let [s__4091__auto__ (new java.io.StringWriter)]
  (clojure.core/binding [clojure.core/*out* s__4091__auto__]
    (println "output")
    (clojure.core/str s__4091__auto__)))
Your code was binding the existing standard output stream to a variable, printing to that stream, and then asking the stream for its value via the variable; however, the value of the stream was of course not the bytes that had been printed to it. So with-out-str binds a newly created StringWriter to *out* temporarily, and finally queries the string value of that temporary writer.