Best way to read contents of file into a set in Clojure - clojure

I'm learning Clojure and as an exercise I wanted to write something like the unix "comm" command.
To do this, I read the contents of each file into a set, then use difference/intersection to show exclusive/common files.
After a lot of repl-time I came up with something like this for the set creation part:
(def contents (ref #{}))
(doseq [line (read-lines "/tmp/a.txt")]
(dosync (ref-set contents (conj #contents line))))
(I'm using duck-streams/read-lines to seq the contents of the file).
This is my first stab at any kind of functional programming or lisp/Clojure. For instance, I couldn't understand why, when I did a conj on the set, the set was still empty. This lead me to learning about refs.
Is there a better Clojure/functional way to do this? By using ref-set, am I just twisting the code to a non-functional mindset or is my code along the lines of how it should be done?
Is there a a library that already does this? This seems like a relatively ordinary thing to want to do but I couldn't find anything like it.

Clojure 1.3:
user> (require '[clojure.java [io :as io]])
nil
user> (line-seq (io/reader "foo.txt"))
("foo" "bar" "baz")
user> (into #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
line-seq gives you a lazy sequence where each item in the sequence is a line in the file.
into dumps it all into a set. To do what you were trying to do (add each item one by one into a set), rather than doseq and refs, you could do:
user> (reduce conj #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
Note that the Unix comm compares two sorted files, which is likely a more efficient way to compare files than doing set intersection.
Edit: Dave Ray is right, to avoid leaking open file handles it's better to do this:
user> (with-open [f (io/reader "foo.txt")]
(into #{} (line-seq f)))
#{"foo" "bar" "baz"}

I always read with slurp and after that split with re-seq due to my needs.

Related

Clojure : Read an edn file line by line

I have written data like that in a file (kind of)
{:a 25 :b 28}
{:a 2 :b 50}
...
I want to have a lazy sequence of these maps.
There are around 40 millions of lines. I can also write chunks of 10000 but I do not thnik it will change the way the functions are written (mapcat instead of map)
To read it, I wrote
(with-open [affectations (io/reader "dev-resources/affectations.edn")]
(map read-string affectations))
The problem is that Clojure tells
Don't know how to create ISeq from : java.io.BufferedReader
To be honest I understand nothing on the java.io namespace.
I would like to have a lazy sequence of the data in the file but I do not know how to turn the stream into strings and then collections.
Any idea ?
Is this read-line ?
Thanks
You are passing java.io.BufferedReader to map whereas map expects a seq.
You need to use line-seq to produce a (lazy) seq of lines from your file:
(with-open [affectations (io/reader "dev-resources/affectations.edn")]
(map read-string (lazy-seq affectations)))
Remember, that you need to force all your side effects on the data read from a resource opened in with-open within its scope, otherwise you will get errors.
One option is to just force the whole seq of text lines from your files and return it using doall. However, this solution could read all your data into memory which doesn't seem practical.
I guess you need to execute some logic for each of the line from the file and you don't need to keep all those parsed collections in memory. In such case you could pass a function representing that logic into your function handling reading your file:
(defn process-file [filename process-fn]
(with-open [reader (io/reader filename)]
(doseq [line (line-seq reader)]
(-> line
(read-string)
(process-fn)))))
This function will read your file line by line converting each of it individually using read-string and calling your process-fn function. process-file will return nil.

Can I convert a Clojure form from one package to another?

Background
I've written a hack for Emacs that lets me send a Clojure form from an editor buffer to a REPL buffer. It's working fine, except that if the two buffers are in different namespaces the copied text doesn't usually make sense, or, worse, it might make sense but have a different meaning to that in the editor buffer.
I want to transform the text so that it makes sense in the REPL buffer.
A Solution in Common Lisp
In Common Lisp, I could do this using the following function:
;; Common Lisp
(defun translate-text-between-packages (text from-package to-package)
(let* ((*package* from-package)
(form (read-from-string text))
(*package* to-package))
(with-output-to-string (*standard-output*)
(pprint form))))
And a sample use:
;; Common Lisp
(make-package 'editor-package)
(make-package 'repl-package)
(defvar repl-package::a)
(translate-text-between-packages "(+ repl-package::a b)"
(find-package 'editor-package)
(find-package 'repl-package))
;; => "(+ A EDITOR-PACKAGE::B)"
The package name qualifications in the input string and the output string are different—exactly what's needed to solve the problem of copying and pasting text between packages.
(BTW, there's stuff about how to run the translation code in the Common Lisp process and move stuff between the Emacs world and the Common Lisp world, but I'm ok with that and I don't particularly want to get into it here.)
A Non-Solution in Clojure
Here's a direct translation into Clojure:
;; Clojure
(defn translate-text-between-namespaces [text from-ns to-ns]
(let [*ns* from-ns
form (read-string text)
*ns* to-ns]
(with-out-str
(clojure.pprint/pprint form))))
And a sample use:
;; Clojure
(create-ns 'editor-ns)
(create-ns 'repl-ns)
(translate-text-between-namespaces "(+ repl-ns/a b)"
(find-ns 'editor-ns)
(find-ns 'repl-ns))
;; => "(+ repl-ns/a b)"
So the translation function in Clojure has done nothing. That's because symbols and packages/namespaces in Common Lisp and Clojure work differently.
In Common Lisp symbols belong to a package and the determination of a symbol's package happens at read time.
In Clojure, for good reasons, symbols do not belong to a namespace and the determination of a symbol's namespace happens at evaluation time.
Can This Be Done in Clojure?
So, finally, my question: Can I convert Clojure code from one namespace to another?
I don't understand your use case, but here is a way to transform symbols from one namespace to another.
(require 'clojure.walk 'clojure.pprint)
(defn ns-trans-form [ns1 ns2 form]
(clojure.walk/prewalk
(fn [f] (if ((every-pred symbol? #(= (namespace %) ns1)) f)
(symbol ns2 (name f))
f))
form))
(defn ns-trans-text [ns1 ns2 text]
(with-out-str
(->> text
read-string
(ns-trans-form ns1 ns2)
clojure.pprint/pprint)))
(print (ns-trans-text "editor-ns" "repl-ns" "(+ editor-ns/a b)" ))
;=> (+ repl-ns/a b)
So, editor-ns/a was transformed to repl-ns/a.
(Answering my own question...)
Given that it's not easy to refer to a namespace's non-public vars from outside the namespace, there's no simple way to do this.
Perhaps a hack is possible, based on the idea at http://christophermaier.name/blog/2011/04/30/not-so-private-clojure-functions. That would involve walking the form and creating new symbols that resolve to new vars that have the same value as vars referred to in the original form. Perhaps I'll investigate this further sometime, but not right now.

Holding onto the head of sequence when using "rest"

I am parsing a big csv file and I am using the first line of it as the keys for the records. So for a csv file like:
header1,header2
foo,bar
zoo,zip
I end up with a lazy seq like:
({:header1 "foo" :header2 "bar"},
{:header1 "zoo" :header2 "zip"})
The code working fine, but I am not sure if in the following function I am holding the head of "lines" or not.
(defn csv-as-seq [file]
(let [rdr (clojure.java.io/reader file)]
(let [lines (line-seq rdr)
headers (parse-headers (first lines))]
(map (row-mapper headers) (rest lines)))))
Can somebody please clarify?
Yes, this expression syntactically says to hold the head
(let [lines (line-seq rdr)
though in this case you should get away with it because their are no references to
lines and headers after the call to map and the Clojure compiler starting with 1.2.x includes a feature called locals clearing: it sets any locals not used after a function call to nil in the preamble to the function call. In this case it will set lines and headers to nil in the local context of the function and they will be GCd as used. This is one of the rare cases where clojure produces bytecode that cannot be expressed in java.

Treating a file of Java floats as a lazy Clojure sequence

What would be an ideomatic way in Clojure to get a lazy sequence over a file containing float values serialized from Java? (I've toyed with a with-open approach based on line-reading examples but cannot seem to connect the dots to process the stream as floats.)
Thanks.
(defn float-seqs [#^java.io.DataInputStream dis]
(lazy-seq
(try
(cons (.readFloat dis) (float-seqs dis))
(catch java.io.EOFException e
(.close dis)))))
(with-open [dis (-> file java.io.FileInputStream. java.io.DataInputStream.)]
(let [s (float-seqs dis)]
(doseq [f s]
(println f))))
You are not required to use with-open if you are sure you are going to consume the whole seq.
If you use with-open, double-check that you're not leaking the seq (or a derived seq) outside of its scope.

How do I use Zip in Clojure?

I am very very new to clojure. The zip utility looks interesting but I cant seem to use it.
I tried
;; ZIP
(:use "zip")
(def data '[[a * b] + [c * d]])
(def dz (zip/vector-zip data))
But I am getting
java.lang.Exception: No such namespace: zip
How do yo use external libraries?
You may be confusing two different ways to import code. You can do it this way:
user> (use 'clojure.zip)
Or while you're declaring a namespace in a source file:
(ns foo
(:use clojure.zip))
The second version is a macro that is expanded into the first.
Outside of (ns), doing (:use "zip") is going to treat :use as a function and call it with "zip" as its parameter (i.e. try to use the string "zip" as a collection and look up the key :use in it), which does nothing.
clojure.zip has some functions whose names clash with things in clojure.core though, so you either have to do something like this:
user> (use '(clojure [zip :rename {next next-zip replace replace-zip remove remove-zip}]))
Or preferably this:
user> (require '(clojure [zip :as zip]))
With the latter you can refer to functions like (zip/vector-zip data) as you wish.
See the documentation for require and refer and the page talking about libs.
I don't know much about clojure, but this little ditty seems to work:
(require '[clojure.zip :as zip])
(def t '(:a (:b :d) (:c :e :f)))
(def z (zip/zipper rest rest cons t))
(zip/node z)