Say I have a file to be used by n readers and m writers. When I know the values of n and m, say n == 3 and m == 1, I can write code like this:
(use 'clojure.java.io)
(with-open [rdr1 (reader file)
            rdr2 (reader file)
            rdr3 (reader file)
            wtr1 (writer file)]
  (time-to-work-out-guys))
Now the catch is that the app user determines the values of n and m, so I have no idea in advance what they will be. Is there any way I can still use with-open to initialize the readers/writers and do the job?
Because with-open is a macro rather than a function, the only way to build this would be to write a macro that generates a call to with-open and then use eval to compile it at runtime. So while technically the answer is yes, I can't honestly recommend doing so. with-open is a convenience that does not fit all cases.
In this case it makes more sense to write your own (try ... (finally ...)) expression, for example along the lines of the sketch below.
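A minimal, untested sketch of that approach, where do-work is a hypothetical function that receives the opened readers and writers:

(require '[clojure.java.io :as io])

;; Open n readers and m writers on the same file, run do-work on them,
;; and close everything in the finally clause regardless of success.
(defn with-n-readers-m-writers [file n m do-work]
  (let [rdrs (vec (repeatedly n #(io/reader file)))
        wtrs (vec (repeatedly m #(io/writer file)))]
    (try
      (do-work rdrs wtrs)
      (finally
        (doseq [stream (concat rdrs wtrs)]
          (.close stream))))))

Note that if opening one of the streams throws, the ones already opened are not closed by this sketch; handling that cleanly is part of what makes dynamic resource management fiddly.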
Related
Visually, with-open looks similar to let. I know with-open is for a different purpose but I cannot find a clear answer as to what with-open is doing. And, what is the first argument in with-open?
The documentation says this:
"bindings => [name init ...]
Evaluates body in a try expression with names bound to the values
of the inits, and a finally clause that calls (.close name) on each
name in reverse order."
I do not understand this. I would really appreciate it if someone could explain what with-open actually does.
what does it do?
A macroexpanded example, with some formatting and after removing unnecessary explicit usages of clojure.core/...:
(macroexpand-1
  '(with-open [reader (some-fn-that-creates-a-reader)]
     (read-stuff reader)))
=>
(let [reader (some-fn-that-creates-a-reader)]
  (try
    ;; This `with-open` with an empty binding vector is just a no-op,
    ;; like `(let [] ...)` - an artifact of the implementation.
    (with-open [] (read-stuff reader))
    (finally (. reader close))))
As you can see, it's exactly the same as let, but it wraps the body in a try form and closes the values provided in the binding vector in the finally form.
when should I use it?
When you need for something to be closed at the end, regardless of whether the code in the body was successful or not.
It's a common pattern for reading/writing files with an explicit reader/writer or for using other IO that needs to be opened and closed explicitly.
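For instance, a minimal sketch of that pattern (assuming a local file data.txt and that you want all the lines realized before the reader closes):

(with-open [rdr (clojure.java.io/reader "data.txt")]
  ;; doall forces the lazy seq while the reader is still open;
  ;; the finally clause closes it even if reading throws
  (doall (line-seq rdr)))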
I'm starting to learn Clojure and I've stumbled upon the following. When declaring a "sum" function (for learning purposes) I wrote the following code:
(def sum (fn [& args] (apply + args)))
I understand that I defined the symbol sum as containing that fn, but why do I have to enclose the fn in parentheses? Isn't the compiler calling that function upon definition rather than when someone actually invokes it? Maybe it's just my imperative brain talking.
Also, what are the use cases of let? Sometimes I stumble on code that uses it and other code that doesn't. For example, on the Clojure site there's an exercise to use the openStream method via Java interop, and I wrote the following code:
(defn http-get
  [url]
  (let [url-obj (java.net.URL. url)]
    (slurp (.openStream url-obj))))
(http-get "https://www.google.com")
whilst they wrote the following on the clojure site as an answer
(defn http-get [url]
  (slurp
    (.openStream
      (java.net.URL. url))))
Again, maybe it's just my imperative brain talking - the need to have a "variable" or an "object" to store something before using it - but I don't quite understand when I should use let and when I shouldn't.
To answer both of your questions:
1.
(def sum (fn [& args] (apply + args)))
Using def here is very unorthodox. When you define a function you usually want to use defn. But since you used def, you should know that def binds a name to a value, and fn's return value is a function. Effectively you bound the name sum to the function returned by applying fn (the parentheses are what perform that application).
You could have used the more traditional (defn sum [& args] (apply + args))
2.
While using let sometimes makes sense for readability (pulling steps out of their nested use), it is also needed when you want to compute something once and use the result multiple times. It binds the result to a name within a specified scope.
We can look at the following example and see that without let it becomes harder to write (function is for demonstration purposes):
(let [db-results (query "select * from table")] ;; note: query is not a pure function
  ;; do stuff with db-results
  (f db-results)
  ;; return db-results
  db-results)
This simply re-uses a return value (db-results), from a function that you usually only want to run once, in multiple locations. So let can be used for style, like the example you've given, but it's also very useful for value reuse within some context.
Both def and defn define a global symbol, sort of like a global variable in Java, etc. Also, (defn xxx ...) is a (very common) shortcut for (def xxx (fn ...)). So, both versions will work exactly the same way when you run the program. Since the defn version is shorter and more explicit, that is what you will do 99% of the time.
Typing (let [xxx ...] ...) defines a local symbol, which cannot be seen by code outside of the let form, just like a local variable (block-scope) in Java, etc.
Just as in Java, it is optional whether to introduce a local variable like url-obj. It makes no difference to the running program. You must answer the question, "Which version makes my code easier to read and understand?" This part is no different from Java.
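To make the equivalence concrete, a small sketch (sum1 and sum2 are hypothetical names; results shown as comments):

(def sum1 (fn [& args] (apply + args)))  ; explicit def + fn
(defn sum2 [& args] (apply + args))      ; the usual defn shorthand

(sum1 1 2 3) ;=> 6
(sum2 1 2 3) ;=> 6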
I'm trying to write a function which takes a sequence of bindings and an expression and returns the result.
The sequence of bindings is formatted thus: ([:bind-type [bind-vec]] ...) where bind-type is either :let or :letfn. For example:
([:let [a 10 b 20]] [:letfn [(foo [x] (inc x))]] ... )
And the expression is just a regular Clojure expression, e.g. (foo (+ a b)), so together this example pair of inputs would yield 31.
Currently I have this:
(defn wrap-bindings
  [[[bind-type bind-vec :as binding] & rest] expr]
  (if binding
    (let [bind-op (case bind-type :let 'let* :letfn 'letfn*)]
      `(~bind-op ~bind-vec ~(wrap-bindings rest expr)))
    expr))
(defn eval-with-bindings
  ([bindings expr]
   (eval (wrap-bindings bindings expr))))
I am not very experienced with Clojure and have been told that use of eval is generally bad practice. I do not believe that I can write this as a macro since the bindings and expression may only be given at run-time, so what I am asking is: is there a more idiomatic way of doing this?
eval is almost always not the answer, though rare exceptions do come up. In this case you meet the criteria because:
since the bindings and expression may only be given at run-time
You desire arbitrary code to be input and run while the program is going
The binding forms to be used can take any data as its input, even data from elsewhere in the program
So your existing example using eval is appropriate given the constraints of the question, at least as I'm understanding it. Perhaps there is room to change the requirements to allow the expressions to be defined in advance and remove the need for eval, though if not then I'd suggest using what you have.
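For reference, a quick REPL-style sketch of the :let path through the code above (return values shown as comments):

(wrap-bindings '([:let [a 10 b 20]]) '(+ a b))
;;=> (let* [a 10 b 20] (+ a b))

(eval-with-bindings '([:let [a 10 b 20]]) '(+ a b))
;;=> 30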
I am processing text files of 60 GB or larger. The files are separated into a header section of variable length and a data section. I have three functions:
head? a predicate to distinguish header lines from data lines
process-header process one header line string
process-data process one data line string
The processing functions asynchronously access and modify an in-memory database
I adapted a file-reading method from another SO thread, which should build a lazy sequence of lines. The idea was to process some lines with one function, then switch once and keep processing with the next function.
(defn lazy-file
  [file-name]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) nil))))]
    (try
      (helper (clojure.java.io/reader file-name))
      (catch Exception e
        (println "Exception while trying to open file" file-name)))))
I use it with something like
(let [lfile (lazy-file "my-file.txt")]
  (doseq [line lfile :while (head? line)]
    (process-header line))
  (doseq [line (drop-while head? lfile)]
    (process-data line)))
Although that works, it's rather inefficient for a couple of reasons:
Instead of simply calling process-header until I reach the data and then continuing with process-data, I have to filter header lines and process them, then restart parsing the whole file and drop all header lines to process data. This is the exact opposite of what lazy-file was intended to do.
Watching memory consumption shows me that the program, though seemingly lazy, builds up to use as much RAM as would be required to keep the file in memory.
So what is a more efficient, idiomatic way to work with my database?
One idea might be to use a multimethod to process header and data lines depending on the value of the head? predicate, but I suppose this would have some serious speed impact, especially as there is only one point where the predicate outcome changes from always true to always false. I haven't benchmarked that yet.
Would it be better to use another way to build the line-seq and parse it with iterate? This would still leave me needing to use :while and drop-while, I guess.
In my research, using NIO file access was mentioned a couple of times, which should improve memory usage. I have not yet found out how to use it in an idiomatic way in Clojure.
Maybe I still have a bad grasp of the general idea, how the file should be treated?
As always, any help, ideas or pointers to tuts are greatly appreciated.
You should use standard library functions.
line-seq, with-open and doseq will easily do the job.
Something along the lines of:
(with-open [rdr (clojure.java.io/reader file-path)]
  (doseq [line (line-seq rdr)]
    (if (head? line)
      (process-header line)
      (process-data line))))
There are several things to consider here:
Memory usage
There are reports that Leiningen might add stuff that results in keeping references to the head of a sequence, although doseq specifically does not hold on to the head of the sequence it's processing; cf. this SO question. Try verifying your claim that it "builds up to use as much RAM as would be required to keep the file in memory" without using lein repl.
Parsing lines
Instead of using two loops with doseq, you could also use a loop/recur approach, threading the parsing state through as a second argument, like this (untested):
(loop [lfile (lazy-file "my-file.txt")
       parse-header true]
  (when-let [line (first lfile)]  ; stop once the sequence is exhausted
    (if (and parse-header (head? line))
      (do (process-header line)
          (recur (rest lfile) true))
      (do (process-data line)
          (recur (rest lfile) false)))))
There is another option here, which would be to incorporate your processing functions into your file reading function. So, instead of just consing a new line and returning it, you could just as well process it right away -- typically you could hand over the processing function as an argument instead of hard-coding it.
Your current code looks like processing is a side-effect. If so, you could then probably do away with the laziness if you incorporate the processing. You need to process the entire file anyway (or so it seems) and you do so on a per-line basis. The lazy-seq approach basically just aligns a single line read with a single processing call. Your need for laziness arises in the current solution because you separate reading (the entire file, line by line) from processing. If you instead move the processing of a line into the reading, you don't need to do that lazily.
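A hedged sketch of that idea, reusing the with-open/line-seq pattern from the first answer and taking the per-line processing function as an argument (untested; process-file! is a hypothetical name):

(defn process-file!
  [file-name process-line]
  (with-open [rdr (clojure.java.io/reader file-name)]
    (doseq [line (line-seq rdr)]
      (process-line line))))

;; e.g.
(process-file! "my-file.txt"
               (fn [line]
                 (if (head? line)
                   (process-header line)
                   (process-data line))))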
I'm reading lines from a text file using (line-seq (reader "input.txt")). This collection is then passed around and used by my program.
I'm concerned that this may be bad style, however, as I'm not deterministically closing the file. I imagine that I can't just use (with-open [r (reader "input.txt")] (line-seq r)), as the file stream will potentially get closed before I've traversed the entire sequence.
Should lazy-seq be avoided in conjunction with reader for files? Is there a different pattern I should be using here?
Since this doesn't really have a clear answer (it's all mixed into comments on the first answer), here's the essence of it:
(with-open [r (reader "input.txt")]
  (doall (line-seq r)))
That will force the whole sequence of lines to be read and close the file. You can then pass the result of that whole expression around.
When dealing with large files, you may have memory problems (holding the whole sequence of lines in memory) and that's when it's a good idea to invert the program:
(with-open [r (reader "input.txt")]
  (doall (my-program (line-seq r))))
You may or may not need doall in that case, depending on what my-program returns and/or whether my-program consumes the sequence lazily or not.
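For instance, a small hypothetical illustration: if the work over the lines is itself lazy (like map), it has to be forced inside with-open or the reader may be closed before the result is realized:

(require '[clojure.java.io :refer [reader]]
         '[clojure.string :as str])

(with-open [r (reader "input.txt")]
  ;; map is lazy, so force it with doall before the reader is closed
  (doall (map str/upper-case (line-seq r))))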
It seems like clojure.contrib.duck-streams/read-lines is just what you are looking for. read-lines closes the file when there is no more input and returns the sequence just like line-seq. Take a look at the source code of read-lines.
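In rough outline it uses the same pattern as the lazy-file function from the large-file question above - a hedged sketch (not the actual contrib source):

(defn read-lines-sketch [f]
  (letfn [(step [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (step rdr))
                (do (.close rdr) nil))))]
    (step (clojure.java.io/reader f))))

Note that with this pattern the reader is only closed once the sequence has been consumed to its end.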