What would be an ideomatic way in Clojure to get a lazy sequence over a file containing float values serialized from Java? (I've toyed with a with-open approach based on line-reading examples but cannot seem to connect the dots to process the stream as floats.)
Thanks.
(defn float-seqs [#^java.io.DataInputStream dis]
(lazy-seq
(try
(cons (.readFloat dis) (float-seqs dis))
(catch java.io.EOFException e
(.close dis)))))
(with-open [dis (-> file java.io.FileInputStream. java.io.DataInputStream.)]
(let [s (float-seqs dis)]
(doseq [f s]
(println f))))
You are not required to use with-open if you are sure you are going to consume the whole seq.
If you use with-open, double-check that you're not leaking the seq (or a derived seq) outside of its scope.
Related
I have written data like that in a file (kind of)
{:a 25 :b 28}
{:a 2 :b 50}
...
I want to have a lazy sequence of these maps.
There are around 40 millions of lines. I can also write chunks of 10000 but I do not thnik it will change the way the functions are written (mapcat instead of map)
To read it, I wrote
(with-open [affectations (io/reader "dev-resources/affectations.edn")]
(map read-string affectations))
The problem is that Clojure tells
Don't know how to create ISeq from : java.io.BufferedReader
To be honest I understand nothing on the java.io namespace.
I would like to have a lazy sequence of the data in the file but I do not know how to turn the stream into strings and then collections.
Any idea ?
Is this read-line ?
Thanks
You are passing java.io.BufferedReader to map whereas map expects a seq.
You need to use line-seq to produce a (lazy) seq of lines from your file:
(with-open [affectations (io/reader "dev-resources/affectations.edn")]
(map read-string (lazy-seq affectations)))
Remember, that you need to force all your side effects on the data read from a resource opened in with-open within its scope, otherwise you will get errors.
One option is to just force the whole seq of text lines from your files and return it using doall. However, this solution could read all your data into memory which doesn't seem practical.
I guess you need to execute some logic for each of the line from the file and you don't need to keep all those parsed collections in memory. In such case you could pass a function representing that logic into your function handling reading your file:
(defn process-file [filename process-fn]
(with-open [reader (io/reader filename)]
(doseq [line (line-seq reader)]
(-> line
(read-string)
(process-fn)))))
This function will read your file line by line converting each of it individually using read-string and calling your process-fn function. process-file will return nil.
Background
I've written a hack for Emacs that lets me send a Clojure form from an editor buffer to a REPL buffer. It's working fine, except that if the two buffers are in different namespaces the copied text doesn't usually make sense, or, worse, it might make sense but have a different meaning to that in the editor buffer.
I want to transform the text so that it makes sense in the REPL buffer.
A Solution in Common Lisp
In Common Lisp, I could do this using the following function:
;; Common Lisp
(defun translate-text-between-packages (text from-package to-package)
(let* ((*package* from-package)
(form (read-from-string text))
(*package* to-package))
(with-output-to-string (*standard-output*)
(pprint form))))
And a sample use:
;; Common Lisp
(make-package 'editor-package)
(make-package 'repl-package)
(defvar repl-package::a)
(translate-text-between-packages "(+ repl-package::a b)"
(find-package 'editor-package)
(find-package 'repl-package))
;; => "(+ A EDITOR-PACKAGE::B)"
The package name qualifications in the input string and the output string are different—exactly what's needed to solve the problem of copying and pasting text between packages.
(BTW, there's stuff about how to run the translation code in the Common Lisp process and move stuff between the Emacs world and the Common Lisp world, but I'm ok with that and I don't particularly want to get into it here.)
A Non-Solution in Clojure
Here's a direct translation into Clojure:
;; Clojure
(defn translate-text-between-namespaces [text from-ns to-ns]
(let [*ns* from-ns
form (read-string text)
*ns* to-ns]
(with-out-str
(clojure.pprint/pprint form))))
And a sample use:
;; Clojure
(create-ns 'editor-ns)
(create-ns 'repl-ns)
(translate-text-between-namespaces "(+ repl-ns/a b)"
(find-ns 'editor-ns)
(find-ns 'repl-ns))
;; => "(+ repl-ns/a b)"
So the translation function in Clojure has done nothing. That's because symbols and packages/namespaces in Common Lisp and Clojure work differently.
In Common Lisp symbols belong to a package and the determination of a symbol's package happens at read time.
In Clojure, for good reasons, symbols do not belong to a namespace and the determination of a symbol's namespace happens at evaluation time.
Can This Be Done in Clojure?
So, finally, my question: Can I convert Clojure code from one namespace to another?
I don't understand your use case, but here is a way to transform symbols from one namespace to another.
(require 'clojure.walk 'clojure.pprint)
(defn ns-trans-form [ns1 ns2 form]
(clojure.walk/prewalk
(fn [f] (if ((every-pred symbol? #(= (namespace %) ns1)) f)
(symbol ns2 (name f))
f))
form))
(defn ns-trans-text [ns1 ns2 text]
(with-out-str
(->> text
read-string
(ns-trans-form ns1 ns2)
clojure.pprint/pprint)))
(print (ns-trans-text "editor-ns" "repl-ns" "(+ editor-ns/a b)" ))
;=> (+ repl-ns/a b)
So, editor-ns/a was transformed to repl-ns/a.
(Answering my own question...)
Given that it's not easy to refer to a namespace's non-public vars from outside the namespace, there's no simple way to do this.
Perhaps a hack is possible, based on the idea at http://christophermaier.name/blog/2011/04/30/not-so-private-clojure-functions. That would involve walking the form and creating new symbols that resolve to new vars that have the same value as vars referred to in the original form. Perhaps I'll investigate this further sometime, but not right now.
I've been trying to get a simple reg-ex working in Clojure to test a string for some SQL reserved words (select, from, where etc.) but just can't get it to work:
(defn areserved? [c]
(re-find #"select|from|where|order by|group by" c))
(I split a string by spaces then go over all the words)
Help would be greatly appreciated,
Thanks!
EDIT: My first goal (after only reading some examples and basic Clojure materials) is to parse a string and return for each part of it (i.e. words) what "job" they have in the statement (a reserved word, a string etc.).
What I have so far:
(use '[clojure.string :only (join split)])
(defn isdigit? [c]
(re-find #"[0-9]" c))
(defn isletter? [c]
(re-find #"[a-zA-Z]" c))
(defn issymbol? [c]
(re-find #"[\(\)\[\]!\.+-><=\?*]" c))
(defn isstring? [c]
(re-find #"[\"']" c))
(defn areserved? [c]
(if (re-find #"select|from|where|order by|group by" c)
true
false))
(defn what-is [token]
(let [c (subs token 0 1)]
(cond
(isletter? c) :word
(areserved? c) :reserved
(isdigit? c) :number
(issymbol? c) :symbol
(isstring? c) :string)))
(defn checkr [token]
{:token token
:type (what-is token)})
(defn goparse [sql-str]
(map checkr (.split sql-str " ")))
Thanks for all the help guys! it's great to see so much support for such a relatively new language (at least for me :) )
I'm not entirely sure what you want exactly, but here's a couple of variations to coerce your first regex match to a boolean:
(defn areserved? [c]
(string?
(re-find #"select|from|where|order by|group by"c)))
(defn areserved? [c]
(if (re-find #"select|from|where|order by|group by"c)
true
false))
UPDATE in response to question edit:
Thanks for posting more code. Unfortunately there are a number of issues here that we could
try to address by patching your existing code in a simplistic and naïve fashion, but it will
only get you so far, before you hit the next problem with this single iteration approach.
#alex is correct, that your areserved? method will fail to match order by if you have already
split your string by white space. That said, a simple fix is to treat order and by as separate keywords (which they are, even though they always appear together).
The next issue is that the areserved? function will match keywords in a string, but you are dispatching it against a character in the what-is function. You nearly always get a match in your cond for isletter?, so you will everything is marked as a 'word'.
All in all, it looks like you are trying to do too much work in a single application of map.
I'm not sure if you are just doing this for fun to play with Clojure (which is admirable - keep going!), in which case, maybe it doesn't matter if you press on with this simple parsing approach... you'll definitely learn something; but if you would like to take it further and parse SQL more successfully, then I would suggest that you may find it helpful to to read a little on Lexing, Parsing and building Abstract Syntax Trees (AST).
Brian Carper has written about using the Java parser generator "ANTLR" from Clojure - it's a few years old, but might be worth looking at.
You also might be able to get some transferrable ideas from this chapter from the F# programming book on lexing and parsing SQL.
I am parsing a big csv file and I am using the first line of it as the keys for the records. So for a csv file like:
header1,header2
foo,bar
zoo,zip
I end up with a lazy seq like:
({:header1 "foo" :header2 "bar"},
{:header1 "zoo" :header2 "zip"})
The code working fine, but I am not sure if in the following function I am holding the head of "lines" or not.
(defn csv-as-seq [file]
(let [rdr (clojure.java.io/reader file)]
(let [lines (line-seq rdr)
headers (parse-headers (first lines))]
(map (row-mapper headers) (rest lines)))))
Can somebody please clarify?
Yes, this expression syntactically says to hold the head
(let [lines (line-seq rdr)
though in this case you should get away with it because their are no references to
lines and headers after the call to map and the Clojure compiler starting with 1.2.x includes a feature called locals clearing: it sets any locals not used after a function call to nil in the preamble to the function call. In this case it will set lines and headers to nil in the local context of the function and they will be GCd as used. This is one of the rare cases where clojure produces bytecode that cannot be expressed in java.
I'm learning Clojure and as an exercise I wanted to write something like the unix "comm" command.
To do this, I read the contents of each file into a set, then use difference/intersection to show exclusive/common files.
After a lot of repl-time I came up with something like this for the set creation part:
(def contents (ref #{}))
(doseq [line (read-lines "/tmp/a.txt")]
(dosync (ref-set contents (conj #contents line))))
(I'm using duck-streams/read-lines to seq the contents of the file).
This is my first stab at any kind of functional programming or lisp/Clojure. For instance, I couldn't understand why, when I did a conj on the set, the set was still empty. This lead me to learning about refs.
Is there a better Clojure/functional way to do this? By using ref-set, am I just twisting the code to a non-functional mindset or is my code along the lines of how it should be done?
Is there a a library that already does this? This seems like a relatively ordinary thing to want to do but I couldn't find anything like it.
Clojure 1.3:
user> (require '[clojure.java [io :as io]])
nil
user> (line-seq (io/reader "foo.txt"))
("foo" "bar" "baz")
user> (into #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
line-seq gives you a lazy sequence where each item in the sequence is a line in the file.
into dumps it all into a set. To do what you were trying to do (add each item one by one into a set), rather than doseq and refs, you could do:
user> (reduce conj #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
Note that the Unix comm compares two sorted files, which is likely a more efficient way to compare files than doing set intersection.
Edit: Dave Ray is right, to avoid leaking open file handles it's better to do this:
user> (with-open [f (io/reader "foo.txt")]
(into #{} (line-seq f)))
#{"foo" "bar" "baz"}
I always read with slurp and after that split with re-seq due to my needs.