I have written data like this to a file (more or less):
{:a 25 :b 28}
{:a 2 :b 50}
...
I want to have a lazy sequence of these maps.
There are around 40 million lines. I could also write chunks of 10000, but I do not think it would change the way the functions are written (mapcat instead of map).
To read it, I wrote
(with-open [affectations (io/reader "dev-resources/affectations.edn")]
  (map read-string affectations))
The problem is that Clojure tells
Don't know how to create ISeq from : java.io.BufferedReader
To be honest, I understand nothing about the java.io namespace.
I would like to have a lazy sequence of the data in the file, but I do not know how to turn the stream into strings and then into collections.
Any ideas?
Is read-line what I need?
Thanks
You are passing java.io.BufferedReader to map whereas map expects a seq.
You need to use line-seq to produce a (lazy) seq of lines from your file:
(with-open [affectations (io/reader "dev-resources/affectations.edn")]
  (map read-string (line-seq affectations)))
Remember that any work on data read from a resource opened by with-open must be forced within its scope; otherwise the reader is already closed when the lazy seq is realized and you will get errors.
One option is to force the whole seq of lines with doall and return it. However, that reads all your data into memory, which doesn't seem practical for a file of this size.
I guess you need to run some logic on each line of the file and don't need to keep all the parsed collections in memory. In that case you could pass a function representing that logic to the function that reads the file:
(defn process-file [filename process-fn]
  (with-open [reader (io/reader filename)]
    (doseq [line (line-seq reader)]
      (-> line
          (read-string)
          (process-fn)))))
This function reads your file line by line, parsing each line individually with read-string and calling process-fn on the result. process-file returns nil.
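As a usage sketch only: the variant below assumes each line holds exactly one EDN map, as in the question, and swaps in clojure.edn/read-string, which does not evaluate code embedded in the data. The names process-edn-file, tmp and total-a are made up for the illustration, and the temp file stands in for the real 40-million-line file.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

;; Same shape as process-file, but parsing with clojure.edn/read-string.
(defn process-edn-file [filename process-fn]
  (with-open [reader (io/reader filename)]
    (doseq [line (line-seq reader)]
      (process-fn (edn/read-string line)))))

;; Hypothetical demo data: two maps, one per line, in a temp file.
(def tmp (java.io.File/createTempFile "affectations" ".edn"))
(spit tmp "{:a 25 :b 28}\n{:a 2 :b 50}\n")

;; Accumulate the :a values without keeping the parsed maps around.
(def total-a (atom 0))
(process-edn-file tmp (fn [m] (swap! total-a + (:a m))))

(println @total-a)
```

Only one parsed map is alive at a time, so memory use does not grow with the number of lines.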
I am a beginner in clojure. I am trying to solve this simple problem on codechef using clojure. Below is my clojure code but this code is taking too long to run and gives TimeoutException. Can someone please help me to optimize this code and make it run faster.
(defn checkCase [str]
  (let [len (count str)]
    (and (> len 1) (re-matches #"[A-Z]+" str))))
(println (count (filter checkCase (.split (read-line) " "))))
Note: My program is not timing out because of an input error. On codechef, input is handled automatically (probably through input redirection; please read the question for more details).
Thank you!
Most text-finding exercises are exercises in regexps, and this one is no different. It's usually pretty hard to find a more efficient way, in whatever programming language, that will outpace a good regexp implementation.
In this case re-seq, lookaround assertions, repetition limits and the multiline regexp flag (?m) are your friends:
(defn find-acronyms
  [s]
  (re-seq #"(?m)(?<=\W|^)[A-Z]{2,}(?=\W|$)" s))
(find-acronyms "I like coding and will participate in IOI Then there is ICPC")
=> ("IOI" "ICPC")
Let's dissect the regex:
(?m) The multiline flag: makes ^ and $ match at the start and end of every line, so there is no need to split the input into multiple strings
(?<=\W|^) The match should follow a non-word character or the beginning of a line
[A-Z]{2,} Match consecutive capital letters, a minimum of 2
(?=\W|$) The match should be followed by a non-word character or the end of a line
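To see the pieces of that regex work together on multi-line input, here is a small sketch. The names acronym-re and sample are invented for the illustration; the sample text is not from the original exercise.

```clojure
;; (?m) lets ^ and $ match at every line boundary, so the lookarounds
;; also accept acronyms at the very start or end of a line.
(def acronym-re #"(?m)(?<=\W|^)[A-Z]{2,}(?=\W|$)")

;; Hypothetical three-line input.
(def sample "IOI is fun\nI like ICPC\nA single letter does not count")

;; Single capitals such as "I" and "A" are rejected by {2,}.
(println (re-seq acronym-re sample))
(println (count (re-seq acronym-re sample)))
```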
I can only guess that wherever you run this snippet of code, it doesn't feed anything to your read-line invocation. Or maybe it does, but doesn't send a newline as the last thing. So it hangs waiting.
(defn checkCase [str]
  (let [len (count str)]
    (and (> len 1) (re-matches #"[A-Z]+" str))))
(defn answer [str]
  (println (count (filter checkCase (.split str " ")))))
So at the REPL:
=> (answer "GGG fff TTT")
;-> 2
;-> nil
The answer is being printed to the screen. But probably best to have your function return the answer rather than print it out:
(defn answer [str]
  (count (filter checkCase (.split str " "))))
All I have done is replaced your (read-line) with an argument. (read-line) is expecting input from stdin and waiting for it forever - or until a timeout happens in your case.
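If you still want to exercise the read-line path at the REPL without it blocking, one option is clojure.core's with-in-str, which temporarily rebinds *in* to a reader over a string. A minimal sketch, with check-case and answer renamed and restated here only so the snippet stands alone:

```clojure
(defn check-case [s]
  (and (> (count s) 1) (re-matches #"[A-Z]+" s)))

(defn answer [^String s]
  (count (filter check-case (.split s " "))))

;; with-in-str makes read-line return the given text immediately
;; instead of waiting on stdin.
(println (with-in-str "GGG fff TTT\n"
           (answer (read-line))))
```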
I am not sure if this is the slow part of your code, but if it is, you could try to split up the execution and safeguard the slow regexp part by running it only when necessary. I think the current version with and already does that, since and short-circuits. If it does not, you can try something like this:
(defn checkCase [^String str]
  (cond
    (< (.length str) 2)
    false
    (re-matches #"[A-Z]+" str)
    true
    :else
    false))
Maybe you could try using re-seq instead of splitting the string and checking every item? That way you lose the filter, the .split, and the additional function call. Something like this:
(println (count (re-seq #"\b[A-Z]{2,}?\b" (read-line))))
You need to submit a Java program. You can test it on the command line before you submit it. You can but don't need to use redirection symbols (<,>). Just type the input and see that every time you do it returns the count after you have typed enter.
You will need aot compilation (Ahead Of Time, which means that .class files are included) and a main that is exported. Only then will it become a Java program.
Actually when they ask for a Java program they probably mean a .class file. You can run a .class file with the java program (which I imagine is what their test-runner does). Put it in a shell or batch file when testing, but just submit the .class file.
Background
I've written a hack for Emacs that lets me send a Clojure form from an editor buffer to a REPL buffer. It's working fine, except that if the two buffers are in different namespaces the copied text doesn't usually make sense, or, worse, it might make sense but have a different meaning to that in the editor buffer.
I want to transform the text so that it makes sense in the REPL buffer.
A Solution in Common Lisp
In Common Lisp, I could do this using the following function:
;; Common Lisp
(defun translate-text-between-packages (text from-package to-package)
  (let* ((*package* from-package)
         (form (read-from-string text))
         (*package* to-package))
    (with-output-to-string (*standard-output*)
      (pprint form))))
And a sample use:
;; Common Lisp
(make-package 'editor-package)
(make-package 'repl-package)
(defvar repl-package::a)
(translate-text-between-packages "(+ repl-package::a b)"
                                 (find-package 'editor-package)
                                 (find-package 'repl-package))
;; => "(+ A EDITOR-PACKAGE::B)"
The package name qualifications in the input string and the output string are different—exactly what's needed to solve the problem of copying and pasting text between packages.
(BTW, there's stuff about how to run the translation code in the Common Lisp process and move stuff between the Emacs world and the Common Lisp world, but I'm ok with that and I don't particularly want to get into it here.)
A Non-Solution in Clojure
Here's a direct translation into Clojure:
;; Clojure
(defn translate-text-between-namespaces [text from-ns to-ns]
  (let [*ns* from-ns
        form (read-string text)
        *ns* to-ns]
    (with-out-str
      (clojure.pprint/pprint form))))
And a sample use:
;; Clojure
(create-ns 'editor-ns)
(create-ns 'repl-ns)
(translate-text-between-namespaces "(+ repl-ns/a b)"
                                   (find-ns 'editor-ns)
                                   (find-ns 'repl-ns))
;; => "(+ repl-ns/a b)"
So the translation function in Clojure has done nothing. That's because symbols and packages/namespaces in Common Lisp and Clojure work differently.
In Common Lisp symbols belong to a package and the determination of a symbol's package happens at read time.
In Clojure, for good reasons, symbols do not belong to a namespace and the determination of a symbol's namespace happens at evaluation time.
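The Clojure side of that difference can be poked at directly. In this sketch the reader only records the namespace part of a symbol as text, and lookup against a namespace is a separate, later step (shown here with ns-resolve). The var repl-ns/a is interned purely for the demonstration.

```clojure
;; Reading does not look anything up: the namespace part is just a string.
(println (namespace (read-string "repl-ns/a")))
(println (namespace (read-string "b")))

;; Resolution happens later and depends on which namespace you ask.
(create-ns 'repl-ns)
(intern 'repl-ns 'a 1) ; hypothetical var, for illustration only

(println (ns-resolve (find-ns 'repl-ns) 'a))
(println (ns-resolve (find-ns 'repl-ns) 'b))
```

The unqualified symbol b resolves to nothing in repl-ns, while a resolves to the interned var; the symbols themselves are unchanged either way.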
Can This Be Done in Clojure?
So, finally, my question: Can I convert Clojure code from one namespace to another?
I don't understand your use case, but here is a way to transform symbols from one namespace to another.
(require 'clojure.walk 'clojure.pprint)
(defn ns-trans-form [ns1 ns2 form]
  (clojure.walk/prewalk
   (fn [f] (if ((every-pred symbol? #(= (namespace %) ns1)) f)
             (symbol ns2 (name f))
             f))
   form))
(defn ns-trans-text [ns1 ns2 text]
  (with-out-str
    (->> text
         read-string
         (ns-trans-form ns1 ns2)
         clojure.pprint/pprint)))
(print (ns-trans-text "editor-ns" "repl-ns" "(+ editor-ns/a b)" ))
;=> (+ repl-ns/a b)
So, editor-ns/a was transformed to repl-ns/a.
(Answering my own question...)
Given that it's not easy to refer to a namespace's non-public vars from outside the namespace, there's no simple way to do this.
Perhaps a hack is possible, based on the idea at http://christophermaier.name/blog/2011/04/30/not-so-private-clojure-functions. That would involve walking the form and creating new symbols that resolve to new vars that have the same value as vars referred to in the original form. Perhaps I'll investigate this further sometime, but not right now.
I am parsing a big csv file and I am using the first line of it as the keys for the records. So for a csv file like:
header1,header2
foo,bar
zoo,zip
I end up with a lazy seq like:
({:header1 "foo" :header2 "bar"},
{:header1 "zoo" :header2 "zip"})
The code is working fine, but I am not sure whether the following function holds the head of "lines" or not.
(defn csv-as-seq [file]
  (let [rdr (clojure.java.io/reader file)]
    (let [lines (line-seq rdr)
          headers (parse-headers (first lines))]
      (map (row-mapper headers) (rest lines)))))
Can somebody please clarify?
Yes, this expression syntactically says to hold the head
(let [lines (line-seq rdr)
though in this case you should get away with it, because there are no references to lines and headers after the call to map, and the Clojure compiler starting with 1.2.x includes a feature called locals clearing: it sets any locals not used after a function call to nil in the preamble to the function call. In this case it will set lines and headers to nil in the local context of the function, and the lines will be garbage-collected as they are consumed. This is one of the rare cases where Clojure produces bytecode that cannot be expressed in Java.
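If you would rather not depend on how locals are cleared, an alternative is to consume the rows while the reader is still open, so that no lazy seq leaves the function at all. This is a sketch of a different approach, not the questioner's code: parse-headers and row-mapper are hypothetical stand-ins (the originals are not shown), reduce-csv is an invented name, and the naive comma split ignores quoting.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical stand-ins for the question's helpers.
(defn parse-headers [line]
  (mapv keyword (str/split line #",")))

(defn row-mapper [headers]
  (fn [line] (zipmap headers (str/split line #","))))

;; Reduce over the rows inside with-open: the reader is closed on exit
;; and only the accumulated result escapes.
(defn reduce-csv [source f init]
  (with-open [rdr (io/reader source)]
    (let [lines (line-seq rdr)
          headers (parse-headers (first lines))]
      (reduce f init (map (row-mapper headers) (rest lines))))))

(def csv-text "header1,header2\nfoo,bar\nzoo,zip\n")

;; Collect one column from the sample CSV in the question.
(println (reduce-csv (java.io.StringReader. csv-text)
                     (fn [acc row] (conj acc (:header1 row)))
                     []))
```

A side benefit is that the reader is always closed, which the original csv-as-seq never does.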
I'm learning Clojure and as an exercise I wanted to write something like the unix "comm" command.
To do this, I read the contents of each file into a set, then use difference/intersection to show exclusive/common files.
After a lot of repl-time I came up with something like this for the set creation part:
(def contents (ref #{}))
(doseq [line (read-lines "/tmp/a.txt")]
  (dosync (ref-set contents (conj @contents line))))
(I'm using duck-streams/read-lines to seq the contents of the file).
This is my first stab at any kind of functional programming or lisp/Clojure. For instance, I couldn't understand why, when I did a conj on the set, the set was still empty. This led me to learning about refs.
Is there a better Clojure/functional way to do this? By using ref-set, am I just twisting the code to a non-functional mindset or is my code along the lines of how it should be done?
Is there a library that already does this? This seems like a relatively ordinary thing to want to do but I couldn't find anything like it.
Clojure 1.3:
user> (require '[clojure.java [io :as io]])
nil
user> (line-seq (io/reader "foo.txt"))
("foo" "bar" "baz")
user> (into #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
line-seq gives you a lazy sequence where each item in the sequence is a line in the file.
into dumps it all into a set. To do what you were trying to do (add each item one by one into a set), rather than doseq and refs, you could do:
user> (reduce conj #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
Note that the Unix comm compares two sorted files, which is likely a more efficient way to compare files than doing set intersection.
Edit: Dave Ray is right, to avoid leaking open file handles it's better to do this:
user> (with-open [f (io/reader "foo.txt")]
        (into #{} (line-seq f)))
#{"foo" "bar" "baz"}
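Putting the pieces together for the comm-like exercise, a minimal sketch might look like the following. The names line-set and comm-sets and the result keys are invented here, and StringReaders stand in for the two files.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.set :as set])

;; Read every line of a source into a set, closing the reader afterwards.
(defn line-set [source]
  (with-open [rdr (io/reader source)]
    (into #{} (line-seq rdr))))

;; Lines unique to each input, plus the lines they share.
(defn comm-sets [a b]
  (let [sa (line-set a)
        sb (line-set b)]
    {:only-a (set/difference sa sb)
     :only-b (set/difference sb sa)
     :both   (set/intersection sa sb)}))

(println (comm-sets (java.io.StringReader. "foo\nbar\nbaz\n")
                    (java.io.StringReader. "bar\nbaz\nqux\n")))
```

Unlike the Unix comm, this holds both files in memory and loses line order and duplicates, which is fine for small inputs.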
I always read with slurp and after that split with re-seq due to my needs.
What would be an idiomatic way in Clojure to get a lazy sequence over a file containing float values serialized from Java? (I've toyed with a with-open approach based on line-reading examples but cannot seem to connect the dots to process the stream as floats.)
Thanks.
(defn float-seqs [^java.io.DataInputStream dis]
  (lazy-seq
    (try
      (cons (.readFloat dis) (float-seqs dis))
      (catch java.io.EOFException e
        (.close dis)))))
(with-open [dis (-> file java.io.FileInputStream. java.io.DataInputStream.)]
  (let [s (float-seqs dis)]
    (doseq [f s]
      (println f))))
You are not required to use with-open if you are sure you are going to consume the whole seq.
If you use with-open, double-check that you're not leaking the seq (or a derived seq) outside of its scope.
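A round trip makes the scoping point concrete. The sketch below writes a few floats with DataOutputStream.writeFloat into an in-memory byte array (standing in for the file), then reads them back, realizing the seq with doall before with-open closes the stream. The names float-bytes, float-seq and floats-read are made up for the example, and this variant leaves closing to with-open rather than to the catch clause.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  DataInputStream DataOutputStream EOFException))

;; Serialize sample floats the way a Java program using writeFloat would.
(def float-bytes
  (let [baos (ByteArrayOutputStream.)]
    (with-open [dos (DataOutputStream. baos)]
      (doseq [x [1.5 2.25 -3.0]]
        (.writeFloat dos (float x))))
    (.toByteArray baos)))

;; Lazy seq of floats; EOF simply ends the sequence.
(defn float-seq [^DataInputStream dis]
  (lazy-seq
    (try
      (cons (.readFloat dis) (float-seq dis))
      (catch EOFException _ nil))))

;; doall forces every element while the stream is still open.
(def floats-read
  (with-open [dis (DataInputStream. (ByteArrayInputStream. float-bytes))]
    (doall (float-seq dis))))

(println floats-read)
```

Without the doall, the seq would be returned unrealized and the first attempt to read from it would hit a closed stream when backed by a real file.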