How to put the records of my file into a defined list and use it afterwards in my code?

(defn loadData [filename]
(with-open [rdr (io/reader filename)]
(doseq [line (line-seq rdr)
:let [ [id & data]
(clojure.string/split line #"\|")] ]
[
(Integer/parseInt id)
data
])
))
My file's content:
1|Xxx|info1|222-2222
2|Yyy|info2|333-3333
3|Zzz|info3|444-4444
How do I get this function to return a list like :
{
{1, Xxx, info1, 222-2222}
{2, Yyy, info2, 333-3333}
{3, Zzz, info3, 444-4444}
}
I also want to know how to get my function loadData to return this list above instead of NIL.
Please help! Thanks in advance...

(require '[clojure.java.io :as io])
(def my-file-content
(.getBytes "1|Xxx|info1|222-2222\n2|Yyy|info2|333-3333\n3|Zzz|info3|444-4444"))
;; See https://guide.clojure.style/ for common naming style.
(defn load-data
[filename]
;; Well done, `with-open` is correctly used here!
(with-open [rdr (io/reader filename)]
;; See https://clojuredocs.org/clojure.core/doseq for examples of `for` and `doseq`. After reading examples and documentation, it will probably become clear why `for` is what you want, and not `doseq`.
(for [line (line-seq rdr)
:let [[id & data] (clojure.string/split line #"\|")]]
[(Integer/parseInt id)
data])))
;; But `(load-data my-file-content)` raises an exception because `for` returns a lazy sequence. Nothing happens before you actually display the result, so the course of actions is:
;; - Open a reader;
;; - Define computations in a `LazySeq` as result of `for`;
;; - Close the reader;
;; - The function returns the lazy seq;
;; - Your REPL wants to display it, so it tries to evaluate the first elements;
;; - Now it actually tries to read from the reader, but it is closed.
;; This laziness explains why the following raises no exception (you don't look at what's inside the sequence):
(type (load-data my-file-content))
;; => clojure.lang.LazySeq
;; The shortest fix is:
(defn load-data
[filename]
(with-open [rdr (io/reader filename)]
;; https://clojuredocs.org/clojure.core/doall
(doall
(for [line (line-seq rdr)
:let [[id & data] (clojure.string/split line #"\|")]]
[(Integer/parseInt id)
data]))))
(load-data my-file-content)
;; => ([1 ("Xxx" "info1" "222-2222")] [2 ("Yyy" "info2" "333-3333")] [3 ("Zzz" "info3" "444-4444")])
;; This is not what you want. Here is the shortest fix to coerce this (realised) lazy sequence into what you want:
(defn load-data
[filename]
(with-open [rdr (io/reader filename)]
;; https://clojuredocs.org/clojure.core/doall
(doall
(for [line (line-seq rdr)
:let [[id & data] (clojure.string/split line #"\|")]]
(cons (Integer/parseInt id) data)))))
(load-data my-file-content)
;; => '((1 "Xxx" "info1" "222-2222") (2 "Yyy" "info2" "333-3333") (3 "Zzz" "info3" "444-4444"))
;; Also, note that a list is written `'(1 2 3)` and a vector `[1 2 3]`. `{1 2 3}` is a reader error: curly brackets are for maps, which need an even number of forms.
(defn load-data
[filename]
(with-open [rdr (io/reader filename)]
(->> (line-seq rdr)
(mapv #(clojure.string/split % #"\|")))))
(load-data my-file-content)
;; => [["1" "Xxx" "info1" "222-2222"] ["2" "Yyy" "info2" "333-3333"] ["3" "Zzz" "info3" "444-4444"]]
;; Note that here we no longer have a list of lists, but a vector of vectors. The sequences returned by `for` and `map` are lazy, whereas vectors are not: `mapv` builds its whole result eagerly, while the reader is still open, which is why it works here without `doall`.
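As a rough sketch of how the pieces above could be combined: the name `parse-line` and the use of a `StringReader` in place of a real file are assumptions made for illustration, not part of the answer. It returns a vector of vectors with the id already parsed, and binds the result to a var so it can be used later in the code:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: turns "1|Xxx|info1|222-2222" into [1 "Xxx" "info1" "222-2222"].
(defn parse-line [line]
  (let [[id & data] (str/split line #"\|")]
    (into [(Integer/parseInt id)] data)))

(defn load-data [source]
  (with-open [rdr (io/reader source)]
    ;; mapv is eager, so every line is read before the reader is closed.
    (mapv parse-line (line-seq rdr))))

;; A StringReader stands in for the file here; a filename works the same way.
(def records
  (load-data (java.io.StringReader. "1|Xxx|info1|222-2222\n2|Yyy|info2|333-3333")))
;; records => [[1 "Xxx" "info1" "222-2222"] [2 "Yyy" "info2" "333-3333"]]
```

Keeping the per-line parsing in its own function makes it easy to test without touching the file system.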

Related

clojure java.lang.NullPointerException while spliting string

I am new to Clojure. I am trying to write a program which reads data from a comma-separated file. After reading the data I try to split each line on the delimiter ",", but I get the error below:
CompilerException java.lang.NullPointerException,
compiling:(com\clojure\apps\StudentRanks.clj:26:5)
Here is my code:
(ns com.clojure.apps.StudentRanks)
(require '[clojure.string :as str])
(defn student []
(def dataset (atom []))
(def myList (atom ()))
(def studObj (atom ()))
(with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
(swap! dataset into (reduce conj [] (line-seq rdr)))
)
(println @dataset)
(def studentCount (count @dataset))
(def ind (atom 0))
(loop [n studentCount]
(when (>= n 0)
(swap! myList conj (get @dataset n))
(println (get @dataset n))
(recur (dec n))))
(println myList)
(def scount (count @dataset))
(loop [m scount]
(when (>= m 0)
(def data(get @dataset m))
(println (str/split data #","))
(recur (dec m))))
)
(student)
Thanks in advance.
As pointed out in the comments, the first problem is that you are not writing correct Clojure.
To start, def should never be nested -- it's not going to behave like you hope. Use let to introduce local variables (usually just called locals because it's weird to call variables things that don't vary).
Second, block-like constructs (such as do, let or with-open) evaluate to the value of their last expression.
So this snippet
(def dataset (atom []))
(with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
(swap! dataset into (reduce conj [] (line-seq rdr))))
should be written
(let [dataset
(with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
(into [] (line-seq rdr)))]
; code using dataset goes here
)
Then you try to convert dataset (a vector) to a list (myList) by traversing it backwards and consing on the list under construction. It's not needed. You can get a sequence (list-like) out of a vector by just calling seq on it. (Or rseq if you want the list to be reversed.)
Last, you iterate once again to split and print each item held in dataset. Explicit iteration with indices is pretty unusual in Clojure, prefer reduce, doseq, into etc.
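The exception itself most likely comes from the starting index: the loops begin at the count of the dataset, one past the last valid index, so `get` returns nil and `str/split` is handed nil. A small sketch of that, with a made-up two-line dataset standing in for the file contents:

```clojure
(require '[clojure.string :as str])

;; Hypothetical data standing in for the lines read from the file.
(def dataset ["Ann,90" "Bob,85"])

;; Index 2 is out of range for a 2-element vector, so `get` returns nil.
(def out-of-range (get dataset (count dataset)))

;; Splitting nil throws, which is what an index-based loop starting at
;; (count dataset) runs into on its first iteration.
(def split-threw?
  (try
    (str/split out-of-range #",")
    false
    (catch Exception _ true)))

;; Iterating over the collection itself avoids index arithmetic entirely.
(def rows (mapv #(str/split % #",") dataset))
;; rows => [["Ann" "90"] ["Bob" "85"]]
```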
Here are two ways to write student:
(defn student [] ; just for print
(with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
(doseq [data (line-seq rdr)]
(println (str/split data #",")))))
(defn student [] ; to return a value
(with-open [rdr (clojure.java.io/reader "e:\\example.txt")]
(into []
(for [data (line-seq rdr)]
(str/split data #",")))))
I hope this will help you to better get Clojure.
I suggest you use a csv library:
(require '[clojure.data.csv :as csv])
(csv/read-csv (slurp "example.txt"))
Unless this is some file io exercise.

counting lines in a file with a filter with clojure

I'm trying to figure out what is wrong with my code here. Basically the idea behind it is that I am reading a very large file and at the end of each line in the file is a number. I want to count the number of lines that have the number at the end greater than 500.
What I have is this and on paper it should work, but something is going wrong and I keep returning nil.
(defn countlines [] (with-open [rdr (clojure.java.io/reader "myfile.txt")]
(doseq [line (line-seq rdr)]
(count (re-find #"(?!500)[56789]\d{2,}|\d{4,}$" line)))))
The reason is that you use doseq, which always returns nil:
clojure.core/doseq
[seq-exprs & body]
Macro
Added in 1.0
Repeatedly executes body (presumably for side-effects) with
bindings and filtering as provided by "for". Does not retain
the head of the sequence. Returns nil.
You could rewrite it with something like (doall (for [line (line-seq rdr)] ...)),
but that alone would not fulfill your task, because your function would then return a seq of the character counts of the matches:
user> (count (re-find #"\d+" "123k456"))
3
which is obviously not what you want.
What you need to do is count the lines that match:
(count (filter #(re-find #"(?!500)[56789]\d{2,}|\d{4,}$" %)
(line-seq rdr)))
If I understand the question correctly, you should be doing something like this:
(defn countlines [] (with-open [rdr (clojure.java.io/reader "myfile.txt")]
(-> (line-seq rdr)
(filter #(re-find #"(?!500)[56789]\d{2,}|\d{4,}$" %))
(count))))
Regarding Martin Lechner's answer, I think it should use thread-last (->>) rather than thread-first (->), because filter takes the collection as its last argument. So it should be:
(defn countlines [] (with-open [rdr (clojure.java.io/reader "myfile.txt")]
(->> (line-seq rdr)
(filter #(re-find #"(?!500)[56789]\d{2,}|\d{4,}$" %))
(count))))

Read file until certain line in Clojure using doseq

This would normally be trivial in other languages, but I've found no such example in Clojure.
I can println an entire file using:
(with-open [rdr (io/reader "file")]
(doseq [line (line-seq rdr) :while (< count(line) 10)]
(println line)))
But how do I get it to stop at line 5?
Thanks.
You can try this:
(println
(with-open [rdr (clojure.java.io/reader "file")]
(let [ls (line-seq rdr)]
(doall (take 5 ls)))))
This will print first 5 lines of the specified file.
If you need to skip lines that do not satisfy a condition, you can add filter. The following code prints the first five lines whose length is less than 10.
(println
(with-open [rdr (clojure.java.io/reader "file")]
(let [ls (line-seq rdr)]
(->> ls
(filter #(< (count %) 10))
(take 5)
(doall)))))
Since filter and take return lazy sequences, the result must be realized inside the with-open form. Outside it the reader is already closed, so realizing the sequence throws an exception.
The println function also realizes the sequence, so you can modify the code like this:
(with-open [rdr (clojure.java.io/reader "data/base_exp.txt")]
(let [ls (line-seq rdr)]
(->> ls
(filter #(> (count %) 10))
(take 5)
(println))))
Simply use take to limit the amount of lines:
Replace
(doseq [line (line-seq rdr) ;; ...
with
(doseq [line (take 5 (line-seq rdr)) ;; ...
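Put together, the `take` suggestion could look like the sketch below. The function name `print-first-lines` and the parameter `n` are made up for the example, and a `StringReader` stands in for the file:

```clojure
(require '[clojure.java.io :as io])

;; Prints at most n lines from source; doseq consumes the lines eagerly,
;; so everything happens while the reader is still open.
(defn print-first-lines [source n]
  (with-open [rdr (io/reader source)]
    (doseq [line (take n (line-seq rdr))]
      (println line))))

;; Prints "one" and "two", then stops without reading further.
(print-first-lines (java.io.StringReader. "one\ntwo\nthree\nfour") 2)
```

Since `line-seq` is lazy, `take` means the remaining lines of a large file are not read.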

How can I deserialize a record structure from a file that was saved with print-dup?

I have the following code:
(use 'clojure.java.io)
(defrecord Member [id name salary role])
(defrecord Role [id name])
(def member-records (ref ()))
(defn add-member [member]
(dosync (alter member-records conj member)))
;;Test-data -->
(def dev-r(->Role 1 "Developer"))
(def test-member1(->Member 1 "Kirill" 70000.00 dev-r))
;;Test-data <--
(defn save-data-2-file []
(with-open [wrtr (writer "C:/Platform/Work/test.cdf")]
(print-dup @member-records wrtr)))
(defn process-line [line]
(println line))
;;Test line content
;;#BTC.pcost.Member{:id 1, :name "Kirill", :salary 70000.0, :role #BTC.pcost.Role{:id 1, :name "Developer"}})
(defn load-data-from-file []
(with-open [rdr (reader "C:/Platform/Work/test.cdf")]
(doseq [line (line-seq rdr)]
(process-line line))))
I want to recreate the records after reading the file, but I cannot work out how to do it. I know I could parse the text and fill my structures from the elements of each parsed line, but that would be tedious because I have a lot of records like "Member" and "Role". Can anyone suggest an approach?
You can use read-string, and slurp, to pull the records out of the file. read-string is limited to reading the first form of a string, but, from your sample, you are only storing a single form, as a list of records.
(defn load-data-from-file [file]
(read-string (slurp file)))
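To illustrate the "first form only" limitation, here is a tiny sketch with a literal, trusted string (the var name `first-form` is made up for the example):

```clojure
;; Only the first form is returned; the trailing "(3 4)" is silently ignored.
(def first-form (read-string "(1 2) (3 4)"))
;; first-form => (1 2)
```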
Lazy Reading
If you need more than the first form, or cannot read the entire stream into memory, you can use read directly, to make a lazy reader.
(defn lazy-read
([rdr] (let [eof (Object.)] (lazy-read rdr (read rdr false eof) eof)))
([rdr data eof]
(if (not= eof data)
(cons data (lazy-seq (lazy-read rdr (read rdr false eof) eof))))))
(defn load-all-data [file]
(with-open [rdr (java.io.PushbackReader. (reader file))]
(doall (lazy-read rdr))))
(load-all-data "C:/Platform/Work/test.cdf")
Security
Also, it is good to mention security when loading code with read-string or read. You should only use them with trusted sources, because, using #= or a Java constructor, the source can execute arbitrary code inside your application. For a longer explanation, take a look at the documentation for read.
Setting *read-eval* to false would prevent the issue, but it would also prevent the reconstruction of the records in your sample. To avoid the issue altogether, you can use the clojure.edn/read and clojure.edn/read-string functions, with a whitelist of readers.
(defn edn-read [eof rdr]
(clojure.edn/read {:eof eof :readers {'BTC.pcost.Role map->Role
'BTC.pcost.Member map->Member}}
rdr))
(defn lazy-edn-read
([rdr] (let [eof (Object.)] (lazy-edn-read rdr (edn-read eof rdr) eof)))
([rdr data eof]
(if (not= eof data)
(cons data (lazy-seq (lazy-edn-read rdr (edn-read eof rdr) eof))))))
(defn load-all-data [file]
(with-open [rdr (java.io.PushbackReader. (reader file))]
(doall (take-while (complement nil?) (lazy-edn-read rdr)))))
(load-all-data "C:/Platform/Work/test.cdf")
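For a quick check of the EDN approach without touching the file system, a round trip through a string might look like the sketch below. It assumes the records were written with `pr-str`, which prints the tagged-map form, and it derives the reader tags from the record class names instead of hard-coding a namespace. The var names are made up for the example.

```clojure
(require '[clojure.edn :as edn])

(defrecord Role [id name])
(defrecord Member [id name salary role])

;; Whitelist of tag readers, keyed by the tag that pr-str emits for each record.
(def readers
  {(symbol (.getName Role))   map->Role
   (symbol (.getName Member)) map->Member})

(def original (->Member 1 "Kirill" 70000.0 (->Role 1 "Developer")))

;; pr-str prints each record as #<class-name>{...}, nested records included.
(def text (pr-str [original]))

;; Tags that are not in the whitelist make edn/read-string throw instead of
;; constructing arbitrary objects.
(def restored (edn/read-string {:readers readers} text))
```

After this, `restored` should be a vector holding a `Member` equal to `original`, with its `:role` restored as a `Role`.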
You can use read.
This function will read one object from a file:
(defn load-data-from-file [filename]
(with-open [rdr (java.io.PushbackReader. (reader filename))]
(read rdr)))
Or this will read all objects from the file:
(defn load-all-data-from-file [filename]
(let [eof (Object.)]
(with-open [rdr (java.io.PushbackReader. (reader filename))]
(doall
(take-while #(not= % eof)
(repeatedly #(read rdr nil eof)))))))
Here's the API documentation for read.
This is a small variation that will read all objects from a string:
(defn load-all-data-from-string [string]
(let [eof (Object.)]
(with-open [rdr (-> string java.io.StringReader. java.io.PushbackReader.)]
(doall
(take-while #(not= % eof)
(repeatedly #(read rdr nil eof)))))))
This is, as far as I know, not possible to do using read-string. Instead we use read with a java.io.StringReader.

clojure read large text file and count occurrences

I'm trying to read a large text file and count occurrences of specific errors.
For example, for the following sample text
something
bla
error123
foo
test
error123
line
junk
error55
more
stuff
I want to end up with (don't really care what data structure although I am thinking a map)
error123 - 2
error55 - 1
Here is what I have tried so far
(require '[clojure.java.io :as io])
(defn find-error [line]
(if (re-find #"error" line)
line))
(defn read-big-file [func, filename]
(with-open [rdr (io/reader filename)]
(doall (map func (line-seq rdr)))))
calling it like this
(read-big-file find-error "sample.txt")
returns:
(nil nil "error123" nil nil "error123" nil nil "error55" nil nil)
Next I tried to remove the nil values and group like items
(group-by identity (remove #(= nil %) (read-big-file find-error "sample.txt")))
which returns
{"error123" ["error123" "error123"], "error55" ["error55"]}
This is getting close to the desired output, although it may not be efficient. How can I get the counts now? Also, as someone new to Clojure and functional programming I would appreciate any suggestions on how I might improve this.
thanks!
I think you might be looking for the frequencies function:
user=> (doc frequencies)
-------------------------
clojure.core/frequencies
([coll])
Returns a map from distinct items in coll to the number of times
they appear.
nil
So, this should give you what you want:
(frequencies (remove nil? (read-big-file find-error "sample.txt")))
;;=> {"error123" 2, "error55" 1}
If your text file is really large, however, I would recommend doing this on the line-seq inline to ensure you don't run out of memory. This way you can also use a filter rather than map and remove.
(defn count-lines [pred, filename]
(with-open [rdr (io/reader filename)]
(frequencies (filter pred (line-seq rdr)))))
(defn is-error-line? [line]
(re-find #"error" line))
(count-lines is-error-line? "sample.txt")
;; => {"error123" 2, "error55" 1}
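If the error code can appear in the middle of a longer line, counting whole lines would treat each distinct line as its own key. A possible variation, assuming the codes always look like "error" followed by digits (the name `count-errors` is made up for the example), is to count the matched token instead of the line:

```clojure
(require '[clojure.java.io :as io])

(defn count-errors [source]
  (with-open [rdr (io/reader source)]
    (->> (line-seq rdr)
         (keep #(re-find #"error\d+" %)) ; keep the matched code, drop non-matching lines
         frequencies)))                  ; frequencies is eager, so it runs while the reader is open

;; A StringReader stands in for "sample.txt" here.
(def error-counts
  (count-errors (java.io.StringReader. "something\nerror123\nfoo error123 bar\njunk\nerror55")))
;; error-counts => {"error123" 2, "error55" 1}
```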