Clojure: comparing a line with a string

(def filename "dictionary2.txt")
(defn check_word [filename word]
  (with-open [r (clojure.java.io/reader filename)]
    (doseq [line (line-seq r)]
      (if (compare line word)
        (println word)))))
(check_word filename "wizard")
It prints once for every line in the text file. Why is the if condition always true? The word "wizard" does exist in the dictionary file.

According to the documentation, the compare function returns a negative number, zero, or a positive number depending on whether its first argument sorts before, equal to, or after its second. Numbers (including zero) are truthy values, so they always make the then branch of a conditional expression execute. The only falsy values in Clojure are nil and false.
If you want to check that line equals word, simply use equality: (= line word).
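As a rough illustration of the point above, here is a minimal sketch. The helper name contains-word? and the sample lines are made up for this example, and a StringReader stands in for the dictionary file:

```clojure
;; compare returns an integer, and every integer (even 0) is truthy.
(def ordering (compare "apple" "wizard"))                  ; a negative integer
(def branch (if (compare "wizard" "wizard") :then :else))  ; :then, although compare returned 0

(defn contains-word?
  "Hypothetical helper: true if any line read from rdr equals word."
  [rdr word]
  (boolean (some #(= % word) (line-seq rdr))))

;; A StringReader stands in for the dictionary file here.
(def found?
  (with-open [r (java.io.BufferedReader.
                  (java.io.StringReader. "apple\nwizard\nzebra"))]
    (contains-word? r "wizard")))
```

Using some rather than doseq also stops reading at the first match, which may or may not be what you want.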

Related

How to correctly check if a string is equal to another string in Clojure?

I am looking for better ways to check if two strings are equal in Clojure!
Given a map report like
{:Result Pass}
when I evaluate
(type (:Result report))
I get: java.lang.String
To write a check for the value of :Result, I first tried
(if (= (:Result report) "Pass") (println "Pass"))
But the check fails.
So I used the compare method, which worked:
(if (= 0 (compare (:Result report) "Pass")) (println "Pass"))
However, I was wondering if there is anything equivalent to Java's .equals() method in Clojure. Or a better way to do the same.
= is the correct way to do an equality check for Strings. If it's giving you unexpected results, you likely have whitespace in the String like a trailing newline.
You can easily check for whitespace by using vec:
user=> (vec " Pass\n")
[\space \P \a \s \s \newline]
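On the .equals() part of the question: as far as I know, = on two strings compares their contents, which is what .equals() does, while identical? compares object references. A small sketch (the var names are only for illustration):

```clojure
(require '[clojure.string :as string])

(def a "Pass")
(def b (String. "Pass"))             ; same contents, different object

(def same-value?  (= a b))           ; true
(def same-object? (identical? a b))  ; false
(def java-equals? (.equals a b))     ; true

;; A stray trailing newline makes = fail until the string is trimmed.
(def raw-equal?     (= "Pass\n" "Pass"))                ; false
(def trimmed-equal? (= (string/trim "Pass\n") "Pass"))  ; true
```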
As @Carcigenicate wrote, use = to compare strings.
(= "hello" "hello")
;; => true
If you want to be less strict, consider normalizing your string before you compare. If we have a leading space, the strings aren't equal.
(= " hello" "hello")
;; => false
We can then define a normalize function that works for us. In this case, ignore leading and trailing whitespace and capitalization.
(require '[clojure.string :as string])
(defn normalize [s]
  (string/trim
    (string/lower-case s)))
(= (normalize " hellO")
   (normalize "Hello\t"))
;; => true
Hope that helps!

How to get the output of `doc` function in clojure?

I tried to save the output of the doc function, eg:
user> (def doc-str-split (doc str/split))
-------------------------
clojure.string/split
([s re] [s re limit])
Splits string on a regular expression. Optional argument limit is
the maximum number of splits. Not lazy. Returns vector of the splits.
#'user/doc-str-split
user> doc-str-split
nil
user>
However, doc-str-split is nil. I then tried to get the type of the doc output:
user> (type (doc str/split))
-------------------------
clojure.string/split
([s re] [s re limit])
Splits string on a regular expression. Optional argument limit is
the maximum number of splits. Not lazy. Returns vector of the splits.
nil
I still get nil. How can I save the output of the doc function?
You can use with-out-str to capture the output, like so:
user> (def doc-str-split (with-out-str (doc str/split)))
#'user/doc-str-split
user> (println doc-str-split)
-------------------------
clojure.string/split
([s re] [s re limit])
Splits string on a regular expression. Optional argument limit is
the maximum number of splits. Not lazy. Returns vector of the splits.
nil
user> (type (with-out-str (doc str/split)))
java.lang.String
You can get just the docstring from the var's metadata:
user.core=> (:doc (meta #'clojure.string/split))
"Splits string on a regular expression. Optional argument limit is
the maximum number of splits. Not lazy. Returns vector of the splits."
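To see why the original def ended up holding nil, here is a minimal sketch of the difference between what a printing function returns and what it writes to *out*. It uses println as a stand-in for doc, and the var names are invented for the example:

```clojure
(require '[clojure.string :as string])

;; println, like doc, writes to *out* and returns nil.
(def returned (println "hello"))

;; with-out-str rebinds *out* and returns whatever was printed as a string
;; (here "hello" followed by a line separator).
(def captured (with-out-str (println "hello")))

;; The docstring itself lives in the var's metadata, no printing involved.
(def split-doc (:doc (meta #'clojure.string/split)))
```

Binding the metadata value directly avoids both pitfalls: nothing is printed, and the var holds a real string.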

Clojure issue, can't convert list of strings to numbers

For the code below I'm reading input from stdin. Basically it's just some numbers delimited by spaces or line breaks. Specifically I'm trying to complete this challenge.
My goal is to create a list of numbers (without the first number) from the input. When I run the code below at hackerrank I get a list of a single number: (5)
Not sure what's going on, or how to fix. Would anyone know?
(map read-string (rest (line-seq (java.io.BufferedReader. *in*))))
line-seq gives one string for each line. read-string reads from a string, returning the first complete object found. Thus, you only get the first item on the line.
You could either use clojure.string/split to break up the string and use read-string on each part, or loop, accumulating the results of calling read on a PushbackReader made from the BufferedReader until you get no more input.
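The first option could be sketched roughly like this. The helper name parse-numbers is made up for the example, and it assumes the tokens are separated by whitespace:

```clojure
(require '[clojure.string :as string])

(defn parse-numbers
  "Hypothetical helper: splits a line on whitespace and reads each token."
  [line]
  (map read-string (string/split (string/trim line) #"\s+")))

;; read-string stops after the first complete object on the line...
(def first-only (read-string "5 4 4 2 2 8"))

;; ...while splitting first yields every number.
(def all-numbers (parse-numbers "5 4 4 2 2 8"))
```

Note that read-string can evaluate code embedded in its input, so for untrusted input something like clojure.edn/read-string or Long/parseLong is the safer choice.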
Since your input is
Input Format
The first line contains a single integer N.
The next line contains N integers: a0, a1,...aN-1 separated by space...
Sample Input
6
5 4 4 2 2 8
and you don't need to worry about validation or security, you can just read both lines directly:
(let [n (read-string (read-line))
      v (read-string (str "[" (read-line) "]"))]
  (assert (== n (count v))) ;if you like
  (comment solution here...))
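If you want to try this without piping anything into stdin, with-in-str can supply the sample input. This is only a sketch; the var name parsed and the returned map shape are my own choices:

```clojure
;; with-in-str rebinds *in*, so read-line consumes the sample input.
(def parsed
  (with-in-str "6\n5 4 4 2 2 8\n"
    (let [n (read-string (read-line))
          v (read-string (str "[" (read-line) "]"))]
      {:n n :v v})))
;; parsed is {:n 6, :v [5 4 4 2 2 8]}
```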

Is there a better way to do that in Clojure?

I have this function to read a file and convert it to a list of two-element lists:
(def f1 "/usr/example")
(defn read-file [file]
  (let [f (with-open [rdr (clojure.java.io/reader file)]
            (doall (map list (line-seq rdr))))]
    (cond
      (= file f1) (map #(map read-string (split (first %) #" ")) f)
      :else (map #(map read-string (split (first %) #"\t")) f))))
I use cond to split the file correctly (I have two types of files: the first separates elements with spaces and the second with tabs).
The first type of file would be like:
"1.3880896237218878E9 0.4758112837388654
1.3889631620596328E9 0.491845185928218"
while the second is:
"1.3880896237218878E9\t0.4758112837388654
1.3889631620596328E9\t0.491845185928218"
I get the result I want, for example:
((1.3880896237218878E9 0.4758112837388654) (1.3889631620596328E9 0.491845185928218))
But I wonder if there's a cleaner way to do that, maybe using fewer map calls or doing it without cond.
This returns a vector of vectors, splitting individual lines on arbitrary whitespace and using Double/parseDouble to read in the individual doubles. What it doesn't handle are any single or double quote characters in the files; if they are part of the actual input, I suppose I'd just preprocess it with a regex to get rid of them (see below).
(require '[clojure.java.io :as io] '[clojure.string :as string])
(defn read-file [f]
  (with-open [rdr (io/reader f)]
    (mapv (fn [line]
            (mapv #(Double/parseDouble %) (string/split line #"\s+")))
          (line-seq rdr))))
As for the aforementioned preprocessing, you could use #(string/replace % #"['\"]" "") to remove all single and double quotes. That would be appropriate if they occur at the beginning and end of the input, or perhaps of the individual lines. (If the individual numbers are quoted, you need to make sure you're not removing all delimiters between them; in that case it may be better to replace each quote with a single space and then use string/trim to remove any whitespace from the ends of the string.)
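Putting the preprocessing and the parsing together for a single line might look like the sketch below. The helper name parse-line is hypothetical, and it takes the more cautious route of replacing quotes with a space before trimming:

```clojure
(require '[clojure.string :as string])

(defn parse-line
  "Hypothetical helper: replaces quote characters with spaces, trims,
  then parses the whitespace-separated tokens as doubles."
  [line]
  (let [cleaned (string/trim (string/replace line #"['\"]" " "))]
    (mapv #(Double/parseDouble %) (string/split cleaned #"\s+"))))

;; Works for both space- and tab-separated lines, with or without a stray quote.
(def row-1 (parse-line "\"1.3880896237218878E9 0.4758112837388654"))
(def row-2 (parse-line "1.3889631620596328E9\t0.491845185928218\""))
```

A line that is empty after cleaning would make Double/parseDouble throw, so blank lines would need to be filtered out first.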

Processing a file character by character in Clojure

I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to do the file line-by-line, and then print each character.
(defn process_file [file_path]
  (with-open [reader (BufferedReader. (FileReader. file_path))]
    (let [seq (line-seq reader)]
      (doseq [item seq]
        (let [words (split item #"\s")]
          (println words))))))
Given a file with this text input:
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
My output looks like this:
[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States. U.S. laws alone swamp our small staff.]
Though I would expect it to look like:
["international" "donations" "are" .... ]
So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.
(with-open [reader (clojure.java.io/reader "path/to/file")] ...
I prefer this way of getting a reader in Clojure. And by "character by character", do you mean at the file access level, like read, which lets you control how many bytes to read?
Edit
As @deterb pointed out, let's check the source code of line-seq:
(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
  rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))
I faked a char-seq
(defn char-seq
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]
    ;; .read returns the character as an int, or -1 at end of stream
    (when (>= chr 0)
      (cons (char chr) (lazy-seq (char-seq rdr))))))
I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:
(let [chr (.read rdr)]
  (if (>= chr 0)
    ;do your work here
    ))
What do you think?
[1] According to @dimagog's comment, char-seq does not read all the chars into memory, thanks to lazy-seq.
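For anyone who wants to try the idea without a file, here is a self-contained sketch of a fully lazy variant. The name lazy-char-seq is my own, and a StringReader stands in for the file reader:

```clojure
(defn lazy-char-seq
  "Hypothetical variant of char-seq: a lazy sequence of the characters
  read from rdr. Nothing is read until the sequence is consumed."
  [^java.io.Reader rdr]
  (lazy-seq
    (let [c (.read rdr)]          ; int code of the next char, or -1 at end
      (when-not (neg? c)
        (cons (char c) (lazy-char-seq rdr))))))

;; doall forces the sequence while the reader is still open.
(def chars-read
  (with-open [rdr (java.io.StringReader. "ab c")]
    (doall (lazy-char-seq rdr))))
;; chars-read is (\a \b \space \c)
```

The doall matters: a lazy sequence that escapes with-open unrealized would try to read from a closed reader.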
I'm not familiar with Java or the read() method, so I won't be able to help you out with implementing it.
One first thought is maybe to simplify by using slurp, which will return a string of the text of the entire file with just (slurp filename). However, this would get the whole file, which maybe you don't want.
Once you have a string of the entire file text, you can process any string character by character by simply treating it as though it were a sequence of characters. For example:
=> (doseq [c "abcd"]
     (println c))
a
b
c
d
=> nil
Or:
=> (remove #{\c} "abcd")
=> (\a \b \d)
You could use map or reduce or any sort of sequence manipulating function. Note that after manipulating it like a sequence, it will now return as a sequence, but you could easily wrap the outer part in (reduce str ...) to return it back to a string at the end--explicitly:
=> (reduce str (remove #{\c} "abcd"))
=> "abd"
As for the problem with your specific code, I think it lies in what words is: a vector of strings. Each time you print words, you are printing a whole vector. If you replace the final (println words) with (doseq [w words] (println w)), each word should print on its own line.
Also, based on the output you say you want (a vector of all the words in the file), you wouldn't want (println w) at the base of your expression, because it prints the value and returns nil. You would simply want w. You would also want to replace your doseqs with fors, again to avoid returning nil.
As for improving your code, it looks generally fine to me. Applying only the first change suggested above (not the others, which I won't spell out explicitly), you could shorten it with a fun little trick:
(doseq [item seq]
  (let [words (split item #"\s")]
    (doseq [w words]
      (println w))))
;; Could be rewritten as...
(doseq [item seq
        :let [words (split item #"\s")]
        w words]
  (println w))
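The other suggestion in this answer, returning the words with for instead of printing them, could be sketched as follows. The function name words is hypothetical, and lower-casing is included only because the expected output in the question is lower-case:

```clojure
(require '[clojure.string :as string])

(defn words
  "Hypothetical helper: returns every whitespace-separated word in a
  sequence of lines, lower-cased, as one flat lazy sequence."
  [lines]
  (for [line lines
        w (string/split line #"\s+")
        :when (seq w)]             ; skip the empty token left by leading whitespace
    (string/lower-case w)))

(def sample-words
  (words ["International donations are" " gratefully accepted,"]))
```

Wrapping the call in vec would give the vector form shown in the question.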
You're pretty close - keep in mind that Strings are a sequence. (concat "abc" "def") results in the sequence (\a \b \c \d \e \f).
mapcat is another really useful function for this - it will lazily concatenate the results of applying the mapping fn to the sequence. This means that mapcating the result of converting all of the line strings to a seq will be the lazy sequence of characters you're after.
I did this as (mapcat seq (line-seq reader)).
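A runnable sketch of that expression, with a StringReader standing in for the file (the var name all-chars is only for illustration):

```clojure
;; line-seq needs a BufferedReader; doall realizes the characters
;; before with-open closes the reader.
(def all-chars
  (with-open [rdr (java.io.BufferedReader. (java.io.StringReader. "ab\ncd"))]
    (doall (mapcat seq (line-seq rdr)))))
;; all-chars is (\a \b \c \d)
```

Since line-seq strips line terminators, the newline characters do not appear in the result. If they matter, reading with .read directly is the better fit.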
For other advice:
For creating the reader, I would recommend using the clojure.java.io/reader function instead of directly creating the classes.
Consider separating the reading of the file from the processing (in this case printing) of the strings. While it is important to keep the full file parsing inside the with-open form, being able to test the actual processing code outside of the file-reading code is quite useful.
When navigating multiple (potentially nested) sequences consider using for. for does a nice job handling nested for loop type cases.
(take 100 (for [line (repeat "abc") char (seq line)] (prn char)))
Use prn for debugging output. It gives you real output, as compared to user output (which hides certain details which users don't normally care about).