Is there a better way to do this in Clojure?
daniel=> (reverse "Hello")
(\o \l \l \e \H)
daniel=> (apply str (vec (reverse "Hello")))
"olleH"
Do you have to do the apply $ str $ vec bit every time you want to get a string back after reversing it?
You'd better use clojure.string/reverse:
user=> (require '[clojure.string :as s])
nil
user=> (s/reverse "Hello")
"olleH"
UPDATE: for the curious, here are the source code snippets for clojure.string/reverse in both Clojure (v1.4) and ClojureScript:
; clojure:
(defn ^String reverse
  "Returns s with its characters reversed."
  {:added "1.2"}
  [^CharSequence s]
  (.toString (.reverse (StringBuilder. s))))
; clojurescript
(defn reverse
  "Returns s with its characters reversed."
  [s]
  (.. s (split "") (reverse) (join "")))
OK, so it would be easy to roll your own function with apply inside, or to use a dedicated version of reverse that works better on strings (but only on strings). The main thing to think about here, though, is the arity (the number of parameters) of the str function, and the fact that reverse works on a collection.
(doc reverse)
clojure.core/reverse
([coll])
Returns a seq of the items in coll in reverse order. Not lazy.
This means that reverse not only works on strings, but also on all other collections. However, because reverse expects a collection as parameter, it treats a string as a collection of characters
(reverse "Hello")
and returns one as well
(\o \l \l \e \H)
Now if we just substitute the functions for the collection, you can spot the difference:
(str '(\o \l \l \e \H) )
"(\\o \\l \\l \\e \\H)"
while
(str \o \l \l \e \H )
"olleH"
The big difference between the two is the number of parameters. In the first example, str takes one parameter, a collection of 5 characters. In the second, str takes 5 parameters: 5 characters.
What does the str function expect ?
(doc str)
-------------------------
clojure.core/str
([] [x] [x & ys])
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So when you pass in one parameter (a collection), all str returns is the toString of the collection.
But to get the result you want, you need to feed the 5 characters as separate parameters to str, instead of the collection itself. Apply is the function that is used to 'get inside' the collection and make that happen.
(apply str '(\o \l \l \e \H) )
"olleH"
Functions that handle multiple separate parameters are often seen in Clojure, so it's good to realise when and why you need to use apply. The other thing to consider is why the writer of the str function made it accept multiple parameters instead of a collection. Usually, there's a pretty good reason. What's the prevalent use case for the str function? Surely not concatenating a collection of separate characters, but concatenating values, strings and function results.
(let [a 1 b 2]
(str a "+" b "=" (+ a b)))
"1+2=3"
What if we had a str that accepted a single collection as parameter ?
(defn str2
  [seq]
  (apply str seq))
(str2 (reverse "Hello"))
"olleH"
Cool, it works ! But now:
(let [a 1 b 2]
  (str2 '(a "+" b "=" (+ a b))))
"a+b=(+ a b)"
Hmmm, now how to solve that ? :)
In this case, making str accept multiple parameters that are evaluated before the str function is executed gives str the easiest syntax. Whenever you need to use str on a collection, apply is a simple way to convert a collection to separate parameters.
Making a str that accepts a collection and evaluates each part inside would take more effort, help only in less common use cases, result in more complicated code or syntax, or limit its applicability. So there might be a better way to reverse strings, but reverse, apply and str are best at what they do.
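One possible answer to the "how to solve that?" puzzle, as a minimal sketch: the quoted list '(...) suppresses evaluation, whereas a vector literal or a call to list evaluates its elements before the function sees them. str2 is the function from the example above (with its parameter renamed to coll); nothing else is assumed.

```clojure
;; str2 as defined earlier: takes one collection and concatenates its items.
(defn str2 [coll]
  (apply str coll))

(let [a 1 b 2]
  ;; A vector literal evaluates a, b and (+ a b) before str2 is called.
  (str2 [a "+" b "=" (+ a b)]))
;; => "1+2=3"

(let [a 1 b 2]
  ;; list is an ordinary function, so its arguments are evaluated too.
  (str2 (list a "+" b "=" (+ a b))))
;; => "1+2=3"
```

The caller still has to build a collection first, which is the extra syntax that the variadic str avoids.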
Apply, like reverse, works on any seqable type, not just vectors, so
(apply str (reverse "Hello"))
is a little shorter. clojure.string/reverse should be more efficient, though.
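To put the two approaches side by side, here is a small sketch. The function names reverse-via-seq and reverse-via-string are made up for illustration; only reverse, apply, str and clojure.string/reverse come from Clojure itself.

```clojure
(require '[clojure.string :as string])

;; Seq-based approach: reverse yields a seq of characters, apply str joins them.
(defn reverse-via-seq [s]
  (apply str (reverse s)))

;; String-specific approach from clojure.string.
(defn reverse-via-string [s]
  (string/reverse s))

(reverse-via-seq "Hello")    ;; => "olleH"
(reverse-via-string "Hello") ;; => "olleH"
(reverse-via-seq "")         ;; => ""
```

Both return a string for string input; the seq-based version also accepts any other seqable collection, while the clojure.string version is limited to strings.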
I am looking for better ways to check if two strings are equal in Clojure!
Given a map 'report' like
{:Result Pass}
, when I evaluate
(type (:Result report))
I get: java.lang.String
To write a check for the value of :Result, I first tried
(if (= (:Result report) "Pass") (println "Pass"))
But the check fails.
So I used the compare method, which worked:
(if (= 0 (compare (:Result report) "Pass")) (println "Pass"))
However, I was wondering if there is anything equivalent to Java's .equals() method in Clojure. Or a better way to do the same.
= is the correct way to do an equality check for Strings. If it's giving you unexpected results, you likely have whitespace in the String like a trailing newline.
You can easily check for whitespace by using vec:
user=> (vec " Pass\n")
[\space \P \a \s \s \newline]
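Another way to make hidden whitespace visible is pr-str, which prints a string with its escape sequences. The sketch below assumes, purely for illustration, that the value in the report carries a trailing newline; the name result is hypothetical.

```clojure
(require '[clojure.string :as string])

;; Hypothetical value: looks like "Pass" when printed, but ends in a newline.
(def result "Pass\n")

(= result "Pass")               ;; => false
(pr-str result)                 ;; => "\"Pass\\n\""  (the escape exposes the newline)
(= (string/trim result) "Pass") ;; => true
```

If trimming makes the comparison succeed, stray whitespace was the cause and = was working correctly all along.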
As @Carcigenicate wrote, use = to compare strings.
(= "hello" "hello")
;; => true
If you want to be less strict, consider normalizing your string before you compare. If we have a leading space, the strings aren't equal.
(= " hello" "hello")
;; => false
We can then define a normalize function that works for us.
In this case, ignore leading and trailing whitespace and
capitalization.
(require '[clojure.string :as string])
(defn normalize [s]
(string/trim
(string/lower-case s)))
(= (normalize " hellO")
(normalize "Hello\t"))
;; => true
Hope that helps!
I'm trying to write a function that takes a string and returns a result of a filter function (I'm working through 4clojure problems). The result must be a string too.
I've written this:
(fn my-caps [s]
(filter #(Character/isUpperCase %) s))
(my-caps "HeLlO, WoRlD!")
Result: (\H \L \O \W \R \D)
Now I'm trying to create a string out of this list, using clojure.string/join, like this:
(fn my-caps [s]
(clojure.string/join (filter #(Character/isUpperCase %) s)))
The result is however the same. I've also tried using apply str, with no success.
You have to convert the lazy sequence returned by filter into a string, by applying the str function. Also, use defn to define a new function - here's how:
(defn my-caps [s]
(apply str (filter #(Character/isUpperCase %) s)))
It works as expected:
(my-caps "HeLlO, WoRlD!")
=> "HLOWRD"
The last code snippet you pasted works fine. join indeed does return a string.
Try this:
(defn my-caps [s]
(->> (filter #(Character/isUpperCase %) s)
(apply str)))
The filter function returns a lazy sequence. To get a string, convert the sequence by applying the str function.
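For comparison, a short sketch showing that both apply str and clojure.string/join turn the filtered sequence into a string. The names caps-with-apply and caps-with-join are invented here to keep the two variants apart.

```clojure
(require '[clojure.string :as string])

;; Variant 1: spread the filtered characters as arguments to str.
(defn caps-with-apply [s]
  (apply str (filter #(Character/isUpperCase %) s)))

;; Variant 2: join the filtered characters with no separator.
(defn caps-with-join [s]
  (string/join (filter #(Character/isUpperCase %) s)))

(caps-with-apply "HeLlO, WoRlD!") ;; => "HLOWRD"
(caps-with-join "HeLlO, WoRlD!")  ;; => "HLOWRD"
```

If a REPL or test harness still shows a list of characters, it is worth checking that the version being evaluated is really the one that wraps filter.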
I'm trying to write a function to strip all ASCII vowels in Clojure. I am new to Clojure, and I'm having a little trouble with strings. For example the string "hello world" would return "hll wrld". I appreciate the help!
You can take advantage of the underlying functions on the string class for that.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
If that feels like cheating, you could turn the string into a seq, and then filter it with the complement of a set, and then turn that back into a string.
user=> (apply str (filter (complement #{\a \e \i \o \u}) (seq "hello world")))
"hll wrld"
Sets in Clojure are also functions. complement takes a function and returns a function that returns the logical not of the original function. apply takes a function and a sequence of arguments and calls that function with those arguments (roughly speaking). The expression above is equivalent to this:
user=> (apply str (filter #(not (#{\a \e \i \o \u} %)) (seq "hello world")))
"hll wrld"
edit
One more...
user=> (apply str (re-seq #"[^aeiou]" "hello world"))
"hll wrld"
#"[^aeiou]" is a regex, and re-seq turns the matches into a seq. It's clojure-like and seems to perform well. I might try this one before dropping down to Java. The ones that seq strings are quite a bit slower.
Important Edit
There's one more way, and that is to use clojure.string/replace. This may be the best way, given that it should work in either Clojure or ClojureScript.
e.g.
dev:cljs.user=> (require '[clojure.string :as str])
nil
dev:cljs.user=> (str/replace "hello world" #"[aeiou]" "")
"hll wrld"
Bill is mostly right, but wrong enough to warrant this answer I think.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
This solution is perfectly acceptable. In fact, it's the best solution proposed. There is nothing wrong with dropping down to Java if the solution is the cleanest and fastest.
Another solution is, like he said, using sequence functions. However, his code is a little strange. Never use filter with (not ..) or complement. There is a function specifically for that, remove:
user> (apply str (remove #{\a \e \i \o \u} "hello world"))
"hll wrld"
You also don't have to call seq on the string. All of Clojure's seq functions will handle that for you.
His last solution is interesting, but I'd prefer the first one simply because it doesn't involve (apply str ..).
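The examples in these answers only strip lowercase vowels. If "all ASCII vowels" is meant to include uppercase ones (an assumption, since the question only shows lowercase input), a sketch could look like this. The names vowels, strip-vowels-regex and strip-vowels-seq are illustrative.

```clojure
(require '[clojure.string :as string])

;; Set of lowercase and uppercase ASCII vowels, built from a string.
(def vowels (set "aeiouAEIOU"))

;; Regex approach via clojure.string/replace.
(defn strip-vowels-regex [s]
  (string/replace s #"[aeiouAEIOU]" ""))

;; Sequence approach: a set acts as a predicate for remove.
(defn strip-vowels-seq [s]
  (apply str (remove vowels s)))

(strip-vowels-regex "Hello World") ;; => "Hll Wrld"
(strip-vowels-seq "Hello World")   ;; => "Hll Wrld"
```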
I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to do the file line-by-line, and then print each character.
(defn process_file [file_path]
(with-open [reader (BufferedReader. (FileReader. file_path))]
(let [seq (line-seq reader)]
(doseq [item seq]
(let [words (split item #"\s")]
(println words))))))
Given a file with this text input:
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
My output looks like this:
[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States. U.S. laws alone swamp our small staff.]
Though I would expect it to look like:
["international" "donations" "are" .... ]
So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.
(with-open [reader (clojure.java.io/reader "path/to/file")] ...
I prefer this way of getting a reader in Clojure. And by "character by character", do you mean at the file-access level, like read, which lets you control how many bytes to read?
Edit
As @deterb pointed out, let's check the source code of line-seq
(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
  rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))
I faked a char-seq
(defn char-seq
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]
    (if (>= chr 0)
      (cons chr (lazy-seq (char-seq rdr))))))
I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:
(let [chr (.read rdr)]
(if (>= chr 0)
;do your work here
))
What do you think?
[1] According to @dimagog's comment, char-seq does not read all chars into memory, thanks to lazy-seq
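Note that .read returns an int code rather than a character, so the faked char-seq yields numbers. A hedged variant that converts each code with char might look like the sketch below. The name reader-chars is made up, and a StringReader stands in for a file so the example is self-contained.

```clojure
(import '(java.io Reader StringReader))

(defn reader-chars
  "Lazy seq of characters read from rdr.
  .read returns an int, and -1 signals the end of input."
  [^Reader rdr]
  (lazy-seq
    (let [c (.read rdr)]
      (when-not (neg? c)
        (cons (char c) (reader-chars rdr))))))

;; doall forces the seq while the reader is still open.
(with-open [rdr (StringReader. "abc")]
  (doall (reader-chars rdr)))
;; => (\a \b \c)
```

Because the sequence is lazy, it has to be consumed (here with doall) before with-open closes the reader.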
I'm not familiar with Java or the read() method, so I won't be able to help you out with implementing it.
One first thought is maybe to simplify by using slurp, which will return a string of the text of the entire file with just (slurp filename). However, this would get the whole file, which maybe you don't want.
Once you have a string of the entire file text, you can process any string character by character by simply treating it as though it were a sequence of characters. For example:
=> (doseq [c "abcd"]
     (println c))
a
b
c
d
=> nil
Or:
=> (remove #{\c} "abcd")
=> (\a \b \d)
You could use map or reduce or any sort of sequence manipulating function. Note that after manipulating it like a sequence, it will now return as a sequence, but you could easily wrap the outer part in (reduce str ...) to return it back to a string at the end--explicitly:
=> (reduce str (remove #{\c} "abcd"))
=> "abd"
As for the problem with your specific code, I think it lies in what words is: a vector of strings. When you print words, you are printing a whole vector. If you replaced the line (println words) with (doseq [w words] (println w)), it should work.
Also, based on what you say you want your output to look like (a vector of all the different words in the file), you wouldn't want to call (println w) at the base of your expression, because that prints values and returns nil. You would simply want w. You would also want to replace your doseqs with fors, again to avoid returning nil.
As for improving your code, it looks generally fine to me, but you could shorten it with a neat little trick. This builds on the first change I suggested above (but not the others, because I don't want to spell them all out):
(doseq [item seq]
  (let [words (split item #"\s")]
    (doseq [w words]
      (println w))))
;; Could be rewritten as...
(doseq [item seq
        :let [words (split item #"\s")]
        w words]
  (println w))
You're pretty close - keep in mind that Strings are a sequence. (concat "abc" "def") results in the sequence (\a \b \c \d \e \f).
mapcat is another really useful function for this - it will lazily concatenate the results of applying the mapping fn to the sequence. This means that mapcating the result of converting all of the line strings to a seq will be the lazy sequence of characters you're after.
I did this as (mapcat seq (line-seq reader)).
For other advice:
For creating the reader, I would recommend using the clojure.java.io/reader function instead of directly creating the classes.
Consider separating the reading of the file from the processing (in this case printing) of the strings. While it is important to keep the full file parsing inside the with-open clause, being able to test the actual processing code outside of the file reading code is quite useful.
When navigating multiple (potentially nested) sequences consider using for. for does a nice job handling nested for loop type cases.
(take 100 (for [line (repeat "abc") char (seq line)] (prn char)))
Use prn for debugging output. It gives you real output, as compared to user output (which hides certain details which users don't normally care about).
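The advice about separating reading from processing can be sketched roughly as follows. The function names line-words and words-from are hypothetical, and a StringReader replaces the real file so the processing code can be exercised on its own.

```clojure
(require '[clojure.string :as string])
(import '(java.io BufferedReader StringReader))

(defn line-words
  "Pure function: splits one line on whitespace, dropping empty strings."
  [line]
  (remove empty? (string/split line #"\s+")))

(defn words-from
  "Reads every line from rdr and returns a vector of all words.
  vec forces the lazy seq, so this is safe to call inside with-open."
  [^BufferedReader rdr]
  (vec (mapcat line-words (line-seq rdr))))

;; Exercising the code without touching the file system:
(with-open [rdr (BufferedReader. (StringReader. "any statements\nconcerning tax"))]
  (words-from rdr))
;; => ["any" "statements" "concerning" "tax"]
```

For a real file, the same function could be called as (with-open [rdr (clojure.java.io/reader "path/to/file")] (words-from rdr)).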
Why do you have to use "first" in get-word-ids, and what's the right way to do this?
(defn parts-of-speech []
(lazy-seq (. POS values)))
(defn index-words [pos]
(iterator-seq (. dict getIndexWordIterator pos)))
(defn word-ids [word]
(lazy-seq (. word getWordIDs)))
(defn get-word [word-id]
(. dict getWord word-id))
(defn get-index-words []
(lazy-seq (map index-words (parts-of-speech))))
(defn get-word-ids []
(lazy-seq (map word-ids (first (get-index-words)))))
;; this works, but why do you have to use "first" in get-word-ids?
(doseq [word-id (get-word-ids)]
(println word-id))
The short answer: remove all the references to lazy-seq.
As for your original question, it is worth explaining even though this is not an idiomatic use of lazy-seq. You have to use first because the get-word-ids function is returning a lazy sequence with one entry. That entry is the lazy sequence you are looking for.
It looks like this:
( (word1 word2 word3) )
so first returns the sequence you want:
(word1 word2 word3)
It is very likely that the only time you will use lazy-seq will be in this pattern:
(lazy-seq (cons :something (function-call :produces :the :next :element)))
I have never seen lazy-seq used in any other pattern. The purpose of lazy-seq is to generate new sequences of original data. If code exists to produce the data, then it's almost always better to use something like iterate, map, or for to produce your lazy sequence.
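As a concrete instance of that pattern, here is a minimal sketch. The function doubling-from is invented for illustration, and the second form shows how iterate can produce the same sequence without an explicit lazy-seq.

```clojure
;; The lazy-seq/cons pattern: each element is computed only when requested.
(defn doubling-from [n]
  (lazy-seq (cons n (doubling-from (* 2 n)))))

(take 5 (doubling-from 1))
;; => (1 2 4 8 16)

;; The same infinite sequence without lazy-seq, using iterate.
(take 5 (iterate #(* 2 %) 1))
;; => (1 2 4 8 16)
```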
This seems wrong:
(defn get-index-words []
(lazy-seq (map index-words (parts-of-speech))))
(index-words pos) returns a seq, which is why you need a (first) in get-word-ids.
Also, map is already lazy, so there's no need to wrap a (map ...) in a lazy-seq, and it would be almost pointless to use lazy-seq around map if map wasn't lazy. It would probably be useful to read up a bit more on (lazy) sequences in Clojure.
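The nesting is easier to see with plain data. In the sketch below, words-by-pos is a hypothetical stand-in for the dictionary from the question, and index-words is redefined on top of it. It only illustrates the shape of the result and suggests mapcat as one way to flatten it.

```clojure
;; Hypothetical stand-in: maps a part of speech to its words.
(def words-by-pos
  {:noun ["dog" "cat"]
   :verb ["run"]})

(defn index-words [pos]
  (seq (words-by-pos pos)))

;; map yields one inner seq per part of speech, hence the need for first.
(map index-words [:noun :verb])
;; => (("dog" "cat") ("run"))

;; mapcat concatenates the inner seqs into one lazy seq.
(mapcat index-words [:noun :verb])
;; => ("dog" "cat" "run")
```

Whether flattening across all parts of speech is what the original code wants is an open question; first keeps only the words for the first one.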