Stripping Vowels in clojure - clojure

I'm trying to write a function to strip all ASCII vowels in Clojure. I am new to Clojure, and I'm having a little trouble with strings. For example the string "hello world" would return "hll wrld". I appreciate the help!

You can take advantage of the underlying functions on the string class for that.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
If that feels like cheating, you could turn the string into a seq, and then filter it with the complement of a set, and then turn that back into a string.
user=> (apply str (filter (complement #{\a \e \i \o \u}) (seq "hello world")))
"hll wrld"
Sets in clojure are also functions. complement takes a function and returns a function that returns the logical not of the original function. It's equivalent to this. apply takes a function and a bunch of arguments and calls that function with those arguments (roughly speaking).
user=> (apply str (filter #(not (#{\a \e \i \o \u} %)) (seq "hello world")))
"hll wrld"
edit
One more...
user=> (apply str (re-seq #"[^aeiou]" "hello world"))
"hll wrld"
#"[^aeiou]" is a regex, and re-seq turns the matches into a seq. It's clojure-like and seems to perform well. I might try this one before dropping down to Java. The ones that seq strings are quite a bit slower.
Important Edit
There's one more way, and that is to use clojure.string/replace. This may be the best way given that it should work in either Clojure or Clojurescript.
e.g.
dev:cljs.user=> (require '[clojure.string :as str])
nil
dev:cljs.user=> (str/replace "hello world" #"[aeiou]" "")
"hll wrld"

Bill is mostly right, but wrong enough to warrant this answer I think.
user=> (.replaceAll "hello world" "[aeiou]" "")
"hll wrld"
This solution is perfectly acceptable. In fact, it's the best solution proposed. There is nothing wrong with dropping down to Java if the solution is the cleanest and fastest.
Another solution is, like he said, using sequence functions. However, his code is a little strange. Never use filter with (not ..) or complement. There is a function specifically for that, remove:
user> (apply str (remove #{\a \e \i \o \u} "hello world"))
"hll wrld"
You also don't have to call seq on the string. All of Clojure's seq functions will handle that for you.
His last solution is interesting, but I'd prefer the first one simply because it doesn't involve (apply str ..).

Related

Convert set to regex pattern in clojure

If I have this set
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
How can I turn it to this pattern string:
"foo\.clj|bar\.clj|baz\.clj"
My attempt : 
(defn set->pattern-str [coll]
(-> (clojure.string/join "|" coll)
(clojure.string/replace #"\." "\\\\.")))
(set->pattern-str my-set)
=> "foo\\.clj|baz\\.clj|bar\\.clj" ;I get the double backslash
Better ideas?
In case your set of strings might have other metacharacters than just . in them, a more general approach is to ask the underlying java.util.regex.Pattern implementation to escape everything for us:
(import 'java.util.regex.Pattern)
(defn set->pattern-str [coll]
(->> coll
(map #(Pattern/quote %))
(clojure.string/join \|)
re-pattern))
IDEone link here. Remember, IDEone is not a REPL, and you have to tell it to put values on stdout with e.g. println before you can see them.
You were close to the final solution. Double backslash is displayed because it is shown escaped. When you turn it into a seq you will see individual characters:
(seq "foo\\.clj")
;;=> (\f \o \o \\ \. \c \l \j)
And working solution:
(def my-set #{"foo.clj" "bar.clj" "baz.clj"})
(def my-set-pattern
(-> (clojure.string/join "|" my-set)
(clojure.string/replace "." "\\.")
(re-pattern)))
(re-matches my-set-pattern "foo.clj")
;;=> "foo.clj"
(re-matches my-set-pattern "bar.clj")
;;=> "bar.clj"
(re-matches my-set-pattern "baz.clj")
;;=> "baz.clj"
(re-matches my-set-pattern "foo-clj")
;;=> nil
Edit: OK, this one does in fact work. Probably want to break it apart a bit more if it's meant to be long lived code, but this is the simplest way I could find to do it with minimal string munging.
(defn is-matching-file-name [target-string]
(re-matches
(re-pattern (clojure.string/escape (String/join "|" my-set) {\. "\\."}))
target-string))
The clojure.string/escape here takes two arguments: the string to escape, and a mapping of the characters to escape to the replacement strings. The key in this map is the literal \. and the value needs two backslashes since we want to include one backslash preceding any . in the final string to be used as the argument for the re-pattern function.

What is idiomatic clojure to validate that a string has only alphanumerics and hyphen?

I need to ensure that a certain input only contains lowercase alphas and hyphens. What's the best idiomatic clojure to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in clojure, or if this is it, what's the syntax for regex matching?
user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
:found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")
Using RegExp is fine here. To match a string with RegExp in clojure you may use build-in re-find function.
So, your example in clojure will look like:
(if (re-find #"^[a-z\-]+$" s)
:true
:false)
Note that your RegExp will match only small latyn letters a-z and hyphen -.
While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
(do-something-with s)
(do-something-else-with s))

Processing a file character by character in Clojure

I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to do the file line-by-line, and then print each character.
(defn process_file [file_path]
(with-open [reader (BufferedReader. (FileReader. file_path))]
(let [seq (line-seq reader)]
(doseq [item seq]
(let [words (split item #"\s")]
(println words))))))
Given a file with this text input:
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
My output looks like this:
[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States. U.S. laws alone swamp our small staff.]
Though I would expect it to look like:
["international" "donations" "are" .... ]
So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.
(with-open [reader (clojure.java.io/reader "path/to/file")] ...
I prefer this way to get a reader in clojure. And, by character by character, do you mean in file access level, like read, which allow you control how many bytes to read?
Edit
As #deterb pointed out, let's check the source code of line-seq
(defn line-seq
"Returns the lines of text from rdr as a lazy sequence of strings.
rdr must implement java.io.BufferedReader."
{:added "1.0"
:static true}
[^java.io.BufferedReader rdr]
(when-let [line (.readLine rdr)]
(cons line (lazy-seq (line-seq rdr)))))
I faked a char-seq
(defn char-seq
[^java.io.Reader rdr]
(let [chr (.read rdr)]
(if (>= chr 0)
(cons chr (lazy-seq (char-seq rdr))))))
I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:
(let [chr (.read rdr)]
(if (>= chr 0)
;do your work here
))
How do you think?
[1] According to #dimagog's comment, char-seq not read all char into memory thanks to lazy-seq
I'm not familiar with Java or the read() method, so I won't be able to help you out with implementing it.
One first thought is maybe to simplify by using slurp, which will return a string of the text of the entire file with just (slurp filename). However, this would get the whole file, which maybe you don't want.
Once you have a string of the entire file text, you can process any string character by character by simply treating it as though it were a sequence of characters. For example:
=> (doseq [c "abcd"]
(prntln c))
a
b
c
d
=> nil
Or:
=> (remove #{\c} "abcd")
=> (\a \b \d)
You could use map or reduce or any sort of sequence manipulating function. Note that after manipulating it like a sequence, it will now return as a sequence, but you could easily wrap the outer part in (reduce str ...) to return it back to a string at the end--explicitly:
=> (reduce str (remove #{\c} "abcd"))
=> "abd"
As for your problem with your specific code, I think the problem lies with what words is: a vector of strings. When you print each words you are printing a vector. If at the end you replaced the line (println words) with (doseq [w words] (println w))), then it should work great.
Also, based on what you say you want your output to look like (a vector of all the different words in the file), you wouldn't want to only do (println w) at the base of your expression, because this will print values and return nil. You would simply want w. Also, you would want to replace your doseqs with fors--again, to avoid return nil.
Also, on improving your code, it looks generally great to me, but--and this is going with all the first change I suggest above (but not the others, because I don't want to draw it all out explicitly)--you could shorten it with a fun little trick:
(doseq [item seq]
(let [words (split item #"\s")]
(doseq [w words]
(println w))))
;//Could be rewritten as...
(doseq [item s
:let [words (split item #"\s")]
w words]
(println w))
You're pretty close - keep in mind that Strings are a sequence. (concat "abc" "def") results in the sequence (\a \b \c \d \e \f).
mapcat is another really useful function for this - it will lazily concatenate the results of applying the mapping fn to the sequence. This means that mapcating the result of converting all of the line strings to a seq will be the lazy sequence of characters you're after.
I did this as (mapcat seq (line-seq reader)).
For other advice:
For creating the reader, I would recommend using the clojure.java.io/reader function instead of directly creating the classes.
Consider breaking apart the reading the file and the processing (in this case printing) of the strings from each other. While it is important to keep the full file parsing inside the withopen clause, being able to test the actual processing code outside of the file reading code is quite useful.
When navigating multiple (potentially nested) sequences consider using for. for does a nice job handling nested for loop type cases.
(take 100 (for [line (repeat "abc") char (seq line)] (prn char)))
Use prn for debugging output. It gives you real output, as compared to user output (which hides certain details which users don't normally care about).

Escaping brackets in Clojure

If I try this
(import java.util.regex.Pattern)
(Pattern/compile ")!##$%^&*()")
or this
(def p #")!##$%^&*()")
I have Clojure complaining that there is an unmatched / unclosed ). Why are brackets evaluated within this simple string? How to escape them? Thanks
EDIT: While escaping works in the clojure-specific syntax (#""), it doesn't work with the Pattern/compile syntax that I do need because I have to compile the regex patter dynamically from a string.
I've tried with re-pattern, but I can't escape properly for some reason:
(re-pattern "\)!##$%^&*\(\)")
java.lang.Exception: Unsupported escape character: \)
java.lang.Exception: Unable to resolve symbol: ! in this context (NO_SOURCE_FILE:0)
java.lang.Exception: No dispatch macro for: $
java.lang.Exception: Unable to resolve symbol: % in this context (NO_SOURCE_FILE:0)
java.lang.IllegalArgumentException: Metadata can only be applied to IMetas
EDIT 2 This little function may help:
(defn escape-all [x]
(str "\\" (reduce #(str %1 "\\" %2) x)))
I got it working by double escaping everything. Oh the joys of double escaping.
=> (re-pattern "\\)\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)")
=> #"\)\!\#\#\$\%\^\&\*\(\)"
=> (re-find (re-pattern "\\)\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)")
")!##$%^&*()")
=> ")!##$%^&*()"
I would recommend writing a helper function str-to-pattern (or whatever you want to call it), that takes a string, double escapes everything it needs to, and then calls re-pattern on it.
Edit: making a string to pattern function
There are plenty of ways to do this, below is just one example. I start by making an smap of regex escape chars to their string replacement. An "smap" isn't an actual type, but functionally it's a map we will use to swap "old values" with "new values", where "old values" are members of the keys of the smap, and "new values" are corresponding members of the vals of smap. In our case, this smap looks like {\( "\\(", \) "\\)" ...}.
(def regex-char-esc-smap
(let [esc-chars "()*&^%$#!"]
(zipmap esc-chars
(map #(str "\\" %) esc-chars))))
Next is the actual function. I use the above smap to replace items in the string passed to it, then convert that back into a string and make a regex pattern out of it. I think the ->> macro makes the code more readable, but that's just a personal preference.
(defn str-to-pattern
[string]
(->> string
(replace regex-char-esc-smap)
(reduce str)
re-pattern))
are you sure the error is from the reader (ie from clojure itself)?
regexps use parentheses, and they have to match there too. i would guess the error is cominng from the code trying to compile the regexp.
if you want to escape a paren in a regexp, use a backquote: (def p #"\)!##$%^&*\(\)")
[update] ah, sorry, you probably need double escapes as Omri days.
All of the versions of Java that Clojure supports recognize \Q to start a quoted region and \E to end the quoted region. This allows you to do something like this:
(re-find #"\Q)!##$%^&*()\E" ")!##$%^&*()")
If you're using (re-pattern) then this will work:
(re-find (re-pattern "\\Q)!##$%^&*()\\E") ")!##$%^&*()")
If you're assembling a regular expression from a string whose content you don't know then you can use the quote method in java.util.regex.Pattern:
(re-find (re-pattern (java.util.regex.Pattern/quote some-str)) some-other-str)
Here's an example of this from my REPL:
user> (def the-string ")!##$%^&*()")
#'user/the-string
user> (re-find (re-pattern (java.util.regex.Pattern/quote the-string)) the-string)
")!##$%^&*()"

Reverse a string (simple question)

Is there a better way to do this in Clojure?
daniel=> (reverse "Hello")
(\o \l \l \e \H)
daniel=> (apply str (vec (reverse "Hello")))
"olleH"
Do you have to do the apply $ str $ vec bit every time you want to reverse a string back to its original form?
You'd better use clojure.string/reverse:
user=> (require '[clojure.string :as s])
nil
user=> (s/reverse "Hello")
"olleH"
UPDATE: for the curious, here follow the source code snippets for clojure.string/reverse in both Clojure (v1.4) and ClojureScript
; clojure:
(defn ^String reverse
"Returns s with its characters reversed."
{:added "1.2"}
[^CharSequence s]
(.toString (.reverse (StringBuilder. s))))
; clojurescript
(defn reverse
"Returns s with its characters reversed."
[s]
(.. s (split "") (reverse) (join "")))
OK, so it would be easy to roll your own function with apply inside, or use a dedicated version of reverse that works better (but only) at strings. The main things to think about here though, is the arity (amount and type of parameters) of the str function, and the fact that reverse works on a collection.
(doc reverse)
clojure.core/reverse
([coll])
Returns a seq of the items in coll in reverse order. Not lazy.
This means that reverse not only works on strings, but also on all other collections. However, because reverse expects a collection as parameter, it treats a string as a collection of characters
(reverse "Hello")
and returns one as well
(\o \l \l \e \H)
Now if we just substitute the functions for the collection, you can spot the difference:
(str '(\o \l \l \e \H) )
"(\\o \\l \\l \\e \\H)"
while
(str \o \l \l \e \H )
"olleH"
The big difference between the two is the amount of parameters. In the first example, str takes one parameter, a collection of 5 characters. In the second, str takes 5 parameters: 5 characters.
What does the str function expect ?
(doc str)
-------------------------
clojure.core/str
([] [x] [x & ys])
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So when you give in one parameter (a collection), all str returns is a toString of the collection.
But to get the result you want, you need to feed the 5 characters as separate parameters to str, instead of the collection itself. Apply is the function that is used to 'get inside' the collection and make that happen.
(apply str '(\o \l \l \e \H) )
"olleH"
Functions that handle multiple separate parameters are often seen in Clojure, so it's good to realise when and why you need to use apply. The other side to realize is, why did the writer of the str function made it accept multiple parameters instead of a collection ? Usually, there's a pretty good reason. What's the prevalent use case for the str function ? Not concatenating a collection of separate characters surely, but concatenating values, strings and function results.
(let [a 1 b 2]
(str a "+" b "=" (+ a b)))
"1+2=3"
What if we had a str that accepted a single collection as parameter ?
(defn str2
[seq]
(apply str seq)
)
(str2 (reverse "Hello"))
"olleH"
Cool, it works ! But now:
(let [a 1 b 2]
(str2 '(a "+" b "=" (+ a b)))
)
"a+b=(+ a b)"
Hmmm, now how to solve that ? :)
In this case, making str accept multiple parameters that are evaluated before the str function is executed gives str the easiest syntax. Whenever you need to use str on a collection, apply is a simple way to convert a collection to separate parameters.
Making a str that accepts a collection and have it evaluate each part inside would take more effort, help out only in less common use cases, result in more complicated code or syntax, or limit it's applicability. So there might be a better way to reverse strings, but reverse, apply and str are best at what they do.
Apply, like reverse, works on any seqable type, not just vectors, so
(apply str (reverse "Hello"))
is a little shorter. clojure.string/reverse should be more efficient, though.