Specifying bindings syntax with :as - Clojure

Say I have a binding such as:
(loop [[x & more :as tree] tree, minVal 99, maxVal -99] ...)
What does :as do? I have tried looking around but can't find anything about it.

There's an explicitly on-point example in https://clojure.org/guides/destructuring --
(def word "Clojure")
(let [[x & remaining :as all] word]
  (apply prn [x remaining all]))
;= \C (\l \o \j \u \r \e) "Clojure"
Here all is bound to the original structure (String, vector, list, whatever it may be), x is bound to the character \C, and remaining is bound to the rest of the characters.
As this demonstrates, :as binds the original, pre-destructuring value, so in your loop example tree retains its original value. That's not particularly useful when the value already has a name, but it is very useful when the value is a return value or otherwise as-yet-unnamed.
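For instance, here is a minimal sketch (the function name and data are made up) where the value being destructured comes straight from a function call and so has no prior name:
(defn head-tail-whole
  "Destructure (range 5) while keeping the whole original value via :as."
  []
  (let [[head & tail :as whole] (range 5)]
    {:head head :tail tail :whole whole}))

(head-tail-whole)
;= {:head 0, :tail (1 2 3 4), :whole (0 1 2 3 4)}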

Related

Replace multiple bad characters in Clojure

I am trying to replace bad characters in an input string.
Characters should be valid UTF-8 characters (tabs, line breaks etc. are OK).
However, I was unable to figure out how to replace all of the bad characters found.
My solution works for the first bad character only.
Usually there are no bad characters; in roughly 1 in 50 cases there is one bad character. I'd just like to make my solution foolproof.
(defn filter-to-utf-8-string
  "Return only good utf-8 characters from the input."
  [input]
  (let [bad-characters (set (re-seq #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" input))
        filtered-string (clojure.string/replace input (apply str (first bad-characters)) "")]
    filtered-string))
How can I make replace work for all values in sequence not just for the first one?
A friend of mine helped me find a workaround for this problem:
I created a filter for replace using re-pattern.
Within the let, the code is currently:
filter (if (not (empty? bad-characters))
         (re-pattern (str "[" (clojure.string/join bad-characters) "]"))
         #"")
filtered-string (clojure.string/replace input filter "")
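Putting that workaround back into the original function, the whole thing might look roughly like this (a sketch of the approach above, not battle-tested; regex-special characters inside the generated character class may still need escaping, e.g. with java.util.regex.Pattern/quote):
(defn filter-to-utf-8-string
  "Return only good utf-8 characters from the input."
  [input]
  (let [bad-characters (set (re-seq #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" input))
        ;; one character class covering every bad character that was found
        bad-pattern (if (seq bad-characters)
                      (re-pattern (str "[" (clojure.string/join bad-characters) "]"))
                      #"")]
    (clojure.string/replace input bad-pattern "")))
The binding is renamed from filter to bad-pattern here so it doesn't shadow clojure.core/filter.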
Here is a simple version:
(ns xxxxx
  (:require
    [clojure.string :as str]))

(def all-chars (str/join (map char (range 32 80))))
(println all-chars)

(def char-L (str/join (re-seq #"[\p{L}]" all-chars)))
(println char-L)

(def char-N (str/join (re-seq #"[\p{N}]" all-chars)))
(println char-N)

(def char-LN (str/join (re-seq #"[\p{L}\p{N}]" all-chars)))
(println char-LN)
all-chars => " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO"
char-L => "ABCDEFGHIJKLMNO"
char-N => "0123456789"
char-LN => "0123456789ABCDEFGHIJKLMNO"
So we start off with all ASCII chars in the range 32-80. We first print only the letters, then only the numbers, then letters or numbers. It seems this should work for your problem, although instead of rejecting non-members of the desired set, we keep the members of the desired set.
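If you'd rather express that directly as a function, a hedged sketch along the same keep-the-good-characters lines (the name is made up, and the character class simply mirrors the one in the question) could be:
(defn keep-utf-8-chars
  "Keep only letters, digits, whitespace, punctuation, currency symbols and +."
  [input]
  (str/join (re-seq #"[\p{L}\p{N}\s\p{P}\p{Sc}\+]" input)))

(keep-utf-8-chars "abc\u0000def")
;= "abcdef"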

Is there a better way to do that in Clojure?

I have this function to read a file and convert it to a list of two-elements lists:
(def f1 "/usr/example")

(defn read-file [file]
  (let [f (with-open [rdr (clojure.java.io/reader file)]
            (doall (map list (line-seq rdr))))]
    (cond
      (= file f1) (map #(map read-string (split (first %) #" ")) f)
      :else (map #(map read-string (split (first %) #"\t")) f))))
I use cond to split the file correctly (I have two types of files: the first separates elements by spaces and the second by tabs).
The first type of file would be like:
"1.3880896237218878E9 0.4758112837388654
1.3889631620596328E9 0.491845185928218"
while the second is:
"1.3880896237218878E9\t0.4758112837388654
1.3889631620596328E9\t0.491845185928218"
I get the result I want, for example:
((1.3880896237218878E9 0.4758112837388654) (1.3889631620596328E9 0.491845185928218))
But I wonder if there's a cleaner way to do it, maybe using fewer map calls or doing it without cond.
This returns a vector of vectors, splitting individual lines on arbitrary whitespace and using Double/parseDouble to read in the individual doubles. What it doesn't handle are any single or double quote characters in the files; if they are part of the actual input, I suppose I'd just preprocess it with a regex to get rid of them (see below).
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn read-file [f]
  (with-open [rdr (io/reader f)]
    (mapv (fn [line]
            (mapv #(Double/parseDouble %) (string/split line #"\s+")))
          (line-seq rdr))))
As for the aforementioned preprocessing, you could use #(string/replace % #"['\"]" "") to remove all single and double quotes. That would be appropriate if they occur at the beginning and end of the input, or perhaps of the individual lines. (If the individual numbers are quoted, then you need to make sure you're not removing all the delimiters between them -- in that case it may be better to replace with a single space and then use string/trim to remove any whitespace from the ends of the string.)
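Something like this sketch of that preprocessing step, for example (the helper name is made up; it uses the string alias required above):
(defn strip-quotes
  "Replace single and double quotes with a space, then trim the ends."
  [s]
  (string/trim (string/replace s #"['\"]" " ")))

(strip-quotes "'1.0E9 0.5'")
;= "1.0E9 0.5"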

Processing a file character by character in Clojure

I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to process the file line by line, and then print each character.
(defn process_file [file_path]
  (with-open [reader (BufferedReader. (FileReader. file_path))]
    (let [seq (line-seq reader)]
      (doseq [item seq]
        (let [words (split item #"\s")]
          (println words))))))
Given a file with this text input:
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
My output looks like this:
[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States. U.S. laws alone swamp our small staff.]
Though I would expect it to look like:
["international" "donations" "are" .... ]
So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.
(with-open [reader (clojure.java.io/reader "path/to/file")] ...
I prefer this way of getting a reader in Clojure. And by character by character, do you mean at the file-access level, like read, which allows you to control how many bytes to read?
Edit
As @deterb pointed out, let's check the source code of line-seq:
(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
  rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))
I faked a char-seq:
(defn char-seq
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]
    (if (>= chr 0)
      (cons chr (lazy-seq (char-seq rdr))))))
I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:
(let [chr (.read rdr)]
  (if (>= chr 0)
    ;; do your work here
    ))
What do you think?
[1] According to @dimagog's comment, char-seq does not read all chars into memory, thanks to lazy-seq.
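For completeness, here's a hedged sketch of how that char-seq might be wired up (the file path is a placeholder; .read returns an int, so it is converted back to a char before printing):
(with-open [rdr (clojure.java.io/reader "path/to/file")]
  (doseq [c (char-seq rdr)]
    (println (char c))))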
I'm not familiar with Java or the read() method, so I won't be able to help you out with implementing it.
One first thought is maybe to simplify by using slurp, which will return the entire text of the file as a single string with just (slurp filename). However, this reads the whole file at once, which maybe you don't want.
Once you have a string of the entire file text, you can process any string character by character by simply treating it as though it were a sequence of characters. For example:
=> (doseq [c "abcd"]
     (println c))
a
b
c
d
=> nil
Or:
=> (remove #{\c} "abcd")
=> (\a \b \d)
You could use map or reduce or any sort of sequence-manipulating function. Note that after manipulating it like a sequence, the result will itself be a sequence, but you could easily wrap the outer part in (reduce str ...) to turn it back into a string at the end--explicitly:
=> (reduce str (remove #{\c} "abcd"))
=> "abd"
As for the problem with your specific code, I think it lies in what words is: a vector of strings. When you print words you are printing a whole vector. If at the end you replaced the line (println words) with (doseq [w words] (println w)), then it should work great.
Also, based on what you say you want your output to look like (a vector of all the different words in the file), you wouldn't want to do only (println w) at the base of your expression, because that will print values and return nil. You would simply want w. Also, you would want to replace your doseqs with fors--again, to avoid returning nil (see the sketch after the rewrite below).
Also, on improving your code: it looks generally great to me, but--and this goes with the first change I suggest above (not the others, because I don't want to draw it all out explicitly)--you could shorten it with a fun little trick:
(doseq [item seq]
  (let [words (split item #"\s")]
    (doseq [w words]
      (println w))))

;; could be rewritten as...

(doseq [item seq
        :let [words (split item #"\s")]
        w words]
  (println w))
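And here is the for-based sketch promised above, which returns the words instead of printing them; it reuses the seq and split bindings from the question's code, and wrapping it in vec forces the lazy sequence to be realized while the reader is still open:
(vec (for [item seq
           :let [words (split item #"\s")]
           w words]
       w))
;= ["International" "donations" "are" ...]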
You're pretty close - keep in mind that strings can be treated as a sequence of characters. (concat "abc" "def") results in the sequence (\a \b \c \d \e \f).
mapcat is another really useful function for this - it will lazily concatenate the results of applying the mapping fn to the sequence. This means that mapcating the result of converting all of the line strings to a seq will be the lazy sequence of characters you're after.
I did this as (mapcat seq (line-seq reader)).
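Put together, a sketch of the whole thing might look like this (the path and the per-character action are placeholders; note that line-seq drops the newline characters themselves):
(with-open [reader (clojure.java.io/reader "path/to/file")]
  (doseq [c (mapcat seq (line-seq reader))]
    (prn c)))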
For other advice:
For creating the reader, I would recommend using the clojure.java.io/reader function instead of directly creating the classes.
Consider breaking apart the reading of the file and the processing (in this case printing) of the strings from each other. While it is important to keep the full file parsing inside the with-open clause, being able to test the actual processing code outside of the file-reading code is quite useful.
When navigating multiple (potentially nested) sequences consider using for. for does a nice job handling nested for loop type cases.
(take 100 (for [line (repeat "abc") char (seq line)] (prn char)))
Use prn for debugging output. It gives you real output, as compared to user output (which hides certain details which users don't normally care about).

What's the right way to create a Clojure function that returns a new sequence based on another sequence?

Why do you have to use "first" in get-word-ids, and what's the right way to do this?
(defn parts-of-speech []
  (lazy-seq (. POS values)))

(defn index-words [pos]
  (iterator-seq (. dict getIndexWordIterator pos)))

(defn word-ids [word]
  (lazy-seq (. word getWordIDs)))

(defn get-word [word-id]
  (. dict getWord word-id))

(defn get-index-words []
  (lazy-seq (map index-words (parts-of-speech))))

(defn get-word-ids []
  (lazy-seq (map word-ids (first (get-index-words)))))

;; this works, but why do you have to use "first" in get-word-ids?
(doseq [word-id (get-word-ids)]
  (println word-id))
The short answer: remove all the references to lazy-seq.
As for your original question, it is worth explaining even though it's not an idiomatic use of lazy-seq: you have to use first because get-index-words returns a nested lazy sequence, and the entry you want is the inner sequence of index words.
It looks like this:
( (word1 word2 word3) )
so first returns the sequence you want:
(word1 word2 word3)
It is very likely that the only time you will use lazy-seq will be in this pattern:
(lazy-seq (cons :something (function-call :produces :the :next :element)))
I have never seen lazy-seq used in any other pattern. The purpose of lazy-seq is to generate new lazy sequences from original data. If code already exists to produce the data, then it's almost always better to use something like iterate, map, or for to produce your lazy sequence.
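To make the first suggestion concrete, here is a hedged sketch of the question's definitions with the lazy-seq wrappers removed (index-words and get-word stay exactly as they were; the POS and dict interop calls are assumed to behave as shown in the question):
(defn parts-of-speech []
  (seq (. POS values)))

(defn word-ids [word]
  (seq (. word getWordIDs)))

(defn get-index-words []
  (map index-words (parts-of-speech)))

(defn get-word-ids []
  (map word-ids (first (get-index-words))))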
This seems wrong:
(defn get-index-words []
(lazy-seq (map index-words (parts-of-speech))))
(index-words pos) already returns a seq, which is why you need a (first) in get-word-ids.
Also, map is already lazy, so there's no need to wrap a (map ...) in lazy-seq, and it would be almost pointless to use lazy-seq around map if map weren't lazy. It would probably be useful to read up a bit more on (lazy) sequences in Clojure.
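If what you actually want is a flat sequence of word ids across all parts of speech, so that first isn't needed at all, one hedged possibility (it changes what get-word-ids returns, and reuses the question's interop helpers) is:
(defn get-word-ids []
  (mapcat word-ids (mapcat index-words (parts-of-speech))))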

Iterating through a map with doseq

I'm new to Clojure and I'm doing some basic stuff from labrepl. Now I want to write a function that will replace certain letters with other letters, for example: elosska → elößkä.
I wrote this:
(ns student.dialect
  (:require [clojure.string :as str]))

(defn germanize
  [sentence]
  (def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
  (doseq [[original-letter new-letter] german-letters]
    (str/replace sentence original-letter new-letter)))
but it doesn't work as I expect. Could you help me, please?
Here is my take:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})

(defn germanize [s]
  (reduce (fn [sentence [match replacement]]
            (str/replace sentence match replacement))
          s
          german-letters))

(germanize "elosska")
There are 2 problems here:
doseq doesn't preserve the results produced by its body (it runs for side effects and returns nil), so you won't get any result back.
Each str/replace call works on a separate copy of the text, producing 4 different results - you can check this by replacing doseq with for, and you'll get a list with 4 entries.
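You can see the second point at the REPL with the for version (a small illustration; the exact order of the four strings depends on the map's ordering):
(for [[original-letter new-letter] german-letters]
  (str/replace "elosska" original-letter new-letter))
;= ("elosskä" "elosska" "elösska" "eloßka")  ; four independent, partial results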
Your code could be rewritten in the following way:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})

(defn germanize [sentence]
  (loop [text sentence
         letters german-letters]
    (if (empty? letters)
      text
      (let [[original-letter new-letter] (first letters)]
        (recur (str/replace text original-letter new-letter)
               (rest letters))))))
In this case the intermediate results are carried along, so all replacements are applied to the same string, producing the correct result:
user> (germanize "elosska")
"elößkä"
P.S. It's also not recommended to use def inside a function - it's better to reserve it for top-level forms.
Alex has of course already correctly answered the question with respect to the original doseq issue... but I found the question interesting and wanted to see what a more "functional" solution would look like. And by that I mean one without using loop.
I came up with this:
(ns student.dialect
  (:require [clojure.string :as str]))

(defn germanize [sentence]
  (let [letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"}
        regex (re-pattern (apply str (interpose \| (keys letters))))]
    (str/replace sentence regex letters)))
Which yields the same result:
student.dialect=> (germanize "elosska")
"elößkä"
The (re-pattern ...) line simply evaluates to #"ss|a|o|u", which would have been cleaner and simpler to read if entered as an explicit string, but I thought it best to have only one definition of the German letters.
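For comparison, here is a sketch of that explicit-string variant (it trades the single source of truth for readability):
(defn germanize [sentence]
  (let [letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"}]
    (str/replace sentence #"ss|a|o|u" letters)))

(germanize "elosska")
;= "elößkä"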