In Emacs I would like to write some regexp that does the following:
First, return a list of all dictionary words that can be formed within "hex space". By this I mean:
#000000 - #ffffff
so #00baba would be a word (that can be looked up in the dictionary)
so would #baba00
and #abba00
and #0faded
...where trailing and leading 0's are considered irrelevant. How would I write this? Is my question clear enough?
Second, I would like to generate a list of words that can be made using numbers as letters:
0 = o
1 = i
3 = e
4 = a
...and so on. How would I write this?
First, load your dictionary. I'll assume that you're using /usr/share/dict/words, which is installed by default on most Linux systems. It lists one word per line, which is a very handy format for this sort of thing.
Next run M-x keep-lines. It'll ask you for a regular expression and then delete any line that doesn't match it. Use the regex ^[a-f]\{,6\}$ and it will filter out anything that can't be part of a color.
Specifically, the ^ makes the regex start at the beginning of the line, the [a-f] matches any one character between a and f (inclusive), the \{,6\} lets it match between 0 and 6 instances of the previous item (in this case the character class [a-f]), and finally the $ tells it that the next thing must be the end of the line.
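As a rough cross-check outside Emacs, the same filter can be sketched in Python. The names `HEX_WORD` and `hex_words` and the zero-padding step are illustrative assumptions, not part of the answer above. Unlike `\{,6\}`, this pattern also rejects empty lines.

```python
import re

# Same idea as ^[a-f]\{,6\}$, but requiring at least one letter.
HEX_WORD = re.compile(r"[a-f]{1,6}")

def hex_words(words):
    """Return (word, colour) pairs for words spelled only with a-f, at most 6 letters."""
    result = []
    for word in words:
        word = word.strip().lower()
        if HEX_WORD.fullmatch(word):
            # Pad with trailing zeros, which the question treats as irrelevant.
            result.append((word, "#" + word.ljust(6, "0")))
    return result

print(hex_words(["baba", "abba", "faded", "zebra", "effaced"]))
```

In practice the input would be the lines of the word list rather than a literal list.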
This will return a list of all instances of #000000 - #ffffff in the buffer, although this pattern may not be restrictive enough for your purposes.
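(let ((my-colour-list nil))
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "#[0-9a-fA-F]\\{6\\}" nil t)
      ;; `add-to-list' is unreliable on let-bound variables under
      ;; lexical binding, so push the match unless it is already there.
      (let ((colour (match-string-no-properties 0)))
        (unless (member colour my-colour-list)
          (push colour my-colour-list))))
    my-colour-list))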
I'm not actually certain that this is what you were asking for. What do you mean by "dictionary"?
A form that will return you a hash table with all the elements you specify in it could be this:
(let ((hash-table (make-hash-table :test 'equal)))
  ;; `expt' (not `exp') raises 256 to the 3rd power: 16777216 colours.
  (dotimes (i (expt 256 3))
    (puthash (concat "#" (format "%06x" i)) t hash-table))
  hash-table)
I'm not sure how Emacs will cope with that many elements (about 16.7 million). Since you don't care about the zeros, you can generate the space without the zero-padded format and strip the trailing 0's. I don't know what you want to do with the rest of the digits. You can then write the function step by step like this:
(defun find-hex-space ()
  (let (return-list)
    ;; `expt', not `exp': 256 to the 3rd power.
    (dotimes (i (expt 256 3))
      (let* ((hex-number (strip-zeros (format "%x" i)))
             (found-word (gethash hex-number *dictionary*)))
        (if found-word (push found-word return-list))))
    return-list))
Function strip-zeros is easy to write, and here I suppose your words are in a hash called *dictionary*. strip-zeros could be something like this:
(defun strip-zeros (string)
  (let ((sm (string-match "^0*\\(.*?\\)0*$" string)))
    (if sm (match-string 1 string) string)))
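The enumerate-and-look-up approach can be sketched in Python to see how the pieces fit together. This is only an illustration: `dictionary` is assumed to be a set of lowercase words, and the `upper` parameter is a hypothetical addition so that the loop can be tried on a small range first.

```python
import re

def strip_zeros(s):
    """Drop leading and trailing zeros, mirroring the regexp ^0*\\(.*?\\)0*$."""
    return re.fullmatch(r"0*(.*?)0*", s).group(1)

def find_hex_space(dictionary, upper=256 ** 3):
    """Return the dictionary words that appear as a stripped hex number below `upper`."""
    found = set()  # a set, because e.g. baba and baba00 strip to the same word
    for i in range(upper):
        word = strip_zeros(format(i, "x"))
        if word in dictionary:
            found.add(word)
    return sorted(found)

print(find_hex_space({"baba", "abba", "faded", "zebra"}, upper=0xBABA + 1))
```

Iterating over the dictionary and testing each word against a pattern is usually cheaper than walking all 16777216 values.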
I don't quite understand your second question. The words would be also using the hex space? Would you then consider only the words formed by numbers, or would also include the letters in the word?
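One possible reading of the second part is sketched below in Python, under the assumption that the words must still fit in hex space. Since e and a are already hex digits, only o and i need a digit stand-in (0 and 1, as listed in the question). The function name and the zero-padding are illustrative, and the table would need extending for any further substitutions.

```python
# Letters usable in a hex colour once o and i are written as 0 and 1.
ALLOWED = set("abcdef") | {"o", "i"}
TO_DIGITS = str.maketrans({"o": "0", "i": "1"})

def digit_hex_words(words):
    """Return (word, colour) pairs for words spellable with hex letters plus o and i."""
    result = []
    for word in words:
        word = word.strip().lower()
        if 0 < len(word) <= 6 and set(word) <= ALLOWED:
            result.append((word, "#" + word.translate(TO_DIGITS).ljust(6, "0")))
    return result

print(digit_hex_words(["decode", "office", "coffee", "zebra"]))
```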
I am trying to write a function (char-count) which takes a pattern and a string, then returns a number (count) which represents how many times any of the characters in the pattern appear in the string.
For example:
(char-count "Bb" "Best buy")
would return 2 since there is 1 match for B and 1 match for b, so added together we get 2
(char-count "AaR" "A Tale of Recursion")
would return 3 and so on
I tried using re-seq in my function, but it seems to work only for continuous strings. As in, (re-seq #"Bb" "Best Buy") only looks for the pattern Bb, not for each individual character.
This is what my function looks like so far:
(defn char-count [pattern text]
(count (re-seq (#(pattern)) text)))
But it does not do what I want. Can anybody help?
P.s. Very new to clojure (and functional programming in general).
You don't need anything nearly as powerful as a regular expression here, so just use the simple tools your programming language comes with: sets and functions. Build a set of the characters you want to find, and count how many characters from the input string are in the set.
(defn char-count [chars s]
  (count (filter (set chars) s)))
Try wrapping the characters in [...] within the RegEx:
(count (re-seq #"[Bb]" "Best buy"))
Or, since you need that pattern to be dynamic:
(count (re-seq (re-pattern (str "[" pattern "]")) text))
But note that the solution might not work properly if the pattern contains special RegEx characters such as [, ], \, -, ^ - you'd have to escape them by prepending \\ in front of each one.
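The escaping concern can be illustrated with a Python sketch (Python rather than Clojure, purely as an illustration). Here `re.escape` does the backslash-escaping, so characters such as ], ^, - and \ are counted literally. The function name mirrors the question.

```python
import re

def char_count(chars, text):
    """Count the characters of `text` that occur in `chars`."""
    if not chars:
        return 0  # "[]" is not a valid character class
    # re.escape backslash-escapes regex-special characters, including ], ^, - and \.
    pattern = re.compile("[" + re.escape(chars) + "]")
    return len(pattern.findall(text))

print(char_count("Bb", "Best buy"))
print(char_count("^-]", "a^b-c]"))
```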
I am new to Emacs. I can search for text and show all lines in a separate buffer using "M-x occur". I can also search for multiple text items using OR operator as : one\|two , which will find lines with "one" or "two" (as explained on Emacs occur mode search for multiple strings). How can I search for lines with both "one" and "two"? I tried using \& and \&& but they do not work. Will I need to create a macro or function for this?
Edit:
I tried writing a function for above in Racket (a Scheme derivative). Following works:
#lang racket
(define text '("this is line number one"
               "this line contains two keyword"
               "this line has both one and two keywords"
               "this line contains neither"
               "another two & one words line"))
(define (srch . lst) ; takes variable number of arguments
  (for ((i lst))
    (set! text (filter (λ (x) (string-contains? x i)) text)))
  text)
(srch "one" "two")
Output:
'("this line has both one and two keywords" "another two & one words line")
But how can I put this in Emacs Lisp?
Regex doesn't support "and" because it has very limited usefulness and weird semantics when you try to use it in any nontrivial regex. The usual fix is to just search for one.*two\|two.*one ... or in the case of *Occur* maybe just search for one and then M-x delete-non-matching-lines two.
(You have to mark the *Occur* buffer as writable before you can do this. read-only-mode is a toggle; the default keybinding is C-x C-q. At least in my Emacs, you have to move the cursor away from the first line or you'll get "Text is read-only".)
(defun occur2 (regex1 regex2)
  "Search for lines matching both REGEX1 and REGEX2 by way of `occur'.
We first (occur regex1) and then do (delete-non-matching-lines regex2) in the
*Occur* buffer."
  (interactive "sFirst term: \nsSecond term: ")
  (occur regex1)
  (save-excursion
    (other-window 1)
    (let ((buffer-read-only nil))
      (forward-line 1)
      (delete-non-matching-lines regex2))))
The save-excursion and other-window is a bit of a wart but it seemed easier than hardcoding the name of the *Occur* buffer (which won't always be true; you can have several occur buffers) or switching there just to fetch the buffer name, then Doing the Right Thing with set-buffer etc.
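The filter-then-filter-again idea amounts to keeping only the lines that match every pattern. A small Python sketch of that logic follows, using the sample lines from the question. It is an illustration rather than Emacs code, and `lines_matching_all` is a hypothetical name.

```python
import re

def lines_matching_all(lines, *patterns):
    """Keep the lines in which every regular expression matches somewhere."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if all(rx.search(line) for rx in compiled)]

text = [
    "this is line number one",
    "this line contains two keyword",
    "this line has both one and two keywords",
    "this line contains neither",
    "another two & one words line",
]
print(lines_matching_all(text, "one", "two"))
```

Unlike one.*two\|two.*one, this scales to any number of terms without listing every ordering.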
I have this code to find empty strings in a region.
(defun replace-in-region (start end)
  (interactive "r")
  (let ((region-text (buffer-substring start end))
        (temp nil))
    (delete-region start end)
    (setq temp (replace-regexp-in-string "\\_>" "X" region-text))
    (insert temp)))
When I use it on a region it wipes it out, no matter the content of said region, and gives the error "Args out of range: 4, 4".
When I use query-replace-regexp in a region containing:
abcd abcd
abcd 11.11
With the regexp \_> (note that there is only one backslash) and the replacement X, the resulting region after the 4 occurrences are replaced is:
abcdX abcdX
abcdX 11.11X
What am I missing here?
It looks like a bug in replace-regexp-in-string.
It first matches the regexp against the original string. For example, it finds the end of "abcd". It then picks out the substring that matched and, for some reason unknown to me, redoes the match on that substring. In this case the match fails (as the position no longer follows a word), but the code that follows assumes that it succeeded and that the match data has been updated.
Please report this as a bug using M-x report-emacs-bug.
I would suggest that you replace the call to replace-regexp-in-string with a simple loop. In fact, I would recommend not cutting out the string at all and doing something like the following instead:
(defun my-replace-in-region (start end)
  (interactive "r")
  (save-excursion
    (goto-char start)
    (setq end (copy-marker end))
    (while (re-search-forward "\\_>" end t)
      (insert "X")
      ;; Ensure that the regexp doesn't match the newly inserted
      ;; character.
      (forward-char))))
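For comparison, here is a Python sketch of the same zero-width replacement. Python has no `\_>`, so this approximates "end of symbol" as "end of a whitespace-delimited token", which is an assumption that happens to agree with the sample region in the question. Because `re.sub` scans the original string, the inserted marker is never rescanned, which is the situation the `forward-char` call guards against in the buffer version.

```python
import re

def mark_token_ends(text, marker="X"):
    """Insert `marker` after every whitespace-delimited token."""
    # Zero-width match: preceded by non-whitespace, followed by whitespace or end of text.
    return re.sub(r"(?<=\S)(?=\s|\Z)", marker, text)

print(mark_token_ends("abcd abcd\nabcd 11.11"))
```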
For example, I have a string abcdefg. * , how can I create a regexp [abcdefg\. *] that can match each character in the string? The problem is that there could be special characters such as . in the string.
A simple and robust solution is to use the built-in regexp-opt function, which takes a list of fixed strings and returns an efficient regex to match any one of them. Then all you need to do is split your original string into one-character segments:
(regexp-opt
 (mapcar #'char-to-string
         (string-to-list "abcdefg. *"))) ; => "[ *.a-g]"
Use the regexp-quote function.
(setq regexp (concat "[" (regexp-quote string) "]"));
Note that most regexp characters don't have special meaning inside square brackets, so they don't need to be quoted. Here is the Emacs documentation on including certain special characters inside a character set:
Note that the usual regexp special characters are not special inside
a character set. A completely different set of special characters
exists inside character sets: ']', '-' and '^'.
To include a ']' in a character set, you must make it the first
character. For example, '[]a]' matches ']' or 'a'. To include a
'-', write '-' as the first or last character of the set, or put it
after a range. Thus, '[]-]' matches both ']' and '-'.
To include '^' in a set, put it anywhere but at the beginning of the
set. (At the beginning, it complements the set--see below.)
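The placement rules quoted above can be turned into a small string-building routine. The Python sketch below only assembles the text of an Emacs character set. The function name is hypothetical, and it does not try to handle a [ followed by : (which Emacs may read as the start of a class such as [:alpha:]). For a lone ^, which cannot form a set by itself, it falls back to the quoted literal \^.

```python
def emacs_char_class(chars):
    """Build an Emacs character set matching any one character of `chars`."""
    unique = list(dict.fromkeys(chars))  # drop duplicates, keep first-seen order
    if not unique:
        raise ValueError("need at least one character")
    plain = "".join(c for c in unique if c not in "]^-")
    # ']' must come first.
    body = ("]" if "]" in unique else "") + plain
    has_dash = "-" in unique
    if "^" in unique:
        if body:
            body += "^"  # anywhere but the beginning
        elif has_dash:
            body, has_dash = "-^", False  # '-' first keeps '^' off the front
        else:
            return "\\^"  # a lone '^' cannot be written as a set
    if has_dash:
        body += "-"  # '-' goes last
    return "[" + body + "]"

print(emacs_char_class("abcdefg. *"))
print(emacs_char_class("a]-^"))
```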
;; These functions use Common Lisp macros and functions (loop,
;; destructuring-bind, remove-duplicates, coerce) from the cl library.
(require 'cl)

(defun partition (string test &rest more-tests)
  (loop with hash = (make-hash-table)
        for c across string do
        (loop for f in (cons test more-tests)
              for i from 1 do
              (when (funcall f c)
                (setf (gethash i hash) (cons c (gethash i hash)))
                (return))
              finally (setf (gethash 0 hash) (cons c (gethash 0 hash))))
        finally (return (loop for v being the hash-values of hash
                              collect (coerce v 'string)))))

(defun regexp-quote-charclass (input)
  (destructuring-bind (safe dangerous)
      (partition input (lambda (x) (member x '(?\\ ?\] ?^ ?- ?:))))
    (concat "[" (remove-duplicates safe)
            (let ((dangerous (coerce (remove-duplicates dangerous) 'list))
                  (printed safe))
              (with-output-to-string
                (when (member ?\\ dangerous)
                  (setf printed t)
                  (princ "\\\\"))
                (when (member ?: dangerous)
                  (setf printed t)
                  (princ "\\:"))
                (when (member ?\] dangerous)
                  (setf printed t)
                  (princ "\\]"))
                (when (member ?^ dangerous)
                  (if printed (princ "^") (princ "\\^")))
                (when (member ?\- dangerous) (princ "-")))) "]")))
This seems like it would do the job. Also, to the best of my knowledge, you don't need to escape the characters that have meaning outside the character class, such as ?[ or ?$. However, I've added ?: because in a very rare case it could get confused with things like [:alpha:] (you cannot obtain this exact string through this function, but I'm not sure how Emacs will parse the [: combination, so just to be sure).
I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to do the file line-by-line, and then print each character.
(defn process_file [file_path]
  (with-open [reader (BufferedReader. (FileReader. file_path))]
    (let [seq (line-seq reader)]
      (doseq [item seq]
        (let [words (split item #"\s")]
          (println words))))))
Given a file with this text input:
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
My output looks like this:
[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States. U.S. laws alone swamp our small staff.]
Though I would expect it to look like:
["international" "donations" "are" .... ]
So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.
(with-open [reader (clojure.java.io/reader "path/to/file")] ...
This is how I prefer to get a reader in Clojure. And by "character by character", do you mean at the file access level, like read, which lets you control how many bytes to read?
Edit
As @deterb pointed out, let's check the source code of line-seq:
(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
  rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))
I faked a char-seq
(defn char-seq
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]
    (if (>= chr 0)
      (cons chr (lazy-seq (char-seq rdr))))))
I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:
(let [chr (.read rdr)]
  (if (>= chr 0)
    ;do your work here
    ))
What do you think?
[1] According to @dimagog's comment, char-seq does not read all chars into memory, thanks to lazy-seq.
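The read-one-character-until-end-of-input loop looks much the same in other languages. Here is a Python sketch for comparison, where `read(1)` returns an empty string at end of input instead of a negative number. The `io.StringIO` object stands in for an open file and is only there to make the example self-contained.

```python
import io

def char_seq(reader):
    """Lazily yield one character at a time until read(1) returns ""."""
    for ch in iter(lambda: reader.read(1), ""):
        yield ch

print(list(char_seq(io.StringIO("ab\nc"))))
```

As with the lazy-seq version, nothing is read until the caller asks for the next character.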
I'm not familiar with Java or the read() method, so I won't be able to help you out with implementing it.
One first thought is maybe to simplify by using slurp, which will return a string of the text of the entire file with just (slurp filename). However, this would get the whole file, which maybe you don't want.
Once you have a string of the entire file text, you can process any string character by character by simply treating it as though it were a sequence of characters. For example:
=> (doseq [c "abcd"]
     (println c))
a
b
c
d
=> nil
Or:
=> (remove #{\c} "abcd")
=> (\a \b \d)
You could use map or reduce or any sort of sequence manipulating function. Note that after manipulating it like a sequence, it will now return as a sequence, but you could easily wrap the outer part in (reduce str ...) to return it back to a string at the end--explicitly:
=> (reduce str (remove #{\c} "abcd"))
=> "abd"
As for the problem with your specific code, I think it lies in what words is: a vector of strings. Each time you print words you are printing a whole vector. If you replaced the line (println words) with (doseq [w words] (println w)), it should work.
Also, based on what you say you want your output to look like (a vector of all the different words in the file), you wouldn't want to only do (println w) at the base of your expression, because this prints values and returns nil. You would simply want w. You would also want to replace your doseqs with fors, again to avoid returning nil.
As for improving your code, it looks generally fine to me, but, building on the first change suggested above (and not the others, because I don't want to spell everything out), you could shorten it with a fun little trick:
(doseq [item seq]
  (let [words (split item #"\s")]
    (doseq [w words]
      (println w))))
;; Could be rewritten as...
(doseq [item seq
        :let [words (split item #"\s")]
        w words]
  (println w))
You're pretty close - keep in mind that Strings are a sequence. (concat "abc" "def") results in the sequence (\a \b \c \d \e \f).
mapcat is another really useful function for this - it will lazily concatenate the results of applying the mapping fn to the sequence. This means that mapcating the result of converting all of the line strings to a seq will be the lazy sequence of characters you're after.
I did this as (mapcat seq (line-seq reader)).
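The same flattening step can be sketched in Python, where `itertools.chain.from_iterable` plays the role of `mapcat seq`. The function name is illustrative. The newline is stripped explicitly because Python keeps it on each line, whereas line-seq drops the line terminator.

```python
import io
from itertools import chain

def chars_of_lines(reader):
    """Lazily yield the characters of each line, without the line terminators."""
    return chain.from_iterable(line.rstrip("\n") for line in reader)

print(list(chars_of_lines(io.StringIO("ab\ncd\n"))))
```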
For other advice:
For creating the reader, I would recommend using the clojure.java.io/reader function instead of directly creating the classes.
Consider separating the reading of the file from the processing (in this case printing) of the strings. While it is important to keep the full file parsing inside the with-open form, being able to test the actual processing code outside of the file reading code is quite useful.
When navigating multiple (potentially nested) sequences consider using for. for does a nice job handling nested for loop type cases.
(take 100 (for [line (repeat "abc") char (seq line)] (prn char)))
Use prn for debugging output. It gives you real output, as compared to user output (which hides certain details which users don't normally care about).