Empty string regular expression in emacs lisp - regex

I have this code to find empty strings in a region.
(defun replace-in-region (start end)
  (interactive "r")
  (let ((region-text (buffer-substring start end))
        (temp nil))
    (delete-region start end)
    (setq temp (replace-regexp-in-string "\\_>" "X" region-text))
    (insert temp)))
When I use it on a region it wipes it out, no matter the content of said region, and gives the error "Args out of range: 4, 4".
When I use query-replace-regexp in a region containing:
abcd abcd
abcd 11.11
with the regexp \_> (note that there is only one backslash) and the replacement X, the resulting region after 4 occurrences are replaced is:
abcdX abcdX
abcdX 11.11X
What am I missing here?

It looks like a bug in replace-regexp-in-string.
It first matches the regexp against the original string. For example, it finds the end of "abcd". It then picks out the substring that matched and, for some reason unknown to me, redoes the match on that substring. In this case, the second match fails (as the position no longer follows a word), but the code that follows assumes that it succeeded and that the match data has been updated.
Please report this as a bug using M-x report-emacs-bug.
I would suggest that you replace the call to replace-regexp-in-string with a simple loop. In fact, I would recommend that you don't cut out the string and do something like the following:
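(defun my-replace-in-region (start end)
  (interactive "r")
  (save-excursion
    (goto-char start)
    ;; A marker moves along with END as text is inserted before it.
    (setq end (copy-marker end))
    ;; Stop once point has passed END; a search bound that lies
    ;; before point would signal an error.
    (while (and (<= (point) end)
                (re-search-forward "\\_>" end t))
      (insert "X")
      ;; Ensure that the regexp doesn't match the newly inserted
      ;; character.
      (unless (eobp)
        (forward-char)))))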
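If the text has to stay a string, the same loop can be run in a temporary buffer instead of calling replace-regexp-in-string. This is only a sketch: my-append-at-symbol-ends is a made-up name, and symbol boundaries are decided by the temporary buffer's syntax table, which may differ from the one your major mode uses.

```elisp
(defun my-append-at-symbol-ends (string suffix)
  "Return a copy of STRING with SUFFIX inserted at each symbol end.
Hypothetical helper; symbol ends follow the syntax table of a
temporary buffer, not that of the buffer you are editing."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Stop at end of buffer: after the last insertion the regexp
    ;; would keep matching the empty string at point.
    (while (and (not (eobp))
                (re-search-forward "\\_>" nil t))
      (insert suffix)
      ;; Step over one character so the freshly inserted text is
      ;; not matched again.
      (unless (eobp)
        (forward-char)))
    (buffer-string)))

(my-append-at-symbol-ends "abcd abcd" "X")
```

For this input the call should give "abcdX abcdX". Text such as 11.11 depends on whether the syntax table treats the dot as part of a symbol.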

Related

Combine multiple replace-regexp to a program

I am working on cleaning files with multiple regex-replacement
<<.*>> -> ""
\([[:alpha:]]\)\* -> \1 ;; patterns such as program* become program
\*\([[:alpha:]]\) -> \1 ;; patterns such as *program become program
\*/\([[:alpha:]]\) -> \1
;;and so on
On every single file, I have to invoke replace-regexp various times.
How could I combine these regex searches?
To an extent, M-x whitespace-cleanup has similar requirements, that is, cleanup based on multiple conditions. It should be possible to use (emacs) Keyboard Macros, but I am not familiar with them. Once you have some knowledge of Emacs Lisp, you can solve the problem easily. For example, the following cleans up leading and trailing spaces; you can add your own regexps and their replacements to my-cleanup-regexps:
(defvar my-cleanup-regexps
  '(("^ +" "")
    (" +$" ""))
  "A list of (REGEXP TO-STRING).")

(defun my-cleanup-replace-regexp (regexp to-string)
  "Replace REGEXP with TO-STRING in the whole buffer."
  (goto-char (point-min))
  (while (re-search-forward regexp nil t)
    (replace-match to-string)))

(defun my-cleanup ()
  "Cleanup the whole buffer according to `my-cleanup-regexps'."
  (interactive)
  (dolist (r my-cleanup-regexps)
    (apply #'my-cleanup-replace-regexp r)))
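As an illustration, here is how the replacements from the question might be written as such a list. It is a sketch under a few assumptions: the names my-file-cleanup-regexps and my-file-cleanup-string are made up, the rules are applied in the order listed, and the function works on a string in a temporary buffer so that it is easy to try out. Note the doubled backslashes needed inside Lisp strings.

```elisp
(defvar my-file-cleanup-regexps
  '(("<<.*>>" "")
    ("\\([[:alpha:]]\\)\\*" "\\1")    ; program*  -> program
    ("\\*\\([[:alpha:]]\\)" "\\1")    ; *program  -> program
    ("\\*/\\([[:alpha:]]\\)" "\\1"))
  "Hypothetical list of (REGEXP TO-STRING) taken from the question.")

(defun my-file-cleanup-string (string)
  "Return STRING with every rule in `my-file-cleanup-regexps' applied."
  (with-temp-buffer
    (insert string)
    (dolist (rule my-file-cleanup-regexps)
      ;; Each rule scans the whole text from the start.
      (goto-char (point-min))
      (while (re-search-forward (car rule) nil t)
        ;; FIXEDCASE is t so the replacement keeps its case as is.
        (replace-match (cadr rule) t)))
    (buffer-string)))

(my-file-cleanup-string "<<note>> program* and *program")
```

The order of the rules can matter, because a later rule sees the text as already changed by the earlier ones.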

mit-scheme multiple regex matches in string

I'm using MIT/GNU Scheme 9.2. If I define a string:
(define a-string "00:00 11:11 22:22")
I can match and get the first time a pattern appears:
(re-match-extract a-string
                  (re-string-match "..:.." a-string)
                  0)
;Value 3: "00:00"
That's great, but I want to match the other times "..:.." appears. It seems like there should be some simple way, but am I missing something? Do I need to write a recursive function that matches the first pattern then cuts it off and runs the match on the rest of the string until it's exhausted?
What I would like to end up with is a list that looks like:
("00:00" "11:11" "22:22")
At some point between when 9.2 was current and now (11.2 is the most recent version as of this writing), the regular expression support of MIT-Scheme was overhauled, and the question's re-XXX functions no longer exist. Instead, it supports some basic matching functions and SRFI-115 regular expressions, both using different variations of an s-expression based syntax instead of traditional string-based REs. So this is really a "How to get a list of all matches using SRFI-115" answer.
The key here is regexp-fold, which invokes a function for each non-overlapping match in a string:
(define (all-matches re str)
  (regexp-fold re
               (lambda (i match s matches)
                 (cons (regexp-match-submatch match 0) matches))
               '()
               str
               (lambda (i match s matches) (reverse! matches))))
;;; Returns: ("00:00" "11:11" "22:22")
(all-matches (rx any any #\: any any) "00:00 11:11 22:22")

how do I define an emacs replace-regex shortcut command?

I don't understand how to re-use an interactive command in a command I'm writing myself.
I want to make a command that always uses the same arguments to replace-regexp. It's a shortcut, really.
So I tried to mimic in a function what I'd done interactively on a selected region, namely:
M-x replace-regexp RET ^\(\s *\)\(.*\)\s *$ RET \1 + '\2'
I mimicked it by writing this function:
(defun myH2js ()
  "Converts html to an (incomplete) JavaScript String concatenation."
  (interactive)
  (let (p1 p2)
    (setq p1 "^\(\s *\)\(.*\)\s *$" )
    (setq p2 "\1 + '\2'" )
    (replace-regexp p1 p2 )
    )
  )
But my function "replaces zero occurrences" of the selected region whereas my interaction rewrites everything exactly as I want.
What am I doing wrong?
You need to double the backslashes in the strings, because backslash is both the string and regular expression escape character:
(defun myH2js (start end)
  "Converts html to an (incomplete) JavaScript String concatenation."
  (interactive "r")
  (let ((p1 "^\\(\\s *\\)\\(.*\\)\\s *$")
        (p2 "\\1 + '\\2'"))
    (replace-regexp p1 p2 nil start end)))
Note that replace-regexp is not recommended for use inside programs; the online documentation says:
This function is usually the wrong thing to use in a Lisp program.
What you probably want is a loop like this:
(while (re-search-forward regexp nil t)
  (replace-match to-string nil nil))
which will run faster and will not set the mark or print anything.
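Applied to the command above, that loop might look like the following sketch. The name my-h2js-region is made up, and the regexp is deliberately changed: [ \t] stands in for the \s whitespace class, on the assumption that only spaces and tabs should be trimmed (in some syntax tables \s also matches newlines), and the line body is matched non-greedily so that trailing whitespace is dropped.

```elisp
(defun my-h2js-region (start end)
  "Wrap each line between START and END as a JavaScript string piece.
Hypothetical loop-based variant of the command in the question."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Narrowing keeps the search inside the region even though
      ;; the replacements make the text grow.
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (re-search-forward "^\\([ \t]*\\)\\(.*?\\)[ \t]*$" nil t)
        ;; FIXEDCASE is t so the case of the line is left untouched.
        (replace-match "\\1 + '\\2'" t)))))
```

After each replacement point sits at the end of the rewritten line, so the next search cannot match the same line again.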

Regex search in Emacs for line not starting with semicolon

I am trying to search forward in the current buffer for the first elisp function definition that is not a comment. I tried this:
(re-search-forward "[^;] *\(defun ")
but it matches comments anyway. Like the following line:
;; (defun test ()
You can use:
(catch 'found
  (let (match)
    (while (setq match (search-forward "(defun "))
      (when (null (syntax-ppss-context (syntax-ppss)))
        (throw 'found match)))
    nil))
It relies on the internal parser and the language's syntax definition. It returns the result of search-forward only if point is neither in a comment nor in a string.
If you do not like the error in the case of a search without hits, you can add nil t to the arguments of search-forward. Searching with re-search-forward is also fine.
This also works for cases like:
(defun test (args)
  "This function is defined with (defun test (args) ...)"
  )
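Wrapped as a function, and with nil t added so that a search without hits returns nil instead of signalling an error, the loop above could be sketched like this. The name my-next-real-defun is made up, and the result depends on the current buffer's syntax table knowing about Lisp comments and strings, as it does in emacs-lisp-mode.

```elisp
(defun my-next-real-defun ()
  "Move point after the next \"(defun \" outside comments and strings.
Return the new position, or nil when no such occurrence follows point.
Hypothetical helper built on the loop from the answer."
  (catch 'found
    (let (match)
      ;; NOERROR is t, so a failed search ends the loop with nil.
      (while (setq match (search-forward "(defun " nil t))
        ;; `syntax-ppss-context' returns nil outside comments and strings.
        (unless (syntax-ppss-context (syntax-ppss))
          (throw 'found match)))
      nil)))
```

When only commented-out definitions remain, point ends up after the last one that was examined and the function returns nil.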
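The space before (defun in ;; (defun test () actually matches [^;]. Since you have *, another space is not needed. You may want to use [^;] + instead. Note that Emacs regexps do not support lookbehind, so a construct such as
\(?<!;;\)
is not available; anchoring the pattern at the beginning of the line is the usual alternative.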
This seems to work nicely for me (tested in *scratch*):
(re-search-forward "^ *\(defun " nil t) ; hit C-x C-e after that
; closing parenthesis
;; (defun hi () )
(defun hola () )

regexp for elisp

In Emacs I would like to write some regexp that does the following:
First, return a list of all dictionary words that can be formed within "hex space". By this I mean:
#000000 - #ffffff
so #00baba would be a word (that can be looked up in the dictionary)
so would #baba00
and #abba00
and #0faded
...where trailing and leading 0's are considered irrelevant. How would I write this? Is my question clear enough?
Second, I would like to generate a list of words that can be made using numbers as letters:
0 = o
1 = i
3 = e
4 = a
...and so on. How would I write this?
First, load your dictionary. I'll assume that you're using /usr/share/dict/words, which is nearly always installed by default when you're running Linux. It lists one word per line, which is a very handy format for this sort of thing.
Next run M-x keep-lines. It'll ask you for a regular expression and then delete any line that doesn't match it. Use the regex ^[a-f]\{,6\}$ and it will filter out anything that can't be part of a color.
Specifically, the ^ makes the regex start at the beginning of the line, the [a-f] matches any one character between a and f (inclusive), the \{,6\} lets it match between 0 and 6 instances of the previous item (in this case the character class [a-f]), and finally the $ tells it that the next thing must be the end of the line.
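The same filter can be expressed in Lisp when the words are already in a list. This is a small sketch, not part of keep-lines: my-hex-words is a made-up name, and the count is written \{1,6\} rather than \{,6\} on the assumption that the empty string should not count as a word.

```elisp
(defun my-hex-words (words)
  "Return the members of WORDS that consist of 1 to 6 letters a-f.
Hypothetical helper mirroring the `keep-lines' regexp."
  (let ((case-fold-search t)   ; also accept upper-case entries
        (result nil))
    (dolist (word words)
      ;; \\` and \\' anchor at the start and end of the string.
      (when (string-match-p "\\`[a-f]\\{1,6\\}\\'" word)
        (push word result)))
    (nreverse result)))

(my-hex-words '("facade" "hello" "bad" "decade" "abcdefg"))
```

For this list the call should return ("facade" "bad" "decade").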
This will return a list of all instances of #000000 - #ffffff in the buffer, although this pattern may not be restrictive enough for your purposes.
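(let ((my-colour-list nil))
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "#[0-9a-fA-F]\\{6\\}" nil t)
      ;; `add-to-list' is unreliable on a let-bound variable under
      ;; lexical binding, so check for duplicates and `push' instead.
      (let ((colour (match-string-no-properties 0)))
        (unless (member colour my-colour-list)
          (push colour my-colour-list))))
    my-colour-list))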
I'm not actually certain that this is what you were asking for. What do you mean by "dictionary"?
A form that will return you a hash table with all the elements you specify in it could be this:
(let ((hash-table (make-hash-table :test 'equal)))
  ;; `expt' raises 256 to the 3rd power; `exp' takes only one argument.
  (dotimes (i (expt 256 3))
    (puthash (concat "#" (format "%06x" i)) t hash-table))
  hash-table)
I'm not sure how Emacs will cope with that many elements (16 million). As you don't want the zeros, you can generate the space without the padded format and strip leading and trailing 0's. I don't know what you want to do with the rest of the digits. You can then write the function step by step like this:
(defun find-hex-space ()
  (let (return-list)
    (dotimes (i (expt 256 3))
      (let* ((hex-number (strip-zeros (format "%x" i)))
             (found-word (gethash hex-number *dictionary*)))
        (if found-word (push found-word return-list))))
    return-list))
Function strip-zeros is easy to write, and here I suppose your words are in a hash called *dictionary*. strip-zeros could be something like this:
(defun strip-zeros (string)
  (let ((sm (string-match "^0*\\(.*?\\)0*$" string)))
    (if sm (match-string 1 string) string)))
I don't quite understand your second question. Would the words also come from the hex space? Would you then consider only the words formed from digits, or would you also include the letters in the word?