manipulate regexp-search matches using `query-regexp-replace` in a defun - regex

Since version 22 of Emacs, we can use \,(function) for manipualting (parts of) the regex-search result before replacing it. But – this is mentioned often, but nonetheless still the truth – we can use this construct only in the standard interactive way. (Interactive like: By pressing C-M-% or calling query-replace-regexp with M-x.)
As an example:
If we have
[Foo Bar 1900]
and want to get
[Foo Bar \function{foo1900}{1900}]
we can use:
M-x query-replace-regexp <return>
\[\([A-Za-z-]+\)\([^0-9]*\) \([0-9]\{4\}\)\]
[\1\2 \\function{\,(downcase \1)\3}{\3}]
to get it done. So this can be done pretty easy.
In my own defun, I can use query only by replacing without freely modifying the match, or modify the prepared replaced string without any querying. The only way I see, is to serialize it in such a way:
(defun form-to-function ()
(interactive)
(goto-char (point-min))
(while (query-replace-regexp
"\\[\\([A-Za-z-]+\\)\\([^0-9]*\\) \\([0-9]\\{4\\}\\)\\]"
"[\\1\\2 \\\\function{\\1\\3}{\\3}]" ))
(goto-char (point-min))
(while (search-forward-regexp "\\([a-z0-9]\\)" nil t)
(replace-match (downcase (match-string 1)) t nil)
)
)
For me the query is important, because I can't be sure, what the buffer offers me (= I can't be sure, the author used this kind of string always in the same manner).
I want to use an elisp function, because it is not the only recurring replacement (and also not only one buffer (I know about dired-do-query-replace-regexp but I prefer working buffer-by-buffer with replace-defuns)).
At first I thought I only miss something like a query-replace-match to use instead of replace-match. But I fear, I am also missing the easy and flexible way of rearrange the string the the query-replace-regexp.
So I think, I need a \, for use in an defun. And I really wonder, if I am the only one, who is missing this feature.

If you want your rsearch&replace to prompt the user, that means you want it to be interactive, so it's perfectly OK to call query-replace-regexp (even if the byte-compiler will tell you that this is meant for interactive use only). If the warning bothers you, you can either wrap the call in with-no-warnings or call perform-replace instead.
The docstring of perform-replace sadly doesn't (or rather "didn't" until today) say what is the format of the replacements argument, but you can see it in the function's code:
;; REPLACEMENTS is either a string, a list of strings, or a cons cell
;; containing a function and its first argument. The function is
;; called to generate each replacement like this:
;; (funcall (car replacements) (cdr replacements) replace-count)
;; It must return a string.

The query-replace-function can handle replacement not only as a string, but as a list including the manipulating elements. The use of concat archives building an string from various elements.
So one who wants to manipulate the search match by a function before inserting the replacement can use query-replace-regexp also in a defun.
(defun form-to-function ()
(interactive)
(goto-char (point-min))
(query-replace-regexp
"\\[\\([A-Za-z-]+\\)\\([^0-9]*\\) \\([0-9]\\{4\\}\\)\\]"
(quote (replace-eval-replacement concat "[\\1\\2 \\\\function{"
(replace-quote (downcase (match-string 1))) "\\3}{\\3}]")) nil ))
match-string 1 returns the first expression of our regexp-search.
`replace-quote' helps us doublequoting the following expression.
concat forms a string from the following elements.
and
replace-eval-replacement is not documented.
Why it is in use here nevertheless, is because of emacs seems to use it internally, while performing the first »interactive« query-replace-regexp call. At least is it given by asking emacs with repeat-complex-command.
I came across repeat-complex-command (bound to [C-x M-:].) while searching for an answer in the source code of query-replace-regexp.
So an easy to create defun could be archieved by performing the standard search and replace way as told in the question and after first sucess pressing [C-x M-:] results in an already Lisp formed command, which can be copied and pasted in a defun.
Edit (perform-replace)
As Stefan mentioned, one can use perform-replace to avoid using query-replace-regexp.
Such a function could be:
(defun form-to-function ()
(interactive)
(goto-char (point-min))
(while (perform-replace
"\\[\\([A-Za-z-]+\\)\\([^0-9]*\\) \\([0-9]\\{4\\}\\)\\]"
(quote (replace-eval-replacement concat "[\\1\\2 \\\\function{"
(replace-quote (downcase (match-string 1))) "\\3}{\\3}]"))
t t nil)))
The first boolean (t) is a query flag, the second is the regexp switch. So it works also perfectly, but it didn't help finding the replacement expression as easy as in using \,.

Related

emacs how to refresh buffer and preserve highlighting - especially for logs

I am often parsing log files, which I help visualize by highlighting specific char sequences via M-s h r + regex, aka the command highlight-regexp or hi-lock-face-buffer. Sometimes I will need to regenerate the log file, which is done via the C-x C-v or find-alternate-file command. Unfortunately, the regeneration also loses all of my highlighting.
TLDR
Is there a way to regenerate my buffer file while maintaining all of the highlighting?
Update
I am using text-mode, and the regexes vary widely depending on my task and the log file. I would like something that carries over all of my highlighting to the refreshed buffer. Does something already exist?
Answer:
Even better than maintaining highlights after a refresh, the selected answer uses an add-hook to create a highlighting scheme for a particular log. Additionally, each log can have different highlight schemes. The result is an automated highlighting scheme that matches a particular log file's name, with the ability to toggle the schemes of various highlights.
You can add-hook that:
(add-hook 'text-mode-hook
(lambda ()
(font-lock-add-keywords nil '(("^\\([^,]*\\)," 1 'font-lock-function-name-face)))))
What regexp you'd like to highlight? If You are using several regexps depending on the log, it might be possible to hook them all. Or you might program the condition: which regexp to hightlight:
(add-hook 'text-mode-hook
(lambda ()
(if (condition)
;; condition true:
(font-lock-add-keywords nil '((regexp1 1 'font-lock-function-name-face)))
;; condition false:
(font-lock-add-keywords nil '((regexp2 1 'font-lock-function-name-face)))
)
))
Also if you control the generation of logs you can generate the hightlight rules in the first line:
# -*- eval: (highlight-regexp REGEXP) -*-
(assuming # is the comment char in your log).
Edit:
Here's defun which toggles hightlighting for TestQueryLogic or invoking fork-join and testGudermann:
(defun testing-MapAppLog.txt ()
"Toggle highlighting `TestQueryLogic' or `invoking fork-join' and `testGudermann'."
(interactive)
(cond
((get this-command 'state)
(highlight-regexp "TestQueryLogic" font-lock-variable-name-face)
(highlight-regexp "invoking fork-join" font-lock-type-face)
(unhighlight-regexp "testGudermann")
(message "Highlighting: TestQueryLogic, invoking fork-join")
(put this-command 'state nil))
(t
(unhighlight-regexp "TestQueryLogic")
(unhighlight-regexp "invoking fork-join")
(highlight-regexp "testGudermann" font-lock-preprocessor-face)
(message "Highlighting: testGudermann")
(put this-command 'state t))))
So call it once to hightlight TestQueryLogic, invoking fork-join, call it again to hightlight testGudermann instead. You can bind to a key:
(define-key text-mode-map (kbd "M-s t") 'testing-MapAppLog.txt)
and press that once or twice depending on what you'd like to highlight.
Rather than C-x C-v which basically says "throw away current buffer and use thise other file instead" an has hence no hope of preserving transient info such as your highlighting patterns, you want to use revert-buffer, or maybe even auto-revert-mode or auto-revert-tail-mode.

Custom character classes in Emacs' regexp

Is it possible to add own character classes to emacs in order to use them in regular expressions?
Let's say, i want to add a class [[:consonant:]] which matches all letters that are not vowels in order to avoid writing [b-df-hj-np-tv-z] all the time (and yes, i am aware that my shortcut is almost as long as the term i want to avoid, take it as a simplification of my problem).
Is this possible at all or do i have to use format or concat, respectively? If it is possible, how do i do that?
An MWE could be like this:
(defun myfun ()
"Finds clusters of three or more consonants"
(interactive)
(if (search-forward-regexp "[b-df-hj-np-tv-z]\\{3,\\}")
(message "Yepp, here is a consonant cluster.")
))
(defun myfun-1 ()
"Should also find clusters of three or more consonants."
(interactive)
(if (search-forward-regexp "[[:consonant:]]\\{3,\\}")
(message "Yepp, here is a consonant cluster.")
))
Both functions myfun and myfun-1 should do the very same thing.
One step further i'd like to know if it is possible to put whole expressions in such "shortcuts", like
[[:ending:]] ==> "\\(?:en\\|st\\|t\\|e\\)"
Jordon-Biondo is correct, you cannot extend Emacs' character classes for regexp search. If you peek into Emacs' source code, you can see these defined in the routine re_wctype_parse on line 1510(ish) of regex-emacs.c. So adding to these natively would require modifying the .c file and rebuilding.
I do not believe you can do this. But there is something similar that was recently released in the ample-regexp package found HERE. This was taken from the readme as an example:
(define-arx h-w-rx
'((h "Hello, ")
(w "world"))) ;; -> hello-world-rx
(h-w-rx h w) ;; -> "Hello, world"
(h-w-rx (* h w)) ;; -> "\\(?:Hello, world\\)*"
You could use this to define a wide range of aliases in one big define-arx.

emacs: highlighting balanced expressions (eg. LaTeX tags)

What would be a good way to get Emacs to highlight an expression that may include things like balanced brackets -- e.g. something like
\highlightthis{some \textit{text} here
some more text
done now}
highlight-regex works nicely for simple things, but I had real trouble writing an emacs regex to recognize line breaks, and of course it matches till the first closing bracket.
(as a secondary question: pointers to any packages that extend emacs regex syntax would be much appreciated -- I am having pretty hard time with it, and I'm fairly familiar with regexes in perl.)
Edit: For my specific purpose (LaTeX tags highlighting in an AUCTeX buffer), I was able to get this to work by customizing an AUCTeX specific variable font-latex-user-keyword-classes, that adds something like this to custom-set-variables in .emacs:
'(font-latex-user-keyword-classes (quote (("mycommands" (("highlightthis" "{")) (:slant italic :foreground "red") command))))
A more generic solution would still be nice to have though!
You could use functions acting on s-expressions to work with the region you want to highlight, and use one of the solutions mentionned on this question to actually highlight it.
Here is an example :
(defun my/highlight-function ()
(interactive)
(save-excursion
(goto-char (point-min))
(search-forward "\highlightthis")
(let ((end (scan-sexps (point) 1)))
(add-text-properties (point) end '(comment t face highlight)))))
EDIT : Here is an example using a similar function with Emacs' standard font locking system, as explained in the search-based fontification section of the emacs-lisp manual :
(defun my/highlight-function (bound)
(if (search-forward "\highlightthis" bound 'noerror)
(let ((begin (match-end 0))
(end (scan-sexps (point) 1)))
(set-match-data (list begin end))
t)
nil))
(add-hook 'LaTeX-mode-hook
(lambda ()
(font-lock-add-keywords nil '(my/highlight-function))))

problem with using replace-regexp in lisp

in my file, I have many instances of ID="XXX", and I want to replace the first one with ID="0", the second one with ID="1", and so on.
When I use regexp-replace interactively, I use ID="[^"]*" as the search string, and ID="\#" as the replacement string, and all is well.
now I want to bind this to a key, so I tried to do it in lisp, like so:
(replace-regexp "ID=\"[^\"]*\"" "ID=\"\\#\"")
but when I try to evaluate it, I get a 'selecting deleted buffer' error. It's probably something to do with escape characters, but I can't figure it out.
Unfortunately, the \# construct is only available in the interactive call to replace-regexp. From the documentation:
In interactive calls, the replacement text may contain `\,'
followed by a Lisp expression used as part of the replacement
text. Inside of that expression, `\&' is a string denoting the
whole match, `\N' a partial match, `\#&' and `\#N' the respective
numeric values from `string-to-number', and `\#' itself for
`replace-count', the number of replacements occurred so far.
And at the end of the documentation you'll see this hint:
This function is usually the wrong thing to use in a Lisp program.
What you probably want is a loop like this:
(while (re-search-forward REGEXP nil t)
(replace-match TO-STRING nil nil))
which will run faster and will not set the mark or print anything.
Which then leads us to this bit of elisp:
(save-excursion
(goto-char (point-min))
(let ((count 0))
(while (re-search-forward "ID=\"[^\"]*\"" nil t)
(replace-match (format "ID=\"%s\"" (setq count (1+ count)))))))
You could also use a keyboard macro, but I prefer the lisp solution.

in emacs-lisp, how do I correctly use replace-regexp-in-string?

Given a string, I want to replace all links within it with the link's description. For example, given
this is a [[http://link][description]]
I would like to return
this is a description
I used re-builder to construct this regexp for a link:
\\[\\[[^\\[]+\\]\\[[^\\[]+\\]\\]
This is my function:
(defun flatten-string-with-links (string)
(replace-regexp-in-string "\\[\\[[^\\[]+\\]\\[[^\\[]+\\]\\]"
(lambda(s) (nth 2 (split-string s "[\]\[]+"))) string))
Instead of replacing the entire regexp sequence, it only replaces the trailing "]]". This is what it produces:
this is a [[http://link][descriptiondescription
I don't understand what's going wrong. Any help would be much appreciated.
UPDATE: I've improved the regex for the link. It's irrelevant to the question but if someone's gonna copy it they may as well get the better version.
Your problem is that split-string is clobbering the match data, which
replace-regexp-in-string is relying on being unchanged, since it is going to
go use that match data to decide which sections of the string to cut out. This
is arguably a doc bug in that replace-regexp-in-string does not mention that
your replacement function must preserve the match data.
You can work around by using save-match-data, which is a macro provided for
exactly this purpose:
(defun flatten-string-with-links (string)
(replace-regexp-in-string "\\[\\[[a-zA-Z:%#/\.]+\\]\\[[a-zA-Z:%#/\.]+\\]\\]"
(lambda (s) (save-match-data
(nth 2 (split-string s "[\]\[]+")))) string))