How many backslashes are required to escape regexps in emacs' "Customize" mode? - regex

I'm trying to use emacs' customize-group packages to tweak some parts of my setup, and I'm stymied. I see things like this in my .emacs file after I make changes with customize:
'(tramp-backup-directory-alist (quote (("\\\\`.*\\\\'" . "~/.emacs.d/autobackups"))))
This was the result of putting the following into the customize text field:
Regexp matching filename: \\`.*\\'
This is a representative sample: I'm actually trying to change several things that want a regexp, and they all show this same problem. How many layers of quoting are there, really? I can't seem to find the magic number of backslashes to get the gosh-dang thing to do what I'm asking it to, even for the simplest regular expressions like .*. Right now, the given customization produces - nothing. It makes no change from emacs' default behavior.
Better yet, where on earth is this documented? It's a little difficult to Google for, but I've been trying quite a few things there as well as in the official documentation and the Emacs wiki. Where is an authoritative source for how many dang backslashes one needs to make a regular expression in customize-mode actually work - or at the very least, to fail with some kind of warning instead of failing silently?
EDIT: As so often happens with questions asked in anger, I was asking the wrong question. Fortunately the answers below, led me to the answer to the question that I needed, which was about quoting rules. I'm going to try to write down what I learned here, because I find the documentation and Googleable resources to be maddeningly obscure about this. So here are the quoting rules I found by trial and error, and I hope that they help someone else, inspire correction, or both.
When an emacs customize-mode buffer asks you for a "Regexp matching filename", it is being, as emacs often is, both terse and idiosyncratic (how often the creator's personality is imparted to the creation!). It means, for one thing, a regexp that will be compared to the whole path of the file in search of a match, not just to the name of the file itself as you might assume from the term "filename". This is the same sense of "filename" used in emacs' buffer-file-name function, for example.
Further, although if you put foo in the field, you'll see "foo" (with double-quotes) written to the actual file, that's not enough quoting and not the right quoting. You need to quote your regexp with the quoting style that, as far as I can tell, only emacs uses: the ``backtick-foo-single-quote'`scheme. And then you need to escape that, making it \`backslash-backtick-foo-backslash-single-quote\' (and if you think that's a headache to input in Markdown, it's more so in emacs).
On top of this, emacs appears to have a rule that the . regexp special character does not match a / at the beginning of filenames, so, as was happening to me above, the classic .* pattern will appear to match nothing: to match "all files", you actually need the regexp /.*, which then you stuff into the quote format of customize-mode to produce \`/.*\', after which customize paints another layer of escaping onto it and writes it to the customization file.
The final result for one of my efforts - a setting such that #autosave# files don't gunk up the directory you're working in, but instead all live in one place:
(custom-set variables
'(auto-save-file-name-transforms (quote (
("\\`/[^/]*:\\([^/]*/\\)*\\([^/]*\\)\\'" "~/.emacs.d/autobackups/\\2" t)
("\\`/.*/\\(.*?\\)\\'" "~/.emacs.d/autobackups/\\1" t)
))))
Backslashes in elisp are a far greater threat to your sanity than parentheses.
EDIT 2: Time for me to be wrong again. I finally found the relevant documentation (through reading another Stack Overflow question, of course!): Regexp Backslash Constructs. The crucial point of confusion for me: the backtick and single quote are not quoting in this context: they're the equivalent of perl's ^ and $ special characters. The backslash-backtick construct matches an empty string anchored at the beginning of the string being checked for a match, and the backslash-single-quote construct matches the empty string at the end of the string-under-consideration. And by "string under consideration," I mean "buffer, which just happens to contain only a file path in this case, but you need to match the whole dang thing if you want a match at all, since this is elisp's global regexp behavior."
Swear to god, it's like dealing with an alien civilization.
EDIT 3: In order to avoid confusing future readers -
\` is the emacs regex for "the beginning of the buffer." (cf Perl's \A)
\' is the emacs regex for "the end of the buffer." (cf Perl's \Z)
^ is the common-idiom regex for "the beginning of the line." It can be used in emacs.
$ is the common-idiom regex for "the end of the line." It can be used in emacs.
Because regex searches across multi-line bodies of text are more common in emacs than elsewhere (e.g. M-x occur), the backtick and single-quote special characters are used in emacs, and as best as I can tell, they're used in the context of customize-mode because if you are considering generic unknown input to a customize-mode field, it could contain newlines, and therefore you want to use the beginning-of-buffer and end-of-buffer special characters because the beginning and end of the input are not guaranteed to be the beginning and end of a line.
I am not sure whether to regret hijacking my own Stack Overflow question and essentially turning it into a blog post.

In the customize field, you'd enter the regexp according to the syntax described here. When customize writes the regexp into a string, any backslashes or double-quote chars in the regexp will be escaped, as per regular string escaping conventions.
So in short, just enter single backslashes in the regexp field, and they'll get correctly doubled up in the resulting custom-set-variables clause written to your .emacs.
Also: since your regexp is for matching filenames, you might try opening up a directory containing files you'd like to match, and then run M-x re-builder RET. You can then enter the regexp in string-escaped format to confirm that it matches those files. By typing % m in a dired buffer, you can enter a regexp in unescaped format (ie. just like in the customize field), and dired will mark matching filenames.

Related

Vim: how to change phrase (of varied type and length) surroundings

I have a Rmarkdown file and I want to translate it to a media wiki file, because they are different markup languages, they have different syntax, and I want to find a way to change them all with the substitute command.
The two use cases are:
Change headings # title, ## subtitle and the like to =title= and ==subtitle==,
Change surroundings $inline formula$, $$standalone formula $$ and the like to <math>inline formula</math> and :<math>standalone formula</math>.
The two differ in that the first one I want to match the beginning of a phrase, then surround it with some other symbols, and the second one is changing one kind of surrounding to another. I don't want to record some macros because the phrase can be very different, and there are many of them. I tried some commands like %s/=*/*/gc or %s/$*$/<math>*</math>/gc but they didn't work, so I think I might have messed up the regular expressions (my knowledge about regex is quite limited and I'm not sure which of the symbols should be escaped in Vim).
In every regular expression dialect, * is a greedy quantifier that means "0 or more of the preceding atom, as many as possible", whereas it means "any character" when used as a glob (in your shell, for example).
Since substitutions in Vim use regular expressions and not globs, it should be evident why :%s/=*/*/gc and %s/$*$/<math>*</math>/gc don't do anything useful.
Another problem with your attempts is that you somehow expect the regular expression engine to guess what part of the match to reuse in the replacement. This is done by reusing "sub-expressions", sometimes called "capture groups".
See :help \(, :help \1, and http://vimregex.com/.
Change headings #title, ##subtitle and the like to =title= and ==subtitle==,
:%s/^\(#\+\)\(.*\)/\=repeat('=',len(submatch(1))).submatch(2).repeat('=',len(submatch(1)))
Note that your markdown headings are malformed.

How to search and replace from the last match of a until b?

I have a latex file in which I want to get rid of the last \\ before a \end{quoting}.
The section of the file I'm working on looks similar to this:
\myverse{some text \\
some more text \\}%
%
\myverse{again some text \\
this is my last line \\}%
\footnote{possibly some footnotes here}%
%
\end{quoting}
over several hundred lines, covering maybe 50 quoting environments.
I tried with :%s/\\\\}%\(\_.\{-}\)\\end{quoting}/}%\1\\end{quoting}/gc but unfortunately the non-greedy quantifier \{-} is still too greedy.
It catches starting from the second line of my example until the end of the quoting environment, I guess the greedy quantifier would catch up to the last \end{quoting} in the file. Is there any possibility of doing this with search and replace, or should I write a macro for this?
EDIT: my expected output would look something like this:
this is my last line }%
\footnote{possibly some footnotes here}%
%
\end{quoting}
(I should add that I've by now solved the task by writing a small macro, still I'm curious if it could also be done by search and replace.)
I think you're trying to match from the last occurrence of \\}% prior to end{quoting}, up to the end{quoting}, in which case you don't really want any character (\_.), you want "any character that isn't \\}%" (yes I know that's not a single character, but that's basically it).
So, simply (ha!) change your pattern to use \%(\%(\\\\}%\)\#!\_.\)\{-} instead of \_.\{-}; this means that the pattern cannot contain multiple \\}% sequences, thus achieving your aims (as far as I can determine them).
This uses a negative zero-width look-ahead pattern \#! to ensure that the next match for any character, is limited to not match the specific text we want to avoid (but other than that, anything else still matches). See :help /zero-width for more of these.
I.e. your final command would be:
:%s/\\\\}%\(\%(\%(\\\\}%\)\#!\_.\)\{-}\)\\end{quoting}/}%\1\\end{quoting}/g
(I note your "expected" output does not contain the first few lines for some reason, were they just omitted or was the command supposed to remove them?)
You’re on the right track using the non-greedy multi. The Vim help files
state that,
"{-}" is the same as "*" but uses the shortest match first algorithm.
However, the very next line warns of the issue that you have encountered.
BUT: A match that starts earlier is preferred over a shorter match: "a{-}b" matches "aaab" in "xaaab".
To the best of my knowledge, your best solution would be to use the macro.

Vim S&R to remove number from end of InstallShield file

I've got a practical application for a vim regex where I'd like to remove numbers from the end of file location links. For example, if the developer is sloppy and just adds files and doesn't reuse file locations, you'll end up with something awful like this:
PATH_TO_MY_FILES&gt
PATH_TO_MY_FILES1&gt
...
PATH_TO_MY_FILES22&gt
PATH_TO_MY_FILES_ELSEWHERE&gt
PATH_TO_MY_FILES_ELSEWHERE1&gt
...
So all I want to do is to S&R and replace PATH_TO_MY_FILES*\d+ with PATH_TO_MY_FILES* using regex. Obviously I am not doing it quite right, so I was hoping someone here could not spoon feed the answer necessarily, but throw a regex buzzword my way to get me on track.
Here's what I have tried:
:%s\(PATH_TO_MY_FILES\w*\)\(\d+\)&gt:gc
But this doesn't work, i.e. if I just do a vim search on that, it doesn't find anything. However, if I use this:
:%s\(PATH_TO_MY_FILES\w*\)\(\d\)&gt:gc
It will match the string, but the grouping is off, as expected. For example, the string PATH_TO_MY_FILES22 will be grouped as (PATH_TO_MY_FILES2)(2), presumably because the \d only matches the 2, and the \w match includes the first 2.
Question 1: Why doesn't \d+ work?
If I go ahead and use the second string (which is wrong), Vim appears to find a match (even though the grouping is wrong), but then does the replacement incorrectly.
For example, given that we know the \d will only match the last number in the string, I would expect PATH_TO_MY_FILES22&gt to get replaced with PATH_TO_MY_FILES2&gt. However, instead it replaces it with this:
PATH_TO_MY_FILES2PATH_TO_MY_FILES22&gtgt
So basically, it looks like it finds PATH_TO_MY_FILES22&gt, but then replaces only the & with group 1, which is PATH_TO_MY_FILES2.
I tried another regex at Regexr.com to see how it would interpret my grouping, and it looked correct, but maybe a hack around my lack of regex understanding:
(PATH_TO_\D*)(\d*)&gt
This correctly broke my target string into the PATH part and the entire number, so I was happy. But then when I used this in Vim, it found the match, but still replaced only the &.
Question 2: Why is Vim only replacing the &?
Answer 1:
You need to escape the + or it will be taken literally. For example \d\+ works correctly.
Answer 2:
An unescaped & in the replacement portion of a substitution means "the entire matched text". You need to escape it if you want a literal ampersand.

EditPad: How to replace multiple search criteria with multiple values?

I did some searching and found tons of questions about multiple replacements with Regex, but I'm working in EditPadPro and so need a solution that works with the regex syntax of that environment. Hoping someone has some pointers as I haven't been able to work out the solution on my own.
Additional disclaimer: I suck with regex. I mean really... it's bad. Like I barely know wtf I'm doing.So that being said, here is what I need to do and how I'm currently approaching it...
I need to replace two possible values, with their corresponding replacements. My two searches are:
(.*)-sm
(.*)-rad
Currently I run these separately and replace each with simple strings:
sm
rad
Basically I need to lop off anything that comes prior to "sm" so I just detect everything up to and including sm, and then replace it all with that string (and likewise for "rad").
But it seems like there should be a way to do this in a single search/replace operation. I can do the search part fine with:
(.*)-sm|(.*)-rad
But then how to replace each with it's matching value? That's where I'm stuck. I tried:
sm|rad
but alas, that just becomes the literal complete string that is used for replacement.
Jonathan, first off let me congratulate you for using EPP Pro for regex in your text. It's my main text editor, and the main reason I chose it, as a regex lover, is that its support of regex syntax is vastly superior to competing editors. For instance Notepad++ is known for its shoddy support of regular expressions. The reason of course is that EPP's author Jan Goyvaerts is the author of the legendary RegexBuddy.
A picture is worth a thousand words... So here is how I would do your replacement. Just hit the "replace all button". The expression in the regex box assumes that anything before the dash that is not a whitespace character can be stripped, so if this is not what you want, we need to tune it.
Search for:
(.*)-(sm|rad)
Now, when you put something in parenthesis in Regex, those matches are stored in temporary variables. So whatever matched (.*) is stored in \1 and whatever matched (sm|rad) is stored in \2. Therefore, you want to replace with:
\2
Note that the replacement variable may be different depending on what programming language you are using. In Perl, for example, I would have to use $2 instead.

Can some please explain this elisp regexp

Can some please explain the following regexp, which I found in ediff-trees.el as a specification for which files/directories to exclude from its comparison process.
"\\`\\(\\.?#.*\\|.*,v\\|.*~\\|\\.svn\\|CVS\\|_darcs\\)\\'"
Although I am somewhat familiar with regular expressions encountering this elisp string-based variant has thrown me off.
First thing, remember that elisp's regexes have to be string-escaped, which created a lot of extra backslashes. Removing them, we get
\`\(\.?#.*\|.*,v\|.*~\|\.svn\|CVS\|_darcs\)\'
Then, \( and \) mean grouping, "foo\|bar" means "either foo, or bar".
So, piece by piece, this regexp matches: either an emacs temporary file (something starting with #, possibly preceded by a period: .?#.), or an RCS file (ending in ,v: .,v), or an emacs backup file (ending in ~: .*~), or an svn directory (.svn), or a cvs directory (CVS), or a darcs directory (_darcs).
Edit to correct: as andre-r correctly points out, the backtick \` and single quote \' basically mean "beginning and end of the string" (respectively). So this means that the regexp finds strings which match exactly one of the choices I've outlined above (i.e., the string starts, then comes one of those choices, then the string ends). I had previously said they meant quoting, I don't know what I was thinking :). Thanks andre-r!
Sorry, this isn't really an answer; it's merely a comment to rbp's answer. But I can't figure out how to get the code sample to render nicely inside a comment, whereas it looks fine here in this answer.
Anyway:
I dunno about you, but I find
(rx bos (group (or (and (zero-or-one ".") "#" (zero-or-more nonl))
(and (zero-or-more nonl) ",v" )
(and (zero-or-more nonl) "~" )
".svn"
"CVS"
"_darcs"
))
eos)
a lot easier to read -- and it's exactly equivalent.
Parentheses in elisp regexes need to be escaped. Backslashes in strings need to be escaped, so you end up with \\( and \\) when any sensible regex parser would just use ( and ). Don't get me wrong, I love Emacs, but having to escape parentheses in a regex was a really bad idea. The pipes and periods and backticks are also being escaped - that's why you've got this hell of double backslashes. Strip out those and you get (in regex literal form):
`(.?#.*|.*,v|.*~|\.svn|CVS|_darcs)'
See this question for more discussion on the subject of escaped parens in elisp.