Syntax of Emacs replace-regexp

I'm trying to find every date and put quotes around it. I thought this should be:
M-x replace-regexp 201\d-\d\d-\d\d <ret> '\&'
I also tried [0-9] instead of \d.
It doesn't work. But using isearch-forward-regexp I can type [0-9][0-9] and watch the targets highlight. What am I doing wrong with the replace?

Emacs regexps don't have the common \d shorthand for [0-9].
I just put the text 2011-04-01 into a new buffer, went back to the start of the buffer, and typed M-x replace-regexp RET 201[0-9]-[0-9][0-9]-[0-9][0-9] RET '\&' RET, and the date was surrounded by single quotes, as expected.

\d is not supported in Emacs regular expression syntax (C-h S regexp has documentation on them). You can use [0-9] as you did, or use the POSIX style [[:digit:]].
However, what you did (with [0-9]) should have worked, and did in fact just work for me. If you're using a regular expression in a program you might also find M-x regexp-builder useful.
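
As a cross-check outside Emacs, the same substitution can be sketched in Python. This is only an illustration, not part of the Emacs workflow: quote_dates is a hypothetical helper name, and Python's \g<0> plays the role that \& plays in the Emacs replacement string.

```python
import re

# [0-9] is spelled out instead of \d, mirroring the Emacs regexp above.
DATE_RE = re.compile(r"201[0-9]-[0-9][0-9]-[0-9][0-9]")


def quote_dates(text):
    # In the replacement template, \g<0> stands for the whole match.
    return DATE_RE.sub(r"'\g<0>'", text)


print(quote_dates("released 2011-04-01, patched 2012-11-30"))
```

If the Python version matches a given buffer's text but the Emacs command does not, the problem is more likely in how the pattern was typed at the prompt than in the pattern itself.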

You may need to escape the -s:
M-x replace-regexp <RET> 201[0-9]\-[0-9][0-9]\-[0-9][0-9] <RET> '\&'

The regexps given are overly inclusive. The following will still match some invalid dates, like "2014-01-33", but not as many (it assumes YYYY-MM-DD ordering and a date at the start of a line):
"^20[0-9][0-9]-\(0[0-9]\|1[0-2]\)-[0-3][0-9]"
or, to avoid dates like "2014-11-33" (but not like "2014-02-31" or "2014-00-00"), you could be slightly more restrictive:
"^20[0-9][0-9]-\(0[0-9]\|1[0-2]\)-\([0-2][0-9]\|30\|31\)"


Basic Vim - Search and Replace text bounded by specific characters

Say I wanted to replace :
"Christoph Waltz" = "That's a Bingo";
"Gary Coleman" = "What are you talking about, dear Willis?";
to just have :
"Christoph Waltz"
"Gary Coleman"
i.e. I want to remove all the characters including and after the = and the ;
I thought the regex for finding the pattern would be \=.*?\;. In vim, I tried :
:%s/\=.*?\;$//g
but it gave me an Invalid Command error and Nothing after \=. How do I remove the above text? Apologies, but I'm new to this.
Vim's regular expression dialect is different; its escaping is optimized for text searches. See :help perl-patterns for a comparison with Perl regular expressions. As @EvergreenTree has noted, you can influence the escaping behavior with special atoms; cf. :help /\v.
For your example, the non-greedy match is .\{-}, not .*?, and, as mentioned, you mustn't escape several literal characters:
:%s/ =.\{-};$//
(The /g flag is superfluous, too; there can be only one match anchored to the end via $.)
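
For comparison, the Perl-style spelling the question started from does work in engines that use that dialect. A minimal Python sketch, assuming the two sample lines from the question: .*? here corresponds to Vim's .\{-}, and strip_assignments is a hypothetical name.

```python
import re


def strip_assignments(text):
    # re.MULTILINE lets $ match at the end of every line,
    # which is roughly what :%s does by working line by line.
    return re.sub(r" =.*?;$", "", text, flags=re.MULTILINE)


sample = (
    '"Christoph Waltz" = "That\'s a Bingo";\n'
    '"Gary Coleman" = "What are you talking about, dear Willis?";\n'
)
print(strip_assignments(sample))
```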
This is because of vim's weird handling of regexes by default. Instead of \= interpreting as a literal =, it interprets it as a special regex character. To make vim's regex system work more normally, you can prefix it with \v, which is "very magic" mode. This should do the trick:
%s/\v\=.*\;$//g
I won't explain how vim interprets every single character in very magic mode here, but you can read about it in this help topic:
:help /magic

Vim regex to substitute/escape pipe characters

Let's suppose I have a line:
a|b|c
I'd like to run a regex to convert it to:
a\|b\|c
In most regex engines I'm familiar with, something like s%\|%\\|%g should work. If I try this in Vim, I get:
\|a\||\|b\||\|c
As it turns out, I discovered the answer while typing up this question. I'll submit it with my solution, anyway, as I was a bit surprised a search didn't turn up any duplicates.
vim has its own regex syntax. There is a comparison with PCRE in vim help doc (see :help perl-patterns).
Beyond that, Vim has nomagic/magic/very magic modes; see :h magic for the table.
By default, Vim uses magic mode. If you want the :s command in your question to work, just activate very magic:
:s/\v\|/\\|/g
Vim does the opposite of PCRE in this regard: | is a literal pipe character, with \| serving as the alternation operator. I couldn't find an appropriate escape sequence because the pipe character does not need to be escaped.
The following command works for the line in my example:
:. s%|%\\|%g
If you use very-magic (use \v) you'll have the Perl/pcre behaviour on most special characters (excl. the vim specifics):
:s#\v\|#\\|#g
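
The contrast with Perl-style engines can be sketched in Python, where the roles are reversed: an unescaped | is alternation and \| is the literal pipe. The helper name escape_pipes is hypothetical.

```python
import re


def escape_pipes(line):
    # Pattern \| matches a literal pipe; the replacement template \\|
    # produces a backslash followed by a pipe.
    return re.sub(r"\|", r"\\|", line)


print(escape_pipes("a|b|c"))  # a\|b\|c
```

For a fixed string like this, line.replace("|", "\\|") gives the same result without involving a regex engine at all.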

Emacs (TeX): how to search and replace a whole region?

I'd appreciate some tips on this problem: I'm working in LaTeX with very messy code generated by writer2latex (quite a good program, anyway) and, using Emacs, I'm trying to query-replace multiple lines of code, for instance:
{\centering [Warning: Image ignored] % Unhandled or unsupported graphics:
%\includegraphics[width=11.104cm,height=8.23cm]{img34}
have to become:
\begin{figure}[tpb]
\begin{center}
\includegraphics[width=\textwidth]{img34}
Using M-x re-builder, I found out that I could underline the whole region I need to query-replace with the string: \{.*centering.*c-qc-j.*cm] but, if I M-x replace-regexp using this, I only get: Invalid regexp: "Invalid content of \\{\\}"
Any suggestion about how to perform the query? I have a HUGE amount of lines like these to replace... :-)
You're getting this error message because in Emacs' regular expressions the curly braces \{ and \} have special meaning. These braces are used to specify that the part of the regexp immediately before the braces should be matched a certain number of times.
From the GNU Emacs documentation on regexps:
\{n\}
is a postfix operator specifying n repetitions [...]
\{n,m\}
is a postfix operator specifying between n and m repetitions [...]
If you want your regexp to actually match a curly brace, do not escape it with a leading backslash:
{.*centering.*C-q C-j.*cm]
In order to use a backslash in the replacement string you have to escape it with another backslash. (When doing this in code, it quickly becomes quite ugly because inside a double-quoted string backslashes themselves have to be escaped already. However, since you are doing your replacements interactively, the double escaping is not necessary and thus two backslashes are enough.)
M-C-% {.*centering.*C-q C-j.*cm] RET \\begin{figure}[tpb]C-q C-j\\begin{center}C-q C-j\\includegraphics[width=\\textwidth] RET
Make sure the re-syntax is "read", C-c tab. Remove the initial backslash. Now the regexp should work if you yank it into replace-regexp
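
For a very large number of files, the same two-line replacement could also be scripted. The following is a rough Python sketch under stated assumptions: the input has exactly the shape shown in the question, and to_figure is a hypothetical helper name. Note the reversed convention: in Python's dialect a literal brace is written \{, and passing a function as the replacement avoids having to double the backslashes in the replacement text.

```python
import re

# Matches the "{\centering ..." line plus the commented-out
# \includegraphics options on the following line, up to "cm]".
PATTERN = re.compile(r"\{\\centering.*\n%\\includegraphics\[[^\]]*cm\]")

REPLACEMENT = (
    "\\begin{figure}[tpb]\n"
    "\\begin{center}\n"
    "\\includegraphics[width=\\textwidth]"
)


def to_figure(text):
    # A callable replacement is inserted literally, so the backslashes
    # in REPLACEMENT are not reinterpreted as regex escapes.
    return PATTERN.sub(lambda match: REPLACEMENT, text)


source = (
    "{\\centering [Warning: Image ignored] % Unhandled or unsupported graphics:\n"
    "%\\includegraphics[width=11.104cm,height=8.23cm]{img34}\n"
)
print(to_figure(source))
```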

Emacs query-replace-regexp multiline

How do you do a query-replace-regexp in Emacs that will match across multiple lines?
as a trivial example I'd want <p>\(.*?\)</p> to match
<p>foo
bar
</p>
M-x re-builder
is your friend. And it led me to this regular expression:
"<p>\\(.\\|\n\\)*</p>"
which is the string version of
<p>\(.\|^J\)*</p> ;# where you enter ^J by C-q C-j
And that works for me when I do re-search-forward, but not when I do 'query-replace-regexp. Unsure why...
Now, when doing a 're-search-forward (aka C-u C-s), you can type M-% which will prompt you for a replacement (as of Emacs 22). So, you can use that to do your search and replace with the above regexp.
Note, the above regexp will match until the last </p> found in the buffer, which is probably not what you want, so use re-builder to build a regexp that comes closer to what you want. Obviously regular expressions can't count parentheses, so you're on your own for that - depends on how robust a solution you want.
Try character classes. As long as you're using only ASCII character set, you can use [[:ascii:]] instead of the dot. Using the longer [[:ascii:][:nonascii:]] ought to work for everything.
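
The difference between the greedy and non-greedy forms is easy to see in a small Python sketch (an illustration in another dialect, not Emacs itself). Here re.DOTALL stands in for the "dot or newline" alternation, and the sample text is made up for the example.

```python
import re

html = "<p>foo\nbar\n</p>\n<p>baz</p>"

# Greedy: runs to the LAST </p> in the text.
greedy = re.findall(r"<p>(.*)</p>", html, flags=re.DOTALL)

# Non-greedy: stops at the first </p> after each <p>.
lazy = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)

print(greedy)  # ['foo\nbar\n</p>\n<p>baz']
print(lazy)    # ['foo\nbar\n', 'baz']
```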

regex implementation to replace group with its lowercase version

Is there any implementation of regex that allow to replace group in regex with lowercase version of it?
If your regex version supports it, you can use \L, like so with GNU sed (both \L and -r are GNU extensions, not POSIX):
sed -r 's/(^.*)/\L\1/'
In Perl, you can do:
$string =~ s/(some_regex)/lc($1)/ge;
The /e option causes the replacement expression to be interpreted as Perl code to be evaluated, whose return value is used as the final replacement value. lc($x) returns the lowercased version of $x. (Not sure but I assume lc() will handle international characters correctly in recent Perl versions.)
/g means match globally. Omit the g if you only want a single replacement.
If you're using an editor like SublimeText or TextMate [1], there's a good chance you can use
\L$1
as your replacement, where $1 refers to something from the regular expression that you put parentheses around. For example [2], here's something I used to downcase field names in some SQL, getting everything to the right of the 'as' at the end of any given line. First the "find" regular expression:
(as|AS) ([A-Za-z_]+)\s*,$
and then the replacement expression:
$1 '\L$2',
If you use Vim (or presumably gvim), then you'll want to use \L\1 instead of \L$1, but there's another wrinkle that you'll need to be aware of: Vim reverses the syntax between literal parenthesis characters and escaped parenthesis characters. So to designate a part of the regular expression to be included in the replacement ("captured"), you'll use \( at the beginning and \) at the end. Think of \ not as escaping a special character to make it a literal, but as marking the beginning of a special character (as with \s, \w, \b and so forth). So it may seem odd if you're not used to it, but it is actually perfectly logical if you think of it in the Vim way.
[1] I've tested this in both TextMate and SublimeText and it works as-is, but some editors use \1 instead of $1. Try both and see which your editor uses.
[2] I just pulled this regex out of my history. I always tweak regexen while using them, and I can't promise this is the final version, so I'm not suggesting it's fit for the purpose described, and especially not with SQL formatted differently from the SQL I was working on, just that it's a specific example of downcasing in regular expressions. YMMV. UAYOR.
Several answers have noted the use of \L. However, \E is also worth knowing about if you use \L.
\L converts everything up to the next \U or \E to lowercase. ... \E turns off case conversion.
(Source: https://www.regular-expressions.info/replacecase.html )
So, suppose you wanted to use rename to lowercase part of some file names like this:
artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
artist_-_album_-_Another_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
you could do something like:
rename -v 's/^(.*_-_)(.*)(_-_.*\.m4a)/$1\L$2\E$3/g' *
In Perl, to lowercase a whole string (ASCII letters only) there's also
$string =~ tr/A-Z/a-z/;
Most Regex implementations allow you to pass a callback function when doing a replace, hence you can simply return a lowercase version of the match from the callback.
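
As one concrete instance of the callback approach, here is a minimal Python sketch. It reuses the file-name shape from the rename example; lowercase_title is a hypothetical name, and the assumption is that only the second-to-last "_-_"-separated field should be lowercased.

```python
import re

NAME_RE = re.compile(r"^(.*_-_)(.*)(_-_.*\.m4a)$")


def lowercase_title(name):
    def repl(match):
        # Rebuild the name, lowercasing only the second group.
        return match.group(1) + match.group(2).lower() + match.group(3)

    return NAME_RE.sub(repl, name)


print(lowercase_title(
    "artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a"
))
```

Names that do not match the pattern are returned unchanged, since sub only calls the callback for actual matches.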