Emacs query-replace-regexp multiline - regex

How do you do a query-replace-regexp in Emacs that will match across multiple lines?
as a trivial example I'd want <p>\(.*?\)</p> to match
<p>foo
bar
</p>

M-x re-builder
is your friend. And it led me to this regular expression:
"<p>\\(.\\|\n\\)*</p>"
which is the string version of
<p>\(.\|^J\)*</p> ;# where you enter ^J by C-q C-j
And that works for me when I do re-search-forward, but not when I do 'query-replace-regexp. Unsure why...
Now, when doing a 're-search-forward (aka C-u C-s), you can type M-% which will prompt you for a replacement (as of Emacs 22). So, you can use that to do your search and replace with the above regexp.
Note, the above regexp will match until the last </p> found in the buffer, which is probably not what you want, so use re-builder to build a regexp that comes closer to what you want. Obviously regular expressions can't count parentheses, so you're on your own for that - depends on how robust a solution you want.
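To make the greedy-match caveat concrete, here is a minimal sketch in Python's re module (not Emacs; the sample text is made up for illustration). It contrasts the greedy (.|\n)* with the non-greedy (.|\n)*?. Emacs regexps also accept the non-greedy *? operator, so the corresponding Emacs pattern would presumably be <p>\(.\|^J\)*?</p>.

```python
import re

# Hypothetical sample: two paragraphs, the first spanning several lines.
text = "<p>foo\nbar\n</p>\n<p>baz</p>\n"

# Greedy: (.|\n)* keeps going and only gives back enough to end at the LAST </p>.
greedy = re.search(r"<p>((?:.|\n)*)</p>", text)

# Non-greedy: (.|\n)*? stops at the FIRST </p> after the opening tag.
lazy = re.search(r"<p>((?:.|\n)*?)</p>", text)

print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
```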

Try character classes. As long as you're using only ASCII character set, you can use [[:ascii:]] instead of the dot. Using the longer [[:ascii:][:nonascii:]] ought to work for everything.

Related

Basic Vim - Search and Replace text bounded by specific characters

Say I wanted to replace :
"Christoph Waltz" = "That's a Bingo";
"Gary Coleman" = "What are you talking about, dear Willis?";
to just have :
"Christoph Waltz"
"Gary Coleman"
i.e. I want to remove all the characters including and after the = and the ;
I thought the regex for finding the pattern would be \=.*?\;. In vim, I tried :
:%s/\=.*?\;$//g
but it gave me an Invalid Command error and Nothing after \=. How do I remove the above text? Apologies, but I'm new to this.
Vim's regular expression dialect is different; its escaping is optimized for text searches. See :help perl-patterns for a comparison with Perl regular expressions. As @EvergreenTree has noted, you can influence the escaping behavior with special atoms; cf. :help /\v.
For your example, the non-greedy match is .\{-}, not .*?, and, as mentioned, you mustn't escape several literal characters:
:%s/ =.\{-};$//
(The /g flag is superfluous, too; there can be only one match anchored to the end via $.)
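For comparison, the same substitution can be sketched in Python, whose re module uses the Perl-style .*? that the question expected. This is only an illustration of the pattern's logic, not a Vim command; the variable names are invented for the example.

```python
import re

lines = (
    '"Christoph Waltz" = "That\'s a Bingo";\n'
    '"Gary Coleman" = "What are you talking about, dear Willis?";'
)

# A space, "=", then as few characters as possible up to a ";" at end of line.
# re.MULTILINE makes $ match at the end of every line, as it does in Vim.
stripped = re.sub(r" =.*?;$", "", lines, flags=re.MULTILINE)

print(stripped)
```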
This is because of vim's weird handling of regexes by default. Instead of interpreting \= as a literal =, it treats it as a special regex item. To make vim's regex system work more normally, you can prefix the pattern with \v, which is "very magic" mode. This should do the trick:
%s/\v\=.*\;$//g
I won't explain how vim interprets every single character in very magic mode here, but you can read about it in this help topic:
:help /magic

Vim regex to substitute/escape pipe characters

Let's suppose I have a line:
a|b|c
I'd like to run a regex to convert it to:
a\|b\|c
In most regex engines I'm familiar with, something like s%\|%\\|%g should work. If I try this in Vim, I get:
\|a\||\|b\||\|c
As it turns out, I discovered the answer while typing up this question. I'll submit it with my solution, anyway, as I was a bit surprised a search didn't turn up any duplicates.
Vim has its own regex syntax. There is a comparison with PCRE in the Vim help (see :help perl-patterns).
Besides that, Vim has nomagic/magic/very magic modes. See :h magic for the table.
By default, Vim uses magic mode. If you want to make the :s command in your question work, just activate very magic:
:s/\v\|/\\|/g
Vim does the opposite of PCRE in this regard: | is a literal pipe character, with \| serving as the alternation operator. I couldn't find an appropriate escape sequence because the pipe character does not need to be escaped.
The following command works for the line in my example:
:. s%|%\\|%g
If you use very-magic (use \v) you'll have the Perl/pcre behaviour on most special characters (excl. the vim specifics):
:s#\v\|#\\|#g
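As a point of comparison, here is how the same edit looks in a Perl-style engine, sketched with Python's re module. There the roles are reversed relative to Vim's default mode: a bare | is alternation and \| is the literal pipe.

```python
import re

line = "a|b|c"

# Pattern \| matches a literal pipe; in the replacement template, \\ yields one backslash.
escaped = re.sub(r"\|", r"\\|", line)
print(escaped)

# For a fixed string no regex is needed at all.
plain = line.replace("|", "\\|")
print(plain == escaped)
```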

Perform a non-regex search/replace in vim

When doing search/replace in vim, I almost never need to use regex, so it's a pain to constantly be escaping everything. Is there a way to make it default to not using regex, or is there an alternative command to accomplish this?
As an example, if I want to replace < with &lt;, I'd like to just be able to type s/</&lt;/g instead of s/\</\&lt\;/g
For the :s command there is a shortcut to disable or force magic. To turn off magic use :sno like:
:sno/search_string/replace_string/g
Found here: http://vim.wikia.com/wiki/Simplifying_regular_expressions_using_magic_and_no-magic
Use this option:
set nomagic
See :help /magic
The problem is primarily caused by confusion about the role of the & in the replacement string. The replacement string is not a reg-ex, although it has some special characters, like &. You can read about role of & in replacement string here: :h sub-replace-special .
I suspect the main problem for OP is not necessarily typing the extra backslashes, but rather remembering when a backslash is needed and when not. One workaround may be to use "replacement expressions" when unsure. (See :h sub-replace-expression.) This requires putting a `\=' in the replacement string, but for some people it may give more natural control over what's being substituted, since a string literal in single quotes is used as the replacement exactly as written. For example, this substitute does what OP wants:
:s/</\='&lt;'/g
If you want to search literally, you can use the \V regex atom. This almost does what you want, except that you also need to escape the backslash. You could define your own search command, that would search literally. Something like this:
:com! -nargs=1 Search :let @/='\V'.escape(<q-args>, '\/')| normal! n
And then use :Search /foobar/baz
For Substitute, you could then after a :Search command simply use
:%s//replace/g
since then Vim would implicitly pick up the last search item and use it for replacing.
(Just want to give you some ideas)
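The same "escape the pattern, keep the replacement literal" idea can be sketched outside Vim. The Python snippet below is an analogy only; replace_literal is a hypothetical helper name, not part of any library. re.escape plays the role of \V plus escape(), and a callable replacement sidesteps special characters in the replacement, much like a \= expression does.

```python
import re

def replace_literal(text: str, needle: str, replacement: str) -> str:
    """Replace needle with replacement, treating both as plain text."""
    # re.escape neutralises regex metacharacters in the search string;
    # a callable replacement bypasses backslash and group processing.
    return re.sub(re.escape(needle), lambda _match: replacement, text)

html_safe = replace_literal("if a < b", "<", "&lt;")
odd_chars = replace_literal("cost: $5 (approx.)", "(approx.)", r"\1")

print(html_safe)
print(odd_chars)
```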
Here’s how to disable regular expression search/replace only in command mode:
autocmd CmdWinEnter * set nomagic
autocmd CmdWinLeave * set magic
All plugins that depend on regular expressions, such as a white-space remover, should work as usual.
Have you enabled magic?
:set magic
Try the Edit Find and replace on the menu bar.

Syntax of emacs replace-regexp

I'm trying to find every date and put quotes around it. I thought this should be:
M-x replace-regexp 201\d-\d\d-\d\d <ret> '\&'
I also tried [0-9] instead of \d.
it doesn't work. But using isearch-forward-regexp I can type [0-9][0-9] and watch the targets highlight. What am I doing wrong with the replace?
Emacs regexps don't have the common \d shorthand for [0-9].
I just put the text 2011-04-01 into a new buffer, went back to the start of the buffer, and typed M-x replace-regexp RET 201[0-9]-[0-9][0-9]-[0-9][0-9] RET '\&' RET, and the date was surrounded by single quotes, as expected.
\d is not supported in Emacs regular expression syntax (C-h S regexp has documentation on them). You can use [0-9] as you did, or use the POSIX style [[:digit:]].
However, what you did (with [0-9]) should have worked, and did in fact just work for me. If you're using a regular expression in a program you might also find M-x regexp-builder useful.
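If it helps to test the pattern outside Emacs, the same replacement can be sketched in Python. Note that this is a different regex dialect: Python does accept \d, and it spells "the whole match" as \g<0> where Emacs uses \&. The sample text is invented.

```python
import re

text = "released 2011-04-01, patched 2012-11-30"

# Same character classes as the Emacs pattern; \g<0> stands for the whole match.
quoted = re.sub(r"201[0-9]-[0-9][0-9]-[0-9][0-9]", r"'\g<0>'", text)

print(quoted)
```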
You may need to escape the -s:
M-x replace-regexp <RET> 201[0-9]\-[0-9][0-9]\-[0-9][0-9] <RET> '\&'
The regexps given are overly inclusive. The following will also match invalid dates, like "2014-01-33", but not as many (assumes YYYY-MM-DD ordering):
"^20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]"
or, to avoid dates like "2014-11-33" (but not like "2014-02-31" or "2014-00-00"), you could be slightly more restrictive:
"^20[0-9][0-9]-[0-1][0-9]-\([0-2][0-9]\|30\|31\)"

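Regexps like the ones above will always accept some impossible dates. When the text is processed by a program rather than by hand, one option is to let a loose pattern find candidates and a date parser reject the bad ones. A minimal Python sketch, assuming YYYY-MM-DD ordering; valid_dates and CANDIDATE are names made up for this example:

```python
import re
from datetime import datetime

# Loose pattern: anything shaped like YYYY-MM-DD.
CANDIDATE = re.compile(r"\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b")

def valid_dates(text: str) -> list:
    """Return the YYYY-MM-DD strings in text that are real calendar dates."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue  # e.g. month 00, day 33, or February 31
        found.append(candidate)
    return found

print(valid_dates("2014-01-33 2014-02-31 2014-00-00 2014-11-30"))
```
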
regex implementation to replace group with its lowercase version

Is there any implementation of regex that allow to replace group in regex with lowercase version of it?
If your regex flavor supports it, you can use \L. For example, with GNU sed (\L in the replacement is a GNU extension, not POSIX):
sed -r 's/(^.*)/\L\1/'
In Perl, you can do:
$string =~ s/(some_regex)/lc($1)/ge;
The /e option causes the replacement expression to be interpreted as Perl code to be evaluated, whose return value is used as the final replacement value. lc($x) returns the lowercased version of $x. (Not sure but I assume lc() will handle international characters correctly in recent Perl versions.)
/g means match globally. Omit the g if you only want a single replacement.
If you're using an editor like SublimeText or TextMate [1], there's a good chance you may use
\L$1
as your replacement, where $1 refers to something from the regular expression that you put parentheses around. For example [2], here's something I used to downcase field names in some SQL, getting everything to the right of the 'as' at the end of any given line. First the "find" regular expression:
(as|AS) ([A-Za-z_]+)\s*,$
and then the replacement expression:
$1 '\L$2',
If you use Vim (or presumably gvim), then you'll want to use \L\1 instead of \L$1, but there's another wrinkle that you'll need to be aware of: Vim reverses the syntax between literal parenthesis characters and escaped parenthesis characters. So to designate a part of the regular expression to be included in the replacement ("captured"), you'll use \( at the beginning and \) at the end. Think of \ as—instead of escaping a special character to make it a literal—marking the beginning of a special character (as with \s, \w, \b and so forth). So it may seem odd if you're not used to it, but it is actually perfectly logical if you think of it in the Vim way.
[1] I've tested this in both TextMate and SublimeText and it works as-is, but some editors use \1 instead of $1. Try both and see which your editor uses.
[2] I just pulled this regex out of my history. I always tweak regexen while using them, and I can't promise this is the final version, so I'm not suggesting it's fit for the purpose described, and especially not with SQL formatted differently from the SQL I was working on, just that it's a specific example of downcasing in regular expressions. YMMV. UAYOR.
Several answers have noted the use of \L. However, \E is also worth knowing about if you use \L.
\L converts everything up to the next \U or \E to lowercase. ... \E turns off case conversion.
(Source: https://www.regular-expressions.info/replacecase.html )
So, suppose you wanted to use rename to lowercase part of some file names like this:
artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
artist_-_album_-_Another_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
you could do something like:
rename -v 's/^(.*_-_)(.*)(_-_.*\.m4a)/$1\L$2\E$3/g' *
In Perl, there's
$string =~ tr/A-Z/a-z/;
Most Regex implementations allow you to pass a callback function when doing a replace, hence you can simply return a lowercase version of the match from the callback.
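As one illustration of the callback approach, here is a minimal sketch in Python, whose re.sub accepts a function as the replacement. It reuses the file-name example from the rename answer; lower_middle is a hypothetical helper name, and nothing is actually renamed on disk.

```python
import re

name = "artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a"

# Three groups: prefix, the part to lowercase, and the suffix.
pattern = re.compile(r"^(.*_-_)(.*)(_-_.*\.m4a)$")

def lower_middle(match: re.Match) -> str:
    """Callback: rebuild the match with only group 2 lowercased."""
    return match.group(1) + match.group(2).lower() + match.group(3)

renamed = pattern.sub(lower_middle, name)
print(renamed)
```

The callback plays the same role as \L...\E in the replacement string, so it also works in engines that have no case-conversion escapes.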