Basic Vim - Search and Replace text bounded by specific characters - regex

Say I want to replace:
"Christoph Waltz" = "That's a Bingo";
"Gary Coleman" = "What are you talking about, dear Willis?";
with just:
"Christoph Waltz"
"Gary Coleman"
i.e. I want to remove everything from the = through the ; at the end of each line.
I thought the regex for finding the pattern would be \=.*?\;. In Vim, I tried:
:%s/\=.*?\;$//g
but it gave me an Invalid Command error and Nothing after \=. How do I remove the above text? Apologies, but I'm new to this.

Vim's regular expression dialect is different; its escaping is optimized for text searches. See :help perl-patterns for a comparison with Perl regular expressions. As @EvergreenTree has noted, you can influence the escaping behavior with special atoms; cf. :help /\v.
For your example, the non-greedy match is .\{-}, not .*?, and, as mentioned, you must not escape literal characters such as = and ; in the default mode:
:%s/ =.\{-};$//
(The /g flag is superfluous, too; there can be only one match anchored to the end via $.)
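For comparison, the Perl-style non-greedy form from the question does work in engines that use that dialect. Below is a minimal sketch using Python's re module; the variable names and the choice of Python are illustrative assumptions, not part of the original answer.

```python
import re

text = '''"Christoph Waltz" = "That's a Bingo";
"Gary Coleman" = "What are you talking about, dear Willis?";'''

# Perl-style syntax: = and ; are plain literals, .*? is the non-greedy match.
# re.MULTILINE lets $ match at the end of every line, as Vim's $ does.
stripped = re.sub(r" =.*?;$", "", text, flags=re.MULTILINE)
print(stripped)
```

As in the Vim command, the $ anchor means at most one match per line, so the laziness of .*? makes no practical difference here.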

This is because of how Vim handles regexes by default. Instead of treating \= as a literal =, Vim interprets it as a quantifier (match 0 or 1 of the preceding atom), and here there is nothing before it to quantify. To make Vim's regexes behave more like other dialects, you can prefix the pattern with \v, which enables "very magic" mode. This should do the trick:
:%s/\v\=.*\;$//g
I won't explain how Vim interprets every single character in very magic mode here, but you can read about it in this help topic:
:help /magic

Related

Why would a regex work in Sublime and not in vim?

Tried searching for regex found in this answer:
(,)(?=(?:[^']|'[^']*')*$)
I tried doing a search in Sublime and it worked out (around 700 results). When trying to replace the results it runs out of memory. Tried /(,)(?=(?:[^']|'[^']*')*$) in vim for searching first but it does not find any instances of the pattern. Also tried escaping all the ( and ) with \ in the regex.
Vim uses its own regular expression engine and syntax (which predates PCRE, by the way), so porting a regex from Perl or some other editor will most likely need some work.
The differences are too numerous to list in detail here, but :help pattern and :help perl-patterns will help.
Anyway, this quick and dirty rewrite of your regular expression should work on the sample given in the linked question. Note that Vim writes the lookahead operator after the group it applies to (@= in very magic mode):
/\v,(([^']|'[^']*')*$)@=
See :help /\@= and :help /\v.
One possible explanation is that the regular expression engine used in Sublime is different than the engine used in vim.
Not all regex engines are created equal; they don't all support the same features. (For example, a "negative lookahead" feature can be very powerful, but not all engines support it. And the syntax for some features differs between engines.)
A brief comparison of regular expression engines is available here:
http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
Unfortunately Vim uses a different engine, and "normal" regular expressions won't work.
The regex you've mentioned isn't perfect: it doesn't skip escaped quotes, but, as I understand, it's good enough for you. Try this one, and if it doesn't match something, please send me that piece.
\v^([^']|'[^']*')*\zs,
A little explanation:
\v enables very magic search to avoid complex escaping rules
([^']|'[^']*') matches either any single character other than a quote, or a pair of quotes with whatever is between them
\zs marks the beginning of the match; you can think of it as a replacement for lookbehind.
In Vim's default (magic) mode you have to escape the | for it to act as alternation. You should also escape the round brackets when you want grouping; unescaped, they match literal '(' and ')' characters.
More information on regex usage in vim can be found on vimregex.com.
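To see what the original Perl-style lookahead is meant to do, here is a small sketch in Python, whose re module accepts that pattern essentially as written. The constant name and the sample string are assumptions made for illustration only.

```python
import re

# Perl-style lookahead: a comma that is followed, up to the end of the
# string, only by unquoted characters or complete '...' pairs.
COMMA_OUTSIDE_QUOTES = re.compile(r",(?=(?:[^']|'[^']*')*$)")

fields = COMMA_OUTSIDE_QUOTES.split("a,'b,c',d")
print(fields)
```

The comma inside 'b,c' is skipped because the text after it contains an unpaired quote, so the lookahead fails at that position.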

Vim regex to substitute/escape pipe characters

Let's suppose I have a line:
a|b|c
I'd like to run a regex to convert it to:
a\|b\|c
In most regex engines I'm familiar with, something like s%\|%\\|%g should work. If I try this in Vim, I get:
\|a\||\|b\||\|c
As it turns out, I discovered the answer while typing up this question. I'll submit it with my solution, anyway, as I was a bit surprised a search didn't turn up any duplicates.
Vim has its own regex syntax. There is a comparison with PCRE in the Vim help (see :help perl-patterns).
Besides that, Vim has "nomagic", "magic", and "very magic" modes. See :h magic for the table.
By default, Vim uses magic mode. If you want to make the :s command in your question work, just activate very magic:
:s/\v\|/\\|/g
Vim does the opposite of PCRE in this regard: | is a literal pipe character, with \| serving as the alternation operator. I couldn't find an appropriate escape sequence because the pipe character does not need to be escaped.
The following command works for the line in my example:
:. s%|%\\|%g
If you use very magic mode (\v), you get the Perl/PCRE behaviour for most special characters (excluding the Vim-specific ones):
:s#\v\|#\\|#g
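For contrast, this is how the same substitution looks in a Perl-style engine, where the roles of | and \| are the reverse of Vim's default. A minimal sketch in Python; the variable names are illustrative assumptions.

```python
import re

line = "a|b|c"

# In Perl-style syntax an unescaped | is alternation, so the literal
# pipe is written \| in the pattern.
# In the replacement template, \\ produces a single backslash.
escaped = re.sub(r"\|", r"\\|", line)
print(escaped)
```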

How to highlight words beginning with ‘@’ in Vim syntax?

I have a very simple Vim syntax file for personal notes. I would like to highlight people's names and I chose a Twitter-like syntax: @jonathan.
I tried:
syntax match notesPerson "\<@\S\+"
To mean: words beginning with @ and having at least one non-whitespace character. The problem is that @ seems to be a special character in Vim regular expressions.
I tried to escape \@ and enclose in brackets [@], the usual tricks, but that didn't work. I could try something like (^|\s) (beginning of line or whitespace) but that's exactly the problem that word-boundary tries to solve.
Highlighting works on simplified regular expressions, so this is more a question of finding the right regex than anything else. What am I missing?
@ is a special character only if you have enabled the “very magic”
mode by having \v somewhere in the pattern prior to that @.
You have another problem here: @ does not start a new word. \< is
not just “word boundary” like Perl/PCRE’s \b, but “left word
boundary” (in help: “beginning of the word”), meaning that \< must be
followed by some keyword character. As @ is not normally a keyword
character, the pattern \<@ will never match. (And even if it were like
\b, it would match constructs like abc@def, which is definitely not
what you want for the aforementioned reasons.)
You should use \k\@<!@\k\S* instead: \k\@<! ensures that @ is not preceded by any keyword character, and \k\S* makes sure that the first character of the name is a keyword one (you could probably also use @\<\S\+).
There is another solution: include @ in the 'iskeyword' option and leave the regex as is:
:setlocal iskeyword+=@-@
See :help 'isfname' for the explanation of why @-@ is used here.
(The 'iskeyword' option has exactly the same syntax and will,
in fact, redirect you there for the explanation.)
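The "not preceded by a keyword character" idea can be sketched outside Vim with a Perl-style negative lookbehind. The Python function below is a rough illustration only: the function name and the marker parameter are hypothetical, and \w is only an approximation of Vim's 'iskeyword' set.

```python
import re

def find_tagged_names(text, marker="@"):
    """Return words that start with `marker` and are not glued to a preceding word."""
    # (?<!\w) rejects a marker that directly follows a word character.
    # \w after the marker requires the name to start with a word character.
    pattern = r"(?<!\w)" + re.escape(marker) + r"\w\S*"
    return re.findall(pattern, text)

print(find_tagged_names("mail bob@example.com, then ping @jonathan"))
```

The address in the sample is skipped because its marker sits in the middle of a word, which is the same case the plain word-boundary approach gets wrong.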

What's the difference between vim regex and normal regex?

I noticed that vim's substitute regex is a bit different from other regexp. What's the difference between them?
"Regular expression" really defines algorithms, not a syntax. What that means is that different flavours of regular expressions will use different characters to mean the same thing; or they'll prefix some special characters with backslashes where others don't. They'll typically still work in the same way.
Once upon a time, POSIX defined the Basic Regular Expression syntax (BRE), which Vim largely follows. Very soon afterwards, an Extended Regular Expression (ERE) syntax proposal was also released. The main difference between the two is that BRE tends to treat more characters as literals - an "a" is an "a", but also a "(" is a "(", not a special character - and so involves more backslashes to give them "special" meaning.
The discussion of complex differences between Vim and Perl on a separate comment here is useful, but it's also worth mentioning a few of the simpler ways in which Vim regexes differ from the "accepted" norm (by which you probably mean Perl.) As mentioned above, they mostly differ in their use of a preceding backslash.
Here are some obvious examples:
Perl      Vim         Explanation
-----------------------------------------------------------
x?        x\=         Match 0 or 1 of x
x+        x\+         Match 1 or more of x
(xyz)     \(xyz\)     Use brackets to group matches
x{n,m}    x\{n,m}     Match n to m of x
x*?       x\{-}       Match 0 or more of x, non-greedy
x+?       x\{-1,}     Match 1 or more of x, non-greedy
\b        \< \>       Word boundaries
$n        \n          Backreferences for previously grouped matches
That gives you a flavour of the most important differences. But if you're doing anything more complicated than the basics, I suggest you always assume that Vim-regex is going to be different from Perl-regex or Javascript-regex and consult something like the Vim Regex website.
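To make the Perl column of the table concrete, here is a short sketch of a few of those constructs in Python, which follows the Perl-style notation for them. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# x*? : non-greedy, stops at the first possible closing bracket
first_tag = re.search(r"<.*?>", "<a><b>").group()

# x* : greedy, runs to the last closing bracket
whole = re.search(r"<.*>", "<a><b>").group()

# (xyz) groups without backslashes; Python writes the backreference
# in the replacement as \1 (Perl would use $1)
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(first_tag, whole, swapped)
```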
If by "normal regex" you mean Perl-Compatible Regular Expressions (PCRE), then the Vim help provides a good summary of the differences between Vim's regexes and Perl's:
:help perl-patterns
Here's what it says as of Vim 7.2:
9. Compare with Perl patterns *perl-patterns*
Vim's regexes are most similar to Perl's, in terms of what you can do. The
difference between them is mostly just notation; here's a summary of where
they differ:
Capability                     in Vimspeak    in Perlspeak ~
----------------------------------------------------------------
force case insensitivity       \c             (?i)
force case sensitivity         \C             (?-i)
backref-less grouping          \%(atom\)      (?:atom)
conservative quantifiers       \{-n,m}        *?, +?, ??, {}?
0-width match                  atom\@=        (?=atom)
0-width non-match              atom\@!        (?!atom)
0-width preceding match        atom\@<=       (?<=atom)
0-width preceding non-match    atom\@<!       (?<!atom)
match without retry            atom\@>        (?>atom)
Vim and Perl handle newline characters inside a string a bit differently:
In Perl, ^ and $ only match at the very beginning and end of the text,
by default, but you can set the 'm' flag, which lets them match at
embedded newlines as well. You can also set the 's' flag, which causes
a . to match newlines as well. (Both these flags can be changed inside
a pattern using the same syntax used for the i flag above, BTW.)
On the other hand, Vim's ^ and $ always match at embedded newlines, and
you get two separate atoms, \%^ and \%$, which only match at the very
start and end of the text, respectively. Vim solves the second problem
by giving you the \_ "modifier": put it in front of a . or a character
class, and they will match newlines as well.
Finally, these constructs are unique to Perl:
- execution of arbitrary code in the regex: (?{perl code})
- conditional expressions: (?(condition)true-expr|false-expr)
...and these are unique to Vim:
- changing the magic-ness of a pattern: \v \V \m \M
  (very useful for avoiding backslashitis)
- sequence of optionally matching atoms: \%[atoms]
- \& (which is to \| what "and" is to "or"; it forces several branches
  to match at one spot)
- matching lines/columns by number: \%5l \%5c \%5v
- setting the start and end of the match: \zs \ze
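The newline behaviour described for Perl can be tried out in Python, whose re module uses the same default and the same m and s flags (spelled re.MULTILINE and re.DOTALL). This is a sketch of the Perl side only, with made-up sample text; it does not emulate Vim's \%^, \%$ or \_ atoms.

```python
import re

text = "one\ntwo"

# Default: ^ and $ anchor to the whole text only.
whole_text_only = re.findall(r"^\w+$", text)
# With the m flag: ^ and $ also match at embedded newlines.
per_line = re.findall(r"^\w+$", text, flags=re.MULTILINE)

# Default: . does not match a newline; with the s flag it does.
dot_default = re.search(r"one.two", text)
dot_all = re.search(r"one.two", text, flags=re.DOTALL)

print(whole_text_only, per_line, dot_default is None, dot_all is not None)
```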
Try Vim's very magic regex mode. It behaves more like traditional regex; just prepend \v to your pattern. See :help /\v for more info. I love it.
There is a plugin called eregex.vim which translates from PCRE (Perl-compatible regular expressions) to Vim's syntax. It takes over a thousand lines of Vim script to achieve that translation! I guess it also serves as precise documentation of the differences.
This question is too broad. Run Vim and type :help pattern.

regex implementation to replace group with its lowercase version

Is there any regex implementation that allows replacing a group with its lowercase version?
If your regex version supports it, you can use \L. For example, with GNU sed in a shell (both -r and \L are GNU extensions):
sed -r 's/(^.*)/\L\1/'
In Perl, you can do:
$string =~ s/(some_regex)/lc($1)/ge;
The /e option causes the replacement expression to be interpreted as Perl code to be evaluated, whose return value is used as the final replacement value. lc($x) returns the lowercased version of $x. (Not sure but I assume lc() will handle international characters correctly in recent Perl versions.)
/g means match globally. Omit the g if you only want a single replacement.
If you're using an editor like SublimeText or TextMate[1], there's a good chance you can use
\L$1
as your replacement, where $1 refers to something from the regular expression that you put parentheses around. For example[2], here's something I used to downcase field names in some SQL, getting everything to the right of the 'as' at the end of any given line. First the "find" regular expression:
(as|AS) ([A-Za-z_]+)\s*,$
and then the replacement expression:
$1 '\L$2',
If you use Vim (or presumably gvim), then you'll want to use \L\1 instead of \L$1, but there's another wrinkle that you'll need to be aware of: Vim reverses the syntax between literal parenthesis characters and escaped parenthesis characters. So to designate a part of the regular expression to be included in the replacement ("captured"), you'll use \( at the beginning and \) at the end. Think of \ not as escaping a special character to make it a literal, but as marking the beginning of a special character (as with \s, \w, \b and so forth). So it may seem odd if you're not used to it, but it is actually perfectly logical if you think of it in the Vim way.
[1] I've tested this in both TextMate and SublimeText and it works as-is, but some editors use \1 instead of $1. Try both and see which your editor uses.
[2] I just pulled this regex out of my history. I always tweak regexen while using them, and I can't promise this is the final version, so I'm not suggesting it's fit for the purpose described, and especially not with SQL formatted differently from the SQL I was working on, just that it's a specific example of downcasing in regular expressions. YMMV. UAYOR.
Several answers have noted the use of \L. However, \E is also worth knowing about if you use \L.
\L converts everything up to the next \U or \E to lowercase. ... \E turns off case conversion.
(Source: https://www.regular-expressions.info/replacecase.html )
So, suppose you wanted to use rename to lowercase part of some file names like this:
artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
artist_-_album_-_Another_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
you could do something like:
rename -v 's/^(.*_-_)(.*)(_-_.*\.m4a)/$1\L$2\E$3/' *
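In Perl, there's also tr, which maps characters one-for-one (here only the ASCII letters A-Z; the ranges need no brackets):
$string =~ tr/A-Z/a-z/;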
Most regex implementations allow you to pass a callback function when doing a replace, so you can simply return a lowercase version of the match from the callback.
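As one instance of the callback approach, Python's re.sub accepts a function in place of the replacement string. The sketch below lowercases only the middle group of a file name shaped like the ones in the rename example; the function name is hypothetical and the pattern is an assumption adapted from that example.

```python
import re

def lowercase_title(filename):
    """Lowercase only the middle group of a name like a_-_b_-_Title_-_Hash.m4a."""
    pattern = r"^(.*_-_)(.*)(_-_.*\.m4a)$"
    # The callback receives the match object and returns the replacement text.
    return re.sub(
        pattern,
        lambda m: m.group(1) + m.group(2).lower() + m.group(3),
        filename,
    )

print(lowercase_title("artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a"))
```

Names that do not fit the pattern are returned unchanged, since re.sub only rewrites what it matches.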