Multi-line regex support in Vim - regex

I notice the standard regex syntax for matching across multiple lines is to use /s, like so:
This is\nsome text
/This.*text/s
This works in Perl for instance but doesn't seem to be supported in Vim. Instead, I have to be much more specific:
/This[^\r\n]*[\r\n]*text/
I can't find any reason for why this should be, so I'm thinking I probably just missed the relevant bits in the vim help.
Can anyone confirm this behaviour one way or the other?

Yes, Perl's //s modifier isn't available on Vim regexes. See :h perl-patterns for details and a list of other differences between Vim and Perl regexes.
Instead you can use \_., which means "match any single character including newline". It's a bit shorter than what you have. See :h /\_..
/This\_.*text/
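For comparison outside Vim, here is a minimal Python sketch of what the /s behaviour amounts to. Python is used purely for illustration and the helper name is made up; the point is that with re.DOTALL the dot also matches a newline, which is roughly what \_. gives you in Vim.

```python
import re

TEXT = "This is\nsome text"

def matches_across_lines(pattern, text, dotall):
    """Return True if pattern is found in text.

    dotall=True makes '.' match newlines too (like Perl's /s).
    """
    flags = re.DOTALL if dotall else 0
    return re.search(pattern, text, flags) is not None

if __name__ == "__main__":
    # Without DOTALL the '.' stops at the newline, so nothing is found.
    print(matches_across_lines(r"This.*text", TEXT, dotall=False))
    # With DOTALL the same pattern spans both lines.
    print(matches_across_lines(r"This.*text", TEXT, dotall=True))
```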

Related

how to extract a part of header in Fasta file by using Linux command

I have a FASTA file with unique headers, and I would like to extract part of each header using a regular expression on Unix.
For example, my FASTA file starts with this header:
>jgi|Penbr2|47586|fgenesh1_pm.1_#_25
and I would like to extract just the last part of this header like:
>fgenesh1_pm.1_#_25
I tried these regular expressions in Vim, but they did not work:
:%s/^([^|]+\|){3}//g
or
:%s/^([A-Z][0-9]+\|){3}//g
I would appreciate it if you could give me some suggestions.
You can use sed:
sed -e 's/>.*|/>/' fasta-file
i.e. everything from > through the last | is replaced by > (the .* is greedy, so it runs up to the last pipe on the line).
I don't know whether the leading > is also part of your text; I assume it is not.
Since you tagged the question with vim, I'll post just the Vim solution.
You can take advantage of the greediness of the regex:
In vim:
%s/.*|//
will leave only the last part; this is the easiest way.
In Vim you can use \zs, \ze and non-greedy matching too:
%s/\zs.\{-}\ze[^|]\+$//
Of course, if you prefer grouping, you can use \(...\) to group instead of \zs and \ze.
In your commands, you grouped with plain (...) without escaping. I don't know how the magic setting is configured in your vimrc, but with the default you have to escape ( and ) to give them their special meaning (grouping here), just as in BRE. The same goes for + and {3}, which have to be written \+ and \{3}, while \| means alternation rather than a literal pipe. Do :h magic and look at the table to see the differences.
In Vim, do :h terms to get detailed information.
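As a cross-check of the greedy approach used by both the sed and the Vim answers, here is a small Python sketch. The function name is hypothetical, and it assumes each header starts with > and that the wanted part is whatever follows the last pipe.

```python
import re

def last_header_field(header):
    """Keep the leading '>' and drop everything through the last '|'."""
    # The greedy .* runs to the last pipe on the line, like sed 's/>.*|/>/'.
    return re.sub(r"^>.*\|", ">", header)

if __name__ == "__main__":
    print(last_header_field(">jgi|Penbr2|47586|fgenesh1_pm.1_#_25"))
```

A header without any pipe is returned unchanged, because the pattern requires at least one |.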

Why would a regex work in Sublime and not in vim?

Tried searching for regex found in this answer:
(,)(?=(?:[^']|'[^']*')*$)
I tried doing a search in Sublime and it worked out (around 700 results). When trying to replace the results it runs out of memory. Tried /(,)(?=(?:[^']|'[^']*')*$) in vim for searching first but it does not find any instances of the pattern. Also tried escaping all the ( and ) with \ in the regex.
Vim uses its own regular expression engine and syntax (which predates PCRE, by the way) so porting a regex from perl or some other editor will most likely need some work.
The many differences are too numerous to list in detail here but :help pattern and :help perl-patterns will help.
Anyway, this quick and dirty rewrite of your regular expression seems to work on the sample given in the linked question:
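/\v(,)(([^']|'[^']*')*$)@=
See :help /\@= and :help /\v. Note that in Vim the lookahead operator follows the group it applies to, and under \v it is written @= without a backslash.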
One possible explanation is that the regular expression engine used in Sublime is different than the engine used in vim.
Not all regex engines are created equal; they don't all support the same features. (For example, a "negative lookahead" feature can be very powerful, but not all engines support it. And the syntax for some features differs between engines.)
A brief comparison of regular expression engines is available here:
http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
Unfortunately Vim uses a different engine, and "normal" regular expressions won't work.
The regex you've mentioned isn't perfect: it doesn't skip escaped quotes, but, as I understand, it's good enough for you. Try this one, and if it doesn't match something, please send me that piece.
\v^([^']|'[^']*')*\zs,
A little explanation:
\v enables very magic search to avoid complex escaping rules
([^']|'[^']*') matches either a single character that is not a quote, or a complete pair of quotes with whatever is between them
\zs indicates the beginning of selection; you can think of it as of a replacement for lookbehind.
In Vim's default (magic) mode you have to escape the |, otherwise it is treated as a literal pipe and the alternation doesn't work. You should also escape the round brackets, unless you are searching for literal '(' or ')' characters.
More information on regex usage in vim can be found on vimregex.com.
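To see what the original lookahead regex is meant to select, independent of any editor, here is a minimal Python sketch. The names are made up for illustration, and like the regex in the question it does not handle escaped quotes.

```python
import re

# A comma is "outside quotes" if the rest of the line consists only of
# non-quote characters and complete '...' pairs.
COMMA_OUTSIDE_QUOTES = re.compile(r",(?=(?:[^']|'[^']*')*$)")

def comma_positions(line):
    """Return the indices of commas that are not inside single quotes."""
    return [m.start() for m in COMMA_OUTSIDE_QUOTES.finditer(line)]

if __name__ == "__main__":
    print(comma_positions("a,'b,c',d"))
```

The lookahead rescans the rest of the line for every comma, which is quadratic in the line length and may be part of why a replace over many long lines gets expensive.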

Is it possible to use regex for matching with a condition?

I posted this question on Super User, and it was suggested that I post it on Stack Overflow.
I really like Vim, and today I ran into an interesting problem. I think it can be solved with a regex, but I can't work out the proper one.
I've got a very big sql-file. It consolidates many different queries. File has content with something like this:
select * from hr.employees, oe.orders, oe.order_items
select * from hr.employess, oe.orders, hr.job_history
select * from oe.customers, oe.orders, hr.employees
select * from hr.employees, hr.departments, hr.locations
How can I select only those lines that contain exactly one occurrence of hr.? In the example above, that would be the first and third lines.
Sure, it is possible to match such lines. This pattern matches:
^\%(\%(hr\.\)\@!.\)*hr\.\%(\%(hr\.\)\@!.\)*$
Some people like to reduce the amount of backslash-escaping by using the very magic switch \v. Then the same pattern becomes
\v^%(%(hr\.)@!.)*hr\.%(%(hr\.)@!.)*$
(Here I used non-capturing parentheses \%(...\) but capturing parentheses \(...\) would work just as well.)
The question is: What do you want to do with these lines? Delete them?
In that case you could use the :global command:
:g/\v^%(%(hr\.)@!.)*hr\.%(%(hr\.)@!.)*$/d
More information at
:h :global
:h /\v
:h /\%(
:h /\@!
To check whether a line contains only a single occurrence of hr., use the regex pattern
^(?=.*\bhr\.)(?!.*\bhr\..*\bhr\.).* with the m modifier. I suggest using grep -P, which already works line by line.
For that, you need to combine a negative lookbehind with a negative lookahead assertion; i.e. the currently matching pattern must not match earlier or later in the same line. In Vim, the atoms for those are \@<! and \@!, respectively.
The pattern to find a single occurrence of X is therefore this:
/\%(X.*\)\@<!X\%(.*X\)\@!/
Applied to your pattern hr.:
/\%(hr\..*\)\@<!hr\.\%(.*hr\.\)\@!/
I always find problems like this easier to solve by splitting them into multiple uses of the regex. If I were going to filter through this with grep, I'd do this:
grep "hr\." foo.sql
and that gives me all the lines with "hr.", even the ones with two.
Now I pipe that output through grep again and ask for it to ignore lines with hr. appearing twice:
grep "hr\." foo.sql | grep -v "hr\..*hr\."
I know you're talking about vim, but I'm showing alternatives that might be helpful and might be a good deal clearer.
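If you want to sanity-check the "exactly one hr." logic outside Vim, here is a minimal Python sketch of the same tempered-lookahead idea as the first answer's pattern. The function and constant names are hypothetical; the sample lines are the ones from the question.

```python
import re

# Any run of characters that never starts "hr.", then one "hr.",
# then again a run that never starts "hr.".
ONLY_ONE_HR = re.compile(r"(?:(?!hr\.).)*hr\.(?:(?!hr\.).)*")

def lines_with_single_hr(lines):
    """Return the lines that contain exactly one occurrence of 'hr.'."""
    return [ln for ln in lines if ONLY_ONE_HR.fullmatch(ln)]

SAMPLE = [
    "select * from hr.employees, oe.orders, oe.order_items",
    "select * from hr.employess, oe.orders, hr.job_history",
    "select * from oe.customers, oe.orders, hr.employees",
    "select * from hr.employees, hr.departments, hr.locations",
]

if __name__ == "__main__":
    for line in lines_with_single_hr(SAMPLE):
        print(line)
```

In plain Python, ln.count("hr.") == 1 would do the same job; the regex form is shown only because it maps directly onto the Vim pattern.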

Vim regex to substitute/escape pipe characters

Let's suppose I have a line:
a|b|c
I'd like to run a regex to convert it to:
a\|b\|c
In most regex engines I'm familiar with, something like s%\|%\\|%g should work. If I try this in Vim, I get:
\|a\||\|b\||\|c
As it turns out, I discovered the answer while typing up this question. I'll submit it with my solution, anyway, as I was a bit surprised a search didn't turn up any duplicates.
vim has its own regex syntax. There is a comparison with PCRE in vim help doc (see :help perl-patterns).
Besides that, Vim has nomagic, magic and very magic modes. See :h magic for the table.
By default, Vim uses magic mode. If you want to make the :s command in your question work, just activate very magic:
:s/\v\|/\\|/g
Vim does the opposite of PCRE in this regard: | is a literal pipe character, with \| serving as the alternation operator. I couldn't find an appropriate escape sequence because the pipe character does not need to be escaped.
The following command works for the line in my example:
:. s%|%\\|%g
If you use very-magic (use \v) you'll have the Perl/pcre behaviour on most special characters (excl. the vim specifics):
:s#\v\|#\\|#g
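For contrast, this is what the PCRE-style convention looks like in Python, where the roles are the reverse of Vim's default mode: \| is the literal pipe and a bare | is alternation. This is only a sketch and the function name is made up.

```python
import re

def escape_pipes(line):
    """Prefix every pipe character with a backslash."""
    # Pattern r"\|" is a literal pipe here; the replacement r"\\|"
    # produces one backslash followed by a pipe.
    return re.sub(r"\|", r"\\|", line)

if __name__ == "__main__":
    print(escape_pipes("a|b|c"))
```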

(g)vim replace regex

I'm looking for a regex that will change something like this:
print "testcode $testvar \n";
into
printnlog("testcode $testvar \n");
I tried %s/print\s*(.\{-});/printnlog(\1);/g but gvim says
print\s*(.\{-});
doesn't match.
Where is my mistake?
Is it ok to use '*' after '\s' because later '{-};' will stop the greed?
Thanks in advance.
In Vim (with the default magic setting) you have to prefix (, ) and | with a backslash to give them their special meaning, so try
:%s/print\s*\(.\{-}\);/printnlog(\1);/g
MBO's answer works great, but sometimes I find it easier to use the "very magic" option \v so I don't have to escape everything; makes the regex a little more readable.
See also:
:h /\v in Vim
http://briancarper.net/blog/vim-regexes-are-awesome
While you can create capture groups (like you're doing), I think the easiest approach is to do the job in multiple steps, with very simple regexes and "flag" words. For example:
:%s/print "testcode.*/printnlog(XXX&XXX);/
:%s/XXXprint //
:%s/;XXX//
In these examples, I use "XXX" to indicate boundaries that should later be trimmed (you can use anything that doesn't appear in your code). The ampersand (&) takes the entire match string and inserts it into the replacement string.
I don't know about other people, but I can type and execute these three regexes faster than I can think through a capture group.
Is this sufficient for your needs?
%s/print\s*\("[^"]*"\)/printnlog(\1)
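The capture-group substitution can also be tried out in Python, which may help when debugging the pattern before moving it into Vim. This is a sketch with a made-up function name; note that Python uses unescaped ( ) for groups and .*? for the non-greedy match that Vim writes as .\{-}.

```python
import re

def to_printnlog(line):
    """Rewrite 'print <args>;' as 'printnlog(<args>);'."""
    # \s* skips the space after print, (.*?) lazily captures up to the first ';'.
    return re.sub(r"print\s*(.*?);", r"printnlog(\1);", line)

if __name__ == "__main__":
    # Raw string so that \n stays as the two characters backslash and n.
    print(to_printnlog(r'print "testcode $testvar \n";'))
```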