Is it possible to use regex for matching with a condition? - regex

I posted this question on Super User and was advised to post it on Stack Overflow.
I really like Vim, and today I ran into an interesting problem. I think it can be solved with a regexp, but I can't work out the proper one.
I've got a very big SQL file that consolidates many different queries. Its content looks something like this:
select * from hr.employees, oe.orders, oe.order_items
select * from hr.employess, oe.orders, hr.job_history
select * from oe.customers, oe.orders, hr.employees
select * from hr.employees, hr.departments, hr.locations
How can I select only those lines that contain exactly one match of hr.? In the example above, that would be the first and third lines.

Sure, it is possible to match such lines. This pattern matches:
^\%(\%(hr\.\)\@!.\)*hr\.\%(\%(hr\.\)\@!.\)*$
Some people like to reduce the amount of backslash-escaping by using the very magic switch \v. Then the same pattern becomes
\v^%(%(hr\.)@!.)*hr\.%(%(hr\.)@!.)*$
(Here I used non-capturing parentheses \%(...\) but capturing parentheses \(...\) would work just as well.)
The question is: What do you want to do with these lines? Delete them?
In that case you could use the :global command:
:g/\v^%(%(hr\.)@!.)*hr\.%(%(hr\.)@!.)*$/d
More information at
:h :global
:h /\v
:h /\%(
:h /\@!
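The logic of the pattern above can also be checked outside Vim. The following is a minimal Python sketch, not part of the original answer: it assumes Python's re syntax, where (?!...) plays the role of Vim's \@! and (?:...) the role of \%(...\). The name ONLY_ONE_HR is made up for illustration, and the sample lines are the ones from the question.

```python
import re

# Same idea as the Vim pattern, in Python syntax: every character before and
# after the single "hr." must not be the start of another "hr.".
ONLY_ONE_HR = re.compile(r"^(?:(?!hr\.).)*hr\.(?:(?!hr\.).)*$")

lines = [
    "select * from hr.employees, oe.orders, oe.order_items",
    "select * from hr.employess, oe.orders, hr.job_history",
    "select * from oe.customers, oe.orders, hr.employees",
    "select * from hr.employees, hr.departments, hr.locations",
]

# Collect the 1-based numbers of the lines that contain exactly one "hr.".
matching = [number for number, line in enumerate(lines, start=1)
            if ONLY_ONE_HR.match(line)]
print(matching)  # [1, 3]
```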

To check whether a line contains only a single occurrence of hr., use the regex pattern
^(?=.*\bhr\.)(?!.*\bhr\..*\bhr\.).* with the m modifier. I suggest using the grep -P utility.

For that, you need to combine a negative lookbehind with a negative lookahead assertion; i.e. the pattern being matched must occur neither earlier nor later on the same line. In Vim, the atoms for those are \@<! and \@!, respectively.
The pattern to find a single occurrence of X is therefore this:
/\%(X.*\)\@<!X\%(.*X\)\@!/
Applied to your pattern hr.:
/\%(hr\..*\)\@<!hr\.\%(.*hr\.\)\@!/

I always find problems like this easier to solve by splitting them into multiple regex passes. If I were going to filter through this with grep, I'd do this:
grep "hr\." foo.sql
and that gives me all the lines with "hr.", even the ones with two.
Now I pipe that output through grep again and ask for it to ignore lines with hr. appearing twice:
grep "hr\." foo.sql | grep -v "hr\..*hr\."
I know you're talking about vim, but I'm showing alternatives that might be helpful and might be a good deal clearer.
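The same two-pass idea can be sketched in Python, if that is closer to hand than a shell. This is only an illustration under assumptions: the function name lines_with_single_hr and the sample list are made up here, and the two regexes are the ones passed to grep above.

```python
import re

def lines_with_single_hr(lines):
    """Keep lines that contain "hr." but not two occurrences of it."""
    # First pass, like: grep "hr\."
    has_hr = [line for line in lines if re.search(r"hr\.", line)]
    # Second pass, like: grep -v "hr\..*hr\."
    return [line for line in has_hr if not re.search(r"hr\..*hr\.", line)]

sample = [
    "select * from hr.employees, oe.orders, oe.order_items",
    "select * from hr.employess, oe.orders, hr.job_history",
    "select * from oe.customers, oe.orders, hr.employees",
    "select * from hr.employees, hr.departments, hr.locations",
]
single = lines_with_single_hr(sample)
print(len(single))  # 2
```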

Related

vim search replace on some specific lines

Vim substitute command :%s/old/new/g will replace all occurrences of old by new. But I want to do this replacement only in lines that do not start with #. I mean in my file there are some lines starting with # (called commented lines) which I want to exclude in search replace. Is there any way to do it?
You should mix it with the g (see help :global) command:
:%g/^[^#]/s/old/new/
:v/^#/s/old/new
should do the job. check :h :v for details.
The Answer from Thomas is fine. Another approach would be the negative lookbehind:
:%s/\(#.*\)\@<!old/new/g
This method matches every old not preceded by a # on the same line, while Thomas's method matches every line not starting with # and then substitutes old on that line.
Kent's solution is a little more elegant. It uses :v, which is the same as :g!. It matches every line starting with # and then runs the command on every other line. This has the advantage that the regex gets simpler. In your use case the regex for "does not start with #" is quite easy, but it is often much easier to build a regex that matches a criterion than one that doesn't match it, so the :v command helps here.
In your case the global command (which both of the other answers use) seems safer, since it only checks for a # at the beginning of the line. But depending on the use case, all three methods have their advantages.
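To make the "skip lines that start with #" behaviour concrete, here is a small Python sketch of what the :v approach does. It is an assumption-laden illustration rather than a Vim equivalent in every detail: replace_outside_comments is a hypothetical helper, old and new are treated as literal strings instead of regexes, and every occurrence on a qualifying line is replaced.

```python
def replace_outside_comments(text, old, new):
    """Replace old with new on every line that does not start with '#'."""
    out = []
    for line in text.split("\n"):
        # Lines starting with '#' are passed through untouched.
        if not line.startswith("#"):
            line = line.replace(old, new)
        out.append(line)
    return "\n".join(out)

before = "# old stays\nold and old\nx = old  # old"
after = replace_outside_comments(before, "old", "new")
print(after)
```

Note that the trailing comment on the last sample line is rewritten too, because only the start of the line is checked. The lookbehind variant behaves differently there, since it skips any old that follows a # on the same line.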

Vim - sed like labels or replacing only within pattern

While doing some HTML editing I ran into a problem where I need help from a Vim master out there.
I want to achieve a simple task: I have an HTML file with mangled URLs.
Just description Just description
...
Just description
Unfortunately it's not "one url per line".
I am aware of three approaches:
I would like to be able to replace only within the '"http://[^"]*"' regex (similar to replacing only in matching lines, but this time only the matched text, not the whole line, should be affected)
Or use sed-like labels - I can do this task with sed -e :a -e 's#\("http://[^" ]*\) \([^"]*"\)#\1_\2#g;ta'
Also, I know that there is something like "\@<=", but I am not a native speaker and the Vim manual on this is beyond my comprehension.
All help is greatly appreciated.
If possible I would like to know answer on all three problems (as those are pretty interesting and would be helpful in other tasks) but either of those will do.
Re: 1. You can replace recursively by combining vim's "evaluate replacement as an expression" feature (:h :s\=) with the substitute function (:h substitute()):
:%s!"http://[^"]*"!\=substitute(submatch(0), ' ', '_', 'g')!g
Re: 2. I don't know sed so I can't help you with that.
Re: 3. I don't see how \@<= would help here. As for what it does: It's equivalent to Perl's (?<=...) feature, also known as "positive look-behind". You can read it as "if preceded by":
:%s/\%(foo\)\@<=bar/X/g
"Replace bar by X if it's preceded by foo", i.e. turn every foobar into fooX (the \%( ... \) are just for grouping here). In Perl you'd write this as:
s/(?<=foo)bar/X/g;
More examples and explanation can be found in perldoc perlretut.
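The "replace only inside the match" idea from Re: 1 is not specific to Vim. As a rough sketch in Python (assuming its re module, where a function can be passed as the replacement), the same two-level substitution looks like this. The helper name and the sample anchor tag with example.com are made up for illustration, since the question's HTML is not shown.

```python
import re

def underscore_spaces_in_urls(html):
    """Replace spaces with underscores, but only inside "http://..." strings."""
    # The outer regex finds each quoted URL; the lambda rewrites just that match.
    return re.sub(r'"http://[^"]*"',
                  lambda m: m.group(0).replace(" ", "_"),
                  html)

sample = '<a href="http://example.com/a b c">Just description</a>'
fixed = underscore_spaces_in_urls(sample)
print(fixed)
# <a href="http://example.com/a_b_c">Just description</a>
```

The text outside the quoted URL keeps its spaces, which is the point of restricting the inner replacement to submatch(0) in the Vim command.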
I think what you want to do is replace all spaces in your http:// URLs with _.
To achieve that, @melpomene's solution is straightforward. You could try it in your Vim.
On the other hand, if you want to simulate your sed line, you could try the following.
:let @s=':%s#\("http://[^" ]*\)\@<= #_#g^M'
^M means Ctrl-V then Enter
then
200@s
This works the same way as your sed line (label, do the replacement, jump back to the label...), and \@<= is used as well.
One problem is that this way Vim cannot detect when all matches have been replaced. Therefore a relatively big number (200 in my example) is given, and in the end the error message "E486: Pattern not found..." is shown.
A script is needed to avoid the message.
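For comparison, the "repeat until nothing matches" loop that the sed label implements can be written with an explicit stopping condition. The sketch below is in Python rather than Vim script and is only an illustration: STEP is the regex from the sed command in the question, while the function name and the sample tag are assumptions.

```python
import re

# One step replaces the first remaining space in each quoted URL.
STEP = re.compile(r'("http://[^" ]*) ([^"]*")')

def underscore_spaces_by_looping(html):
    """Repeat the one-space-per-URL substitution until it stops matching."""
    while True:
        html, count = STEP.subn(r"\1_\2", html)
        if count == 0:  # nothing left to replace: the equivalent of sed's "ta" falling through
            return html

sample = '<a href="http://example.com/a b c">Just description</a>'
looped = underscore_spaces_by_looping(sample)
print(looped)
# <a href="http://example.com/a_b_c">Just description</a>
```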

(g)vim replace regex

I'm looking for a regex that will change sth. like this:
print "testcode $testvar \n";
in
printnlog("testcode $testvar \n");
I tried %s/print\s*(.\{-});/printnlog(\1);/g but gvim says
print\s*(.\{-});
doesn't match.
Where is my fault?
Is it ok to use '*' after '\s' because later '{-};' will stop the greed?
Thanks in advance.
In Vim you have to escape (, ) and | with a backslash to give them their special meaning, so try
:%s/print\s*\(.\{-}\);/printnlog(\1);/g
MBO's answer works great, but sometimes I find it easier to use the "very magic" option \v so I don't have to escape everything; makes the regex a little more readable.
See also:
:h /\v in Vim
http://briancarper.net/blog/vim-regexes-are-awesome
While you can create capture groups (like you're doing), I think the easiest approach is to do the job in multiple steps, with very simple regexes and "flag" words. For example:
:%s/print "testcode.*/printnlog(XXX&XXX);/
:%s/XXXprint //
:%s/;XXX//
In these examples, I use "XXX" to indicate boundaries that should later be trimmed (you can use anything that doesn't appear in your code). The ampersand (&) takes the entire match string and inserts it into the replacement string.
I don't know about other people, but I can type and execute these three regexes faster than I can think through a capture group.
Is this sufficient for your needs?
%s/print\s*\("[^"]*"\)/printnlog(\1)
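As a cross-check of the capture-group substitution discussed above, here is the same replacement as a short Python sketch. It assumes Python's re syntax, where (unlike Vim's default "magic" mode) parentheses group without a backslash and the non-greedy quantifier is written *? instead of \{-}. The variable names are made up.

```python
import re

# The input line from the question, with a literal backslash followed by n.
source = 'print "testcode $testvar \\n";'

# Capture everything between "print" and the first semicolon, then wrap it.
converted = re.sub(r'print\s*(.*?);', r'printnlog(\1);', source)
print(converted)  # printnlog("testcode $testvar \n");
```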

Multi-line regex support in Vim

I notice the standard regex syntax for matching across multiple lines is to use /s, like so:
This is\nsome text
/This.*text/s
This works in Perl for instance but doesn't seem to be supported in Vim. Instead, I have to be much more specific:
/This[^\r\n]*[\r\n]*text/
I can't find any reason for why this should be, so I'm thinking I probably just missed the relevant bits in the vim help.
Can anyone confirm this behaviour one way or the other?
Yes, Perl's //s modifier isn't available on Vim regexes. See :h perl-patterns for details and a list of other differences between Vim and Perl regexes.
Instead you can use \_., which means "match any single character including newline". It's a bit shorter than what you have. See :h /\_..
/This\_.*text/
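For readers coming from other engines, the difference is easy to see in a quick Python sketch (an illustration only; Python's re.DOTALL flag corresponds to Perl's /s modifier, and the variable names are made up):

```python
import re

text = "This is\nsome text"

# Without the flag, "." does not match the newline, so there is no match.
without_flag = re.search(r"This.*text", text)

# With DOTALL, "." also matches the newline, like \_. in Vim.
with_flag = re.search(r"This.*text", text, re.DOTALL)

print(without_flag is None, with_flag is not None)  # True True
```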

Need a regex to exclude certain strings

I'm trying to get a regex that will match:
somefile_1.txt
somefile_2.txt
somefile_{anything}.txt
but not match:
somefile_16.txt
I tried
somefile_[^(16)].txt
with no luck (it includes even the "16" record)
Some regex libraries allow lookahead:
somefile_(?!16\.txt$).*?\.txt
Otherwise, you can still use multiple character classes:
somefile_([^1].|1[^6]|.|.{3,})\.txt
or, to achieve maximum portability:
somefile_([^1].|1[^6]|.|....*)\.txt
[^(16)] means: Match any single character except parentheses, 1, and 6.
The best solution has already been mentioned:
somefile_(?!16\.txt$).*\.txt
This works, and is greedy enough to take anything coming at it on the same line. If you know, however, that you want a valid file name, I'd suggest also limiting invalid characters:
somefile_(?!16)[^?%*:|"<>]*\.txt
If you're working with a regex engine that does not support lookahead, you'll have to consider how to make up that !16. You can split files into two groups, those that start with 1, and aren't followed by 6, and those that start with anything else:
somefile_(1[^6]|[^1]).*\.txt
If you want to allow somefile_16_stuff.txt but NOT somefile_16.txt, these regexes above are not enough. You'll need to set your limit differently:
somefile_(16.|1[^6]|[^1]).*\.txt
Combine this all, and you end up with two possibilities, one which blocks out the single instance (somefile_16.txt), and one which blocks out all families (somefile_16*.txt). I personally think you prefer the first one:
somefile_((16[^?%*:|"<>]|1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
somefile_((1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
Here is the version without the special-character exclusions, so it's easier to read:
somefile_((16.|1[^6]|[^1]).*|1)\.txt
somefile_((1[^6]|[^1]).*|1)\.txt
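The difference between the two readable variants (with the character class [^1] written out in full) can be checked with a few file names. This is a Python sketch under assumptions: ONLY_16, FAMILY_16 and the list of names are made up for illustration, and fullmatch is used so that each pattern has to cover the whole name.

```python
import re

# Rejects only somefile_16.txt itself.
ONLY_16 = re.compile(r"somefile_((16.|1[^6]|[^1]).*|1)\.txt")
# Rejects the whole somefile_16*.txt family.
FAMILY_16 = re.compile(r"somefile_((1[^6]|[^1]).*|1)\.txt")

names = ["somefile_1.txt", "somefile_2.txt", "somefile_16.txt",
         "somefile_16_stuff.txt", "somefile_anything.txt"]

# Map each name to (matched by ONLY_16, matched by FAMILY_16).
results = {name: (bool(ONLY_16.fullmatch(name)), bool(FAMILY_16.fullmatch(name)))
           for name in names}
for name, flags in results.items():
    print(name, flags)
```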
To adhere strictly to your specification and be picky, you should rather use:
^somefile_(?!16\.txt$).*\.txt$
so that somefile_1666.txt which is {anything} can be matched ;)
but sometimes it is just more readable to use...:
ls | grep -e 'somefile_.*\.txt' | grep -v -e 'somefile_16\.txt'
somefile_(?!16).*\.txt
(?!16) means: Assert that it is impossible to match the regex "16" starting at that position.
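How far the lookahead reaches decides which names are rejected. The following Python sketch (names STRICT, LOOSE and the sample list are assumptions made for illustration) compares the anchored form, which rejects only somefile_16.txt, with the shorter (?!16) form, which rejects every name that continues with 16.

```python
import re

STRICT = re.compile(r"^somefile_(?!16\.txt$).*\.txt$")  # rejects only somefile_16.txt
LOOSE = re.compile(r"somefile_(?!16).*\.txt")           # rejects anything continuing with 16

names = ["somefile_1.txt", "somefile_16.txt", "somefile_1666.txt"]

strict_hits = [n for n in names if STRICT.search(n)]
loose_hits = [n for n in names if LOOSE.search(n)]

print(strict_hits)  # ['somefile_1.txt', 'somefile_1666.txt']
print(loose_hits)   # ['somefile_1.txt']
```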
Sometimes it's just easier to use two regular expressions. First look for everything you want, then ignore everything you don't. I do this all the time on the command line where I pipe a regex that gets a superset into another regex that ignores stuff I don't want.
If the goal is to get the job done rather than find the perfect regex, consider that approach. It's often much easier to write and understand than a regex that makes use of exotic features.
Without using lookahead
somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt
Read it like: somefile_ followed by either:
nothing.
one character.
any one character except 1, followed by one or more other characters.
one of 10 .. 19; note that 16 has been left out.
three or more characters.
and finally followed by .txt.