Matching an expression including arbitrary lines with regex in Vim - regex

In a text file opened with Vim, I'm trying to match the occurrence of two strings, DRIVER_ACTIVITY and DriverGroup, with an arbitrary number of lines in between:
2013-07-01 05:06:23,801 DRIVER_ACTIVITY
2013-07-01 05:06:23,804 text
2013-07-01 05:06:23,804 more text
2013-07-01 05:06:23,805 DriverGroup
I tried the following patterns:
/DRIVER_ACTIVITY(.*)DriverGroup/s
/DRIVER_ACTIVITY((.|\n|\r)*)DriverGroup
/\vDRIVER_ACTIVITY((.|\n|\r)*)DriverGroup
/DRIVER_ACTIVITY\[\S\s\]*DriverGroup
Nothing matches. How do I match across lines, including the newlines?

If you want to use the more common (...) for grouping, you need to include the \v atom to switch Vim's regular expression syntax to "very magic"; otherwise, use \(...\). But for your case, Vim has a special atom that matches any character including newlines, \_., like this:
/DRIVER_ACTIVITY\_.*DriverGroup
There's no way around learning Vim's different regular expression dialect; see :help pattern.

The \_s atom matches whitespace including newlines, so alternating it with . (any character except a newline) also spans lines:
/DRIVER_ACTIVITY\(\_s\|.\)*DriverGroup

OK, I see the problem. In this sample file, the third try matches, as do Ingo Karkat's and Explosion Pills' suggestions. The reason I didn't succeed is that all of these are greedy. That's why none of them seemed to match in "the big file": because the pattern is greedy, Vim keeps looking and doesn't return a match for several seconds, even though the marker is located on the same line where the first match should appear. So it actually matches; my patience is the problem :)
I made it non-greedy with \{-} and it worked:
/DRIVER_ACTIVITY\_.\{-}DriverGroup
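Vim's \_.* and \_.\{-} correspond roughly to .* and .*? with the DOTALL flag in Python's re module. The minimal sketch below, assuming a made-up log excerpt with two DriverGroup lines, shows why the greedy form runs on to the last DriverGroup while the non-greedy form stops at the first one:

```python
import re

# Hypothetical log excerpt: one DRIVER_ACTIVITY followed by two DriverGroup lines.
log = (
    "2013-07-01 05:06:23,801 DRIVER_ACTIVITY\n"
    "2013-07-01 05:06:23,804 text\n"
    "2013-07-01 05:06:23,805 DriverGroup\n"
    "2013-07-01 05:06:24,100 later text\n"
    "2013-07-01 05:06:24,200 DriverGroup\n"
)

# re.DOTALL lets '.' match newlines, similar to Vim's \_. atom.
greedy = re.search(r"DRIVER_ACTIVITY.*DriverGroup", log, re.DOTALL)
lazy = re.search(r"DRIVER_ACTIVITY.*?DriverGroup", log, re.DOTALL)

# The greedy match spans to the LAST DriverGroup; the lazy one stops at the first.
print(greedy.group().count("\n"))  # 4
print(lazy.group().count("\n"))    # 2
```

On a large file the greedy version has to scan to the end before backtracking, which is consistent with the slowness described above.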

Related

A regular expression that matches two long strings and ignores everything in between

I am searching through a 1.5 million line Premiere Pro project for any text that matches one of my audio filters and is set to mono.
The text I am searching for begins with the <ChannelType> tag and ends with the <FilterMatchName> tag. So it would look like this:
<ChannelType>0</ChannelType>
<FrameRate>5292000</FrameRate>
</AudioComponent>
<FilterPreset>0</FilterPreset>
<OpaqueData Encoding="base64" Checksum="53060659">AAAAAD8L8lo+AUr+Pac1NjwTmoUAAAAAP0uQDD37nIg9ui6MPjwU5j+AAAA+C/JaAAAAAD8qqqsAAAAAP4AAAD92L8w9py8FAAAAAHNvZnQgY29tcHJlc3Npb24AIiBkZWZhdWx0PSIwIiBzdGVwPSIxIiBtaW49IjAiIG1heD0iMSIvPgoJICA8Zmw=</OpaqueData>
<FilterIndex>-1</FilterIndex>
<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
In a Word document, I would just search for
<ChannelType>0</ChannelType>*<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
I am terrible with regex. I was hoping someone could help me out. Everything I have tried either matches nothing or matches EVERYTHING in the document. I am using Notepad++.
Since you are working in Notepad++, you have access to Perl-compatible regular expressions. This one matches everything from <ChannelType> through </FilterMatchName>:
(?s)<ChannelType>.*?</FilterMatchName>
The (?s) flag allows . to match newline characters.
After matching <ChannelType>, the .*? lazily matches all characters up to...
the closing </FilterMatchName>, which we match.
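Python's re accepts the same inline (?s) flag, so a minimal sketch can show what the flag changes. The XML below is a hypothetical, shortened excerpt, not the real project file:

```python
import re

# Hypothetical, shortened excerpt of the project XML.
xml = (
    "<ChannelType>0</ChannelType>\n"
    "<FrameRate>5292000</FrameRate>\n"
    "<FilterIndex>-1</FilterIndex>\n"
    "<FilterMatchName>1094998321 Dynamics1</FilterMatchName>\n"
)

# Without (?s), '.' stops at a newline, so the match cannot span lines.
no_flag = re.search(r"<ChannelType>.*?</FilterMatchName>", xml)

# With the inline (?s) flag, '.' also matches newlines, and the lazy .*?
# stops at the first closing </FilterMatchName>.
with_flag = re.search(r"(?s)<ChannelType>.*?</FilterMatchName>", xml)

print(no_flag)  # None
print(with_flag.group().endswith("Dynamics1</FilterMatchName>"))  # True
```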
Let me know if you have any questions. :)
What type of regular expressions are you using (which language/library)?
Basically, where you would use * in a Word search, you use .* in a regular expression. If your text is long, though, it's better to use a reluctant (lazy) quantifier such as .*? [1], if your regex implementation supports it.
This is a good site with comparison of different re implementations and tutorials:
http://www.regular-expressions.info
[1] http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

match first space on a line using sublime text and regular expressions

Regular expressions have always been tough for me. I'm getting frustrated trying to find a regular expression that will select the first whitespace character on a line, so that I can use Sublime Text to replace it with a /.
If you could give a quick explanation, that would help too.
In the spirit of @edi's answer, but with some explanation of what's happening: match the beginning of the line with ^, then look for a sequence of characters that are not whitespace with [^\s]* or \S* (the former may work in more editors, libraries, etc. than the latter), then find the first whitespace character with \s. Putting these together, you have
^[^\s]*\s
You may want to group the non-whitespace and whitespace parts, so you can do the replacement you're talking about:
^([^\s]*)(\s)
Then the replacement pattern is just \1/
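The same pattern and replacement can be tried in Python (a sketch, not Sublime Text itself; the input text is hypothetical). Running it line by line avoids one pitfall of applying ^...\s with re.MULTILINE to a whole buffer: on a line with no spaces, \s could match the newline at the end of that line.

```python
import re

# Hypothetical input; the first whitespace on each line should become '/'.
text = "foo bar baz\nalpha\tbeta\nnospace\n"

# ^([^\s]*)\s : non-whitespace run (group 1), then the first whitespace char.
pattern = re.compile(r"^([^\s]*)\s")

# splitlines() removes the newlines, so \s can only match spaces/tabs here.
result = "\n".join(pattern.sub(r"\1/", line, count=1) for line in text.splitlines())
print(result)
# foo/bar baz
# alpha/beta
# nospace
```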
You can use this regex:
^([^\s]*)\s

Regular Expression for formatting a file

My file has data with each line starting with a specific pattern:
1000000179|abcd.....
1000000180|wedwedw...
1000000181|wnewedwed...
I've opened the file in Visual Studio and need a regex to find any line that does not begin with the correct sequence. In the example below, lines 3 and 4 are not valid. How can I isolate them using a regex?
1000000179|abcd.....
1000000180|wedwedw...
1000xyadaa|wnewedwed...
%dfgxyadaa|wnewedwed...
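Something as simple as ^[^0-9]{1,10}[^|].*$ will detect lines that start with a non-digit, such as line 4. Note that it misses lines like line 3, which start with some digits; in an engine that supports negative lookahead, ^(?![0-9]{10}\|).*$ catches every line that doesn't start with ten numerals and a pipe.
If you want to select just the first part of the line, use ^[^0-9]{1,10}[^|]
NB: you can replace [^0-9] with \D (case sensitive!) if you prefer that syntax, e.g. ^\D{1,10}[^|]
To reverse the logic (i.e. find the correct lines), use ^[0-9]{10}\|.*$ or ^\d{10}\|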
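A quick way to check these patterns is to run them against the sample lines from the question. The sketch below uses Python's re (not Visual Studio's search dialect) and compares the ^[^0-9]{1,10}[^|].*$ suggestion with a negative-lookahead version:

```python
import re

# Sample lines from the question.
lines = [
    "1000000179|abcd.....",
    "1000000180|wedwedw...",
    "1000xyadaa|wnewedwed...",
    "%dfgxyadaa|wnewedwed...",
]

# Negative lookahead: lines that do NOT start with 10 digits and a pipe.
bad = re.compile(r"^(?![0-9]{10}\|).*$")
# The non-digit-start suggestion, for comparison.
nondigit_start = re.compile(r"^[^0-9]{1,10}[^|].*$")
# The "correct line" check.
good = re.compile(r"^[0-9]{10}\|")

invalid = [ln for ln in lines if bad.match(ln)]
nondigit_hits = [ln for ln in lines if nondigit_start.match(ln)]
valid = [ln for ln in lines if good.match(ln)]

print(invalid)        # ['1000xyadaa|wnewedwed...', '%dfgxyadaa|wnewedwed...']
print(nondigit_hits)  # ['%dfgxyadaa|wnewedwed...']
print(valid)          # ['1000000179|abcd.....', '1000000180|wedwedw...']
```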
EDIT: For VS2005's search/replace "regular expressions":
To find lines that DO NOT start with 10 numerics followed by a pipe: ^~([0-9]^10\|)
To find lines that DO start with 10 numerics followed by a pipe: ^[0-9]^10\|
Note that the \d and \D syntax does not work, as per @KennethK.'s comment below. The equivalent of a single digit (i.e. [0-9]) in VS regular expressions is :d.
Refer to http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx for the list of regular expressions available in VS2005.
If I understand what you are trying to find, try the following expression:
^~(1000000).*$
Where the ^, .*, and $ all function as in typical regex, and the ~(...) means "not match". So the overall intent of the pattern is to find lines that do not start with the string "1000000".

Regex Searching in vim

I'm using vim to do some pattern matching on a text file. I've enabled search highlighting so that I know exactly what is getting matched on each search and am getting confused.
Consider searching for [a-z]* on the following text:
123456789abcdefghijklmnopqrstuvwxyxz987654321ABCDEFGHIJKLMNOPQRSTUVWQXZ
I expected this search to match zero or more consecutive characters that are in the range [a-z]. Instead, I get a match on the entire line.
Is this the expected behaviour?
Thanks,
Andrew
It's matching the empty strings that occur after every character. It has no way of highlighting empty ranges, so it looks like everything is highlighted.
Try searching for [a-z]\+ instead.
The empty string matches [a-z]*, so this pattern matches everywhere. To cut down the matches, use [a-z]\+ (one or more) or [a-z]\{4,} (four or more); in Vim's default "magic" mode, + and { need a backslash.
You're not getting a match on the entire line; you're getting a match at every character. Your pattern also matches the empty string, which occurs at every position.
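The same effect is easy to see outside Vim. In this Python sketch (Python's + plays the role of Vim's \+), the first match of [a-z]* is an empty string at index 0, because zero letters is already a valid match there:

```python
import re

text = "123456789abcdefghijklmnopqrstuvwxyxz987654321"

# [a-z]* may match zero characters, so the first match is empty, at index 0.
star = re.search(r"[a-z]*", text)
print(repr(star.group()), star.start())  # '' 0

# [a-z]+ (Vim: [a-z]\+) requires at least one letter.
plus = re.search(r"[a-z]+", text)
print(plus.group())  # abcdefghijklmnopqrstuvwxyxz
```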

How can I "inverse match" with regex?

I'm processing a file, line-by-line, and I'd like to do an inverse match. For instance, I want to match lines where there is a string of six letters, but only if these six letters are not 'Andrea'. How should I do that?
I'm using RegexBuddy, but still having trouble.
(?!Andrea).{6}
Assuming your regexp engine supports negative lookaheads...
...or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}
Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for doing negative matching; they leave that to whatever language you are using them with.
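For Python/Java:
^(.(?!(some text)))*$
Note that this form checks the lookahead only after consuming each character, so it does not reject a line that begins with "some text"; the ^(?:(?!some text).)*$ form avoids that.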
http://www.lisnichenko.com/articles/javapython-inverse-regex.html
In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:
^(?:(?!Andrea).)*$
This is called a tempered greedy token. The downside is performance: the lookahead is re-evaluated at every character, which can be slow on long input.
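Python's re supports the same construct. This sketch, with hypothetical sample lines, filters out lines containing "Andrea" and also shows the start-of-line gap in the ^(.(?!...))*$ variant:

```python
import re

# Tempered greedy token: a character is consumed only if "Andrea" does not start there.
no_andrea = re.compile(r"^(?:(?!Andrea).)*$")

lines = ["hello Andrew", "dear Andrea,", "Andrea", "andrea (lowercase)"]
kept = [ln for ln in lines if no_andrea.match(ln)]
print(kept)  # ['hello Andrew', 'andrea (lowercase)']

# The ^(.(?!text))*$ form checks the lookahead only AFTER each character,
# so an occurrence at the very start of the line slips through.
after_each = re.compile(r"^(.(?!Andrea))*$")
print(bool(after_each.match("Andrea")))  # True
print(bool(no_andrea.match("Andrea")))   # False
```

The match is case-sensitive, so "andrea" is kept; add re.IGNORECASE if that is not what you want.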
The capabilities and syntax of the regex implementation matter.
You could use a lookahead. Using Python as an example:
import re
# Raw string so that "\w" reaches the regex engine unchanged
not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)
To break that down:
(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then
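\w means a "word character": letters, digits and the underscore. For ASCII text this is equivalent to the class [a-zA-Z0-9_]; in Python 3 it also matches Unicode letters and digits unless you pass re.ASCII.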
\w{6} means exactly six word characters.
re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...
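To see the pattern in action, a small sketch (the word list is hypothetical) uses fullmatch so that each candidate must be exactly six word characters:

```python
import re

# Raw string so that "\w" reaches the regex engine unchanged.
not_andrea = re.compile(r"(?!Andrea)\w{6}", re.IGNORECASE)

words = ["Andrew", "ANDREA", "andrea", "Sabine", "Ann"]
# fullmatch: the whole word must be six word characters and not any casing of "Andrea".
results = {w: bool(not_andrea.fullmatch(w)) for w in words}
print(results)
# {'Andrew': True, 'ANDREA': False, 'andrea': False, 'Sabine': True, 'Ann': False}
```

With re.search instead of fullmatch, the pattern can also match inside longer words, so pick the method that fits what "a string of six letters" means for your data.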
Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for six characters. Or first check for at least six word characters, and then check that it does not match Andrea.
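The program-logic approach can be sketched in Python as two simple checks instead of one inverted regex. The helper name keep_line and the sample lines are hypothetical:

```python
import re

def keep_line(line: str) -> bool:
    """Hypothetical helper: two plain regex checks instead of one inverted regex."""
    # Step 1: drop any line that mentions Andrea at all.
    if re.search(r"Andrea", line):
        return False
    # Step 2: keep the remaining lines only if they contain six word characters in a row.
    return re.search(r"\w{6}", line) is not None

lines = ["Sabine was here", "Andrea and Sabine", "hi bob"]
print([ln for ln in lines if keep_line(ln)])  # ['Sabine was here']
```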
Negative lookahead assertion
(?!Andrea)
This is not exactly an inverted match, but it's the best you can do directly with a regex. Not all regex engines support lookaheads, though.
If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.
On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)
On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.
I just came up with this method, which may be computationally expensive, but it works:
Replace everything that matches the regex with an empty string.
This is a one-liner:
notMatched = re.sub(regex, "", string)
I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.
This will only return you the string result, not any match objects!
(?! is useful in practice, although, strictly speaking, lookahead is not part of regular expressions as defined mathematically.
You can write an inverted regular expression manually.
Here is a program to calculate the result automatically.
Its output is machine-generated and usually much more complex than a hand-written one, but it works.
If you can run two regex matches and join the results, you can use two capturing groups: first capture everything before your regex
^((?!yourRegex).)*
and then capture everything behind your regex
(?<=yourRegex).*
This works for most regexes. One problem I ran into was a quantifier like {2,4} at the end of the regex; then you have to get creative.
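A Python sketch of the two-match idea, with "Andrea" standing in for yourRegex. Two details matter here: the before part is group(0), because a repeated group like ((?!...).)* only keeps its last iteration, and Python's re rejects variable-width lookbehinds, so yourRegex must have a fixed width in the second pattern:

```python
import re

s = "hello Andrea world"

# Everything before the first "Andrea": take group(0), the whole match,
# since group(1) of a repeated group holds only the last character.
before = re.match(r"^((?!Andrea).)*", s).group(0)

# Everything after "Andrea"; returns None if "Andrea" does not occur.
after_match = re.search(r"(?<=Andrea).*", s)
after = after_match.group() if after_match else ""

print(repr(before + after))  # 'hello  world'
```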
In Perl you can do:
process($line) if $line !~ /Andrea/;