PCRE regular expression: only one match

I want to capture all the substrings of a subject string that match a pattern.
Pattern examples: ##name##, ##address##, ##bankAccount##, ...
Subject example: This is the template with patterns : ##name##Your bank account is : ##bankAccount##Your address is : ##address##
With the regex .*(#{2}[a-zA-Z]*#{2}).*, only the last pattern is matched.
How can I capture all the patterns, not just the last or the first?

Now that your regex is formatted properly, the problem is visible: a * in it had been hidden because Markdown interpreted it as italics.
Your opening .* greedily matches as much as it can, backing up only enough to let (#{2}[a-zA-Z]*#{2}) match. The group therefore matches the last pattern found in the line, since everything before it has already been matched by the .*.
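To watch the greedy .* at work, here is a minimal sketch in Python 3 rather than PHP; it assumes Python's backtracking re engine behaves like PCRE for this simple pattern. The subject is the one from the question, kept on a single line:

```python
import re

# Subject string from the question, on one line so '.' never has to cross a newline.
subject = ("This is the template with patterns : ##name##"
           "Your bank account is : ##bankAccount##"
           "Your address is : ##address##")

# The leading .* first consumes the whole line, then gives characters back
# only until the group can match, so the group lands on the LAST token.
greedy = re.search(r".*(#{2}[a-zA-Z]*#{2}).*", subject)
print(greedy.group(1))  # ##address##
```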

You need to remove the .* parts, as I mentioned in my comment, and use preg_match_all:
$re = '~#{2}[a-zA-Z]*#{2}~';
preg_match_all($re, "##name##, ##address##, ##bankAccount##", $m);
print_r($m);
In .*#{2}[a-zA-Z]*#{2}.*, the leading .* first matched zero or more characters other than a newline, grabbing the whole line, and then backtracked until it reached the last occurrence of the #{2}[a-zA-Z]*#{2} pattern; the trailing .* only grabbed the rest of the line. With the .* removed and preg_match_all used instead, all substrings matching #{2}[a-zA-Z]*#{2} can be extracted.
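For comparison, Python's closest counterpart to preg_match_all in this case is re.findall, which returns every non-overlapping match. A minimal sketch, assuming Python 3 and the subject string from the question:

```python
import re

subject = ("This is the template with patterns : ##name##"
           "Your bank account is : ##bankAccount##"
           "Your address is : ##address##")

# Without the surrounding .*, the pattern matches one token at a time,
# and findall keeps scanning after each match until the string is exhausted.
tokens = re.findall(r"#{2}[a-zA-Z]*#{2}", subject)
print(tokens)  # ['##name##', '##bankAccount##', '##address##']
```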

Related

PowerShell: drop the last part of a string with multiple "."

I'm trying to write a regex in PowerShell to extract only a specific part of a string. I know how to do this without a regex, but I think it could be more efficient with one. I have a string that looks like this:
Some/Stuff/Here/Then.drop.last
Ideally, I want to write a regex that gets me just:
Then.drop
PS> 'Some/Stuff/Here/Then.drop.last' -replace '.*/(.+)\..*', '$1'
Then.drop
.*/ greedily matches everything up to the last /
(.+)\. greedily matches everything up to the last literal . and captures everything before that . in the first capture group ($1) - which is your string of interest.
.* matches the remaining part of the string.
Using $1 as the replacement string then replaces the overall match - the entire input string - with what the first capture group matched.
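The same replacement can be sketched with Python's re.sub, assuming Python 3. Note that the backreference is written \1 in Python's replacement string rather than PowerShell's $1:

```python
import re

path = "Some/Stuff/Here/Then.drop.last"

# .*/ eats everything up to the last '/', (.+)\. captures up to the last '.',
# and .* eats the remainder; the whole match is replaced by group 1.
result = re.sub(r".*/(.+)\..*", r"\1", path)
print(result)  # Then.drop
```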
For more information about PowerShell's -replace operator, see this answer.

Python 2: Regex to get text anywhere between two strings

I am trying to find a regex to get the text between Explanation One: and Explanation Two:
The trick is that the text may or may not exist, and it could be on the same line as Explanation One: or on the next line. The current regex in the code below adds an additional line after the text it finds before Explanation Two:.
Any pointers on getting just the text, ignoring the additional empty lines, are appreciated.
import re
STRING="""Explanation One:
Blah Blah
Explanation Two: ndnlnlkn
"""
pattern = r'Explanation One:[\r\n ].*(?=Explanation Two:)'
regex = re.compile(pattern, re.IGNORECASE)
print regex.search(STRING).group()
Output:
Explanation One:
Blah Blah
To match the text between Explanation One: and Explanation Two:, you could capture it in a group and use the DOTALL flag, or the inline modifier (?s), to make the dot match a newline.
Explanation One:\s*(.*?)\s*Explanation Two
Explanation
Explanation One: Match literally
\s* Match zero or more whitespace characters
(.*?) Capture in a group any character, zero or more times, non-greedy
\s* Match zero or more whitespace characters
Explanation Two Match literally
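A minimal Python 3 sketch of the pattern above, shown with both the re.DOTALL flag and the inline (?s) modifier. The second and third sample strings (text on the same line, and no text at all) are hypothetical inputs added to cover the cases mentioned in the question:

```python
import re

samples = [
    "Explanation One:\nBlah Blah\nExplanation Two: ndnlnlkn\n",  # text on the next line
    "Explanation One: Blah Blah\nExplanation Two: ndnlnlkn\n",   # text on the same line
    "Explanation One:\nExplanation Two: ndnlnlkn\n",             # no text at all
]

flag_pattern = re.compile(r"Explanation One:\s*(.*?)\s*Explanation Two", re.DOTALL)
inline_pattern = re.compile(r"(?s)Explanation One:\s*(.*?)\s*Explanation Two")

for text in samples:
    # The \s* on both sides absorb the surrounding spaces and newlines,
    # so group 1 holds only the text itself (possibly empty).
    print(repr(flag_pattern.search(text).group(1)),
          repr(inline_pattern.search(text).group(1)))
```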
The problem with your current approach is that your regex is not running in DOTALL mode. This means that .* will not match across lines, which is precisely what it needs to do to reach the Explanation Two: marker text. One way around this is to match the following:
[\s\S]*
This matches any character, whitespace or non-whitespace, so it matches everything, even across lines.
pattern = r'Explanation One:([\s\S]*)(?=Explanation Two:)'
searchObj = re.search(pattern, STRING, re.M|re.I)
print searchObj.group(1)
Blah Blah
By the way, an alternative would be to keep using .* and add the re.DOTALL flag to the re.search call, so the following should also work:
pattern = r'Explanation One:(.*)(?=Explanation Two:)'
searchObj = re.search(pattern, STRING, re.M|re.I|re.DOTALL)
print searchObj.group(1)
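One detail worth checking: both the [\s\S]* and the re.DOTALL versions have no \s* around the group, so group 1 also contains the newlines that surround the text. A minimal Python 3 sketch that makes this visible; the .strip() call is an addition for illustration, not part of the answer:

```python
import re

STRING = """Explanation One:
Blah Blah
Explanation Two: ndnlnlkn
"""

# [\s\S] matches any character, so no flag is needed for it to cross lines.
class_match = re.search(r"Explanation One:([\s\S]*)(?=Explanation Two:)", STRING, re.M | re.I)

# With re.DOTALL, '.' also matches '\n', which gives the same capture.
dotall_match = re.search(r"Explanation One:(.*)(?=Explanation Two:)", STRING, re.M | re.I | re.DOTALL)

print(repr(class_match.group(1)))    # '\nBlah Blah\n'
print(repr(dotall_match.group(1)))   # '\nBlah Blah\n'
print(class_match.group(1).strip())  # Blah Blah
```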

Regex: detect whether a matched comma (,) lies outside a regex

I am trying to find a way to determine whether a matched comma (,) lies outside a regex. Basically, I do not want to match the comma if it lies inside a regex.
The regex I have come up with is ,(?<!.+\/)(?!.+\/), but it's not quite working.
Any ideas?
I want to skip /some,regex/ but match any other commas.
Edit:
Live example: http://rubular.com/r/WjrwSnmzyP
Here is the regex that will work for you:
,(?!\s)(?=(?:(?:[^/]*\/){2})*[^/]*$)
Live Demo: http://rubular.com/r/37buDdg1tW
Explanation: it matches a comma that is not followed by whitespace and is followed by an EVEN number of forward slashes (/). Hence a comma between two slash (/) characters will NOT be matched, while commas outside them will be matched (since those are followed by an even number of / characters).
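A small Python 3 sketch of the same lookahead, to check which commas it accepts. The sample string is a hypothetical input, and, like the regex itself, this does not handle escaped slashes (\/) inside a regex literal, since those would throw off the slash count:

```python
import re

# Comma not followed by whitespace, and followed by an even number of '/' characters.
pattern = re.compile(r",(?!\s)(?=(?:(?:[^/]*/){2})*[^/]*$)")

sample = "a,b /x,y/ c,d"
positions = [m.start() for m in pattern.finditer(sample)]
print(positions)  # [1, 11]
```

The comma at index 6 sits between the two slashes, so only one / follows it and the lookahead fails.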
A curious thing about regular expressions is that if you want to use them to ignore "something" that is within "something else", you need to match that "something else", prefer matches of it, and then either silently discard or reproduce those matches.
For example, to remove all commas from a string unless they are in a regular expression literal:
In Perl:
my $s = "/foo,bar/,baz";
$s =~ s{(/(?:[^/\\]|\\.)+/)|,}{$1 // ''}ge;   # $s is now "/foo,bar/baz"
In ECMAScript:
var s = "/foo,bar/,baz";
s = s.replace(/(\/([^\/\\]|\\.)+\/)|,/g, "$1");
or
s = s.replace(new RegExp("(/([^/\\\\]|\\\\.)+/)|,", "g"), "$1");
Note that I am capturing the match for the regular expression literal in the string value and reproducing it through the first capture group if it matched. (If the other alternative, the standalone comma, matched instead, the first group does not participate and an empty string is substituted for it, so this simple approach suffices here.)
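The same match-and-reproduce idea, sketched in Python 3. Using a function as the replacement avoids relying on how an unmatched group is substituted; the names LITERAL_OR_COMMA and keep_regex_literal are illustrative:

```python
import re

# Alternative 1 captures a whole /.../ literal (allowing escaped characters);
# alternative 2 is a bare comma, which can only match outside such literals.
LITERAL_OR_COMMA = re.compile(r"(/(?:[^/\\]|\\.)+/)|,")

def keep_regex_literal(match):
    # Reproduce the literal if group 1 matched; otherwise drop the comma.
    return match.group(1) or ""

s = "/foo,bar/,baz"
print(LITERAL_OR_COMMA.sub(keep_regex_literal, s))  # /foo,bar/baz
```

Because the engine scans left to right and the literal alternative is tried first at each slash, a comma inside /.../ is consumed as part of the literal before the bare-comma alternative ever sees it.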
For further reading, I recommend “Mastering Regular Expressions” by Jeffrey E. F. Friedl. Two rather enlightening example chapters, each from a different edition, are available for free online.

How does pattern matching work in Perl?

I want to know how pattern matching works in Perl.
My code is:
my $var = "VP KDC T. 20, pgcet. 5, Ch. 415, Refs %50 Annos";
if($var =~ m/(.*)\,(.*)/sgi)
{
print "$1\n$2";
}
I learned that the first occurrence of the comma should be matched, but here the last occurrence is matched. The output I got is:
VP KDC T. 20, pgcet. 5, Ch. 415
Refs %50 Annos
Can someone please explain how this matching works?
From the docs:
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match
So the first (.*) takes as much as possible.
A simple workaround is to use the non-greedy quantifier *?, or to match every character except a comma instead of every character: ([^,]*).
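To compare the three variants on the string from the question, here is a minimal Python 3 sketch; Python's backtracking re engine should treat greedy *, lazy *? and the negated class [^,]* the same way Perl does for these patterns. Note that the second group keeps the space that follows the comma:

```python
import re

var = "VP KDC T. 20, pgcet. 5, Ch. 415, Refs %50 Annos"

greedy = re.search(r"(.*),(.*)", var)      # group 1 runs to the LAST comma
lazy = re.search(r"(.*?),(.*)", var)       # group 1 stops at the FIRST comma
negated = re.search(r"([^,]*),(.*)", var)  # group 1 cannot contain a comma at all

for m in (greedy, lazy, negated):
    print(repr(m.group(1)), repr(m.group(2)))
```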
Greedy and Ungreedy Matching
Perl regular expressions normally match the longest string possible.
For instance:
my($text) = "mississippi";
$text =~ m/(i.*s)/;
print $1 . "\n";
Run the preceding code, and here's what you get:
ississ
It matches the first i, the last s, and everything in between them. But what if you want to match the first i to the s most closely following it? Use this code:
my($text) = "mississippi";
$text =~ m/(i.*?s)/;
print $1 . "\n";
Now look what the code produces:
is
Clearly, the use of the question mark makes the match ungreedy. But there's another problem, in that regular expressions always try to match as early as possible.
Source: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
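The "match as early as possible" point can be checked with a short Python 3 sketch (Python's engine, like Perl's, returns the leftmost match). The s.*?p pattern is an added example, not from the quoted source: the lazy quantifier keeps each match as short as possible from its starting point, but it does not move the starting point itself.

```python
import re

text = "mississippi"

print(re.search(r"i.*s", text).group())   # ississ (greedy)
print(re.search(r"i.*?s", text).group())  # is (lazy)

# Lazy, yet not the shortest "s...p" substring ("sip"): the match starts
# at the first 's', because the engine tries the leftmost position first.
print(re.search(r"s.*?p", text).group())  # ssissip
```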
Use a question mark in your regex:
if($var =~ m/(.*?)\,(.*)/sgi)
{
print "$1\n$2";
}
So:
(.*)\, means: "match as many characters as you can, as long as there will be a comma after them"
(.*?)\, means: "match any characters until you stumble upon a comma"
With (.*)\, you might expect the match to stop at the first comma.
But the quantifier is greedy enough to match all the characters it comes across up to the last comma instead of the first one.
So it matches up to the last comma, and the second group matches the rest of the line.
To avoid a greedy match, add a ? after the *.

Ungreedy regexp in Perl

I'm trying to capture a string that is like this:
document.all._NameOfTag_ != null ;
How can I capture the substring:
document.all._NameOfTag_
and the tag name:
_NameOfTag_
My attempt so far:
if($_line_ =~ m/document\.all\.(.*?).*/)
{
}
but it's always greedy and captures _NameOfTag_ != null
The lazy (.*?) will always match nothing, because the following greedy .* will always match everything.
You need to be more specific:
if($_line_ =~ m/document\.all\.(\w+)/)
This captures only word characters (letters, digits and underscores) after document.all.
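A quick way to see why the original attempt fails is to print what each group captures. A minimal Python 3 sketch, using the line from the question:

```python
import re

line = "document.all._NameOfTag_ != null ;"

# Lazy (.*?) followed by greedy .*: the lazy group gives up everything.
lazy_then_greedy = re.search(r"document\.all\.(.*?).*", line)
print(repr(lazy_then_greedy.group(1)))  # ''

# \w+ stops at the first non-word character (here the space).
word = re.search(r"document\.all\.(\w+)", line)
print(word.group(0))  # document.all._NameOfTag_
print(word.group(1))  # _NameOfTag_
```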
Your problem is the lazy quantifier. A lazy quantifier always first tries to hand matching over to the next component in the regex, and consumes text only if that next component does not match.
But here your next component is .*, and .* matches everything until the end of your input.
Use this instead:
if ($_line_ =~ m/document\.all\.(\w+)/)
Also note that a regex is NOT required to match all of the text. It only needs to match what it has to match, and nothing else.
Try the following instead; personally, I find it much clearer:
document\.all\.([^ ]*)
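The [^ ]* and \w+ versions agree on the line from the question, but they differ when no space follows the tag name. A minimal Python 3 sketch; the second input string is a hypothetical example:

```python
import re

no_space = r"document\.all\.([^ ]*)"   # stops only at a space
word_chars = r"document\.all\.(\w+)"   # stops at any non-word character

results = []
for line in ("document.all._NameOfTag_ != null ;",
             "document.all._NameOfTag_!=null;"):  # hypothetical input without spaces
    pair = (re.search(no_space, line).group(1), re.search(word_chars, line).group(1))
    results.append(pair)
    print(*pair)
```

On the second line, [^ ]* runs on through "!=null;", while \w+ still stops after the tag name, so \w+ is the safer choice if spacing around the operator is not guaranteed.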