PCRE pattern to count number of character(s) - regex

here is my input string:
bar somthing foo bar somthing foo
I would like to count number of a character (ex: 't') between bar & foo
bar somthing foo -> 1
bar somthing foo -> 1
I know we can use /bar(.*?)foo/ and then count number of character in matches[1] with a String function
Is there way to do this w/o string function to count?

A Perl solution:
$_ = 'bar test this thing foo';
my $count = /bar(.*?)foo/ && $1 =~ tr/t//;
print $count;
Output:
4
Just for fun, using a single expression with (?{ code }):
$_ = 'bar test this thing foo';
my $count = 0;
/bar ( (?:(?!bar)[^t])*+ ) (?:t (?{ ++$count; }) (?-1) )*+ foo/x or $count = 0;
print $count;

#Qtax: the subject says PCRE... so it's not exactly perl. Hence (?{ code }) would most probably be unsupported (let alone the full perl code).
Though both solutions are cool ;)
#tqwer: you can get the match and then replace [^t] with "" and check length..
though i am not sure what the logic behind counting with regex would be ;)

Related

perl regex to grep for a character in a word

I am trying to write a perl script to grep for a character in a string. All the strings are stored in an array. We iterate over the array and look if the particular word occurs, if so grep for a particular pattern.
my #array = ("Foo1", "Bar", "Baz", "Foo5", "Foo2", "Bak", "Foo3");
foreach my $ var (#array){
if ($var =~ /Foo/){
#Regex to grep for the number which is at the end of string Foo
}
}
Any leads are welcomed. Thanks for the help.
************Edits***********
Thanks for the comments.
if ($var =~ /Foo/){
/.Foo+([A-Z]+)/;
print $1, "\n";
}
The above is the code that I tried and it didn't print anything.
Matching without binding =~ matches against the $_ variable that you don't use. Furthermore, the dot before Foo means the second regex will match only if Foo is preceded by something (that's not a newline). As all your strings containing Foo start with it, the second regex can never match even if you specify $var =~.
Moreover, you can match the number directly in the condition.
And finally, [A-Z] doesn't match digits. Use [0-9] instead.
my #array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
foreach my $var (#array){
if ($var =~ /Foo([0-9]+)/){
print $1, "\n";
}
}
Try this for your second regex
[^\w]*([0-9])
Then you can use the first group to get the number.
my #array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
my #var = map(/Foo(\w+)/) #array;
print #var ;

In regular expression matching of Perl, is it possible to know number of matches in a{n,}?

What I mean is:
For example, a{3,} will match 'a' at least three times greedly. It may find five times, 10 times, etc. I need this number. I need this number for the rest of the code.
I can do the rest less efficiently without knowing it, but I thought maybe Perl has some built-in variable to give this number or is there some trick to get it?
Just capture it and use length.
if (/(a{3,})/) {
print length($1), "\n";
}
Use #LAST_MATCH_END and #LAST_MATCH_START
my $str = 'jlkjmkaaaaaamlmk';
$str =~ /a{3,}/;
say $+[0]-$-[0];
Output:
6
NB: This will work only with a one-character pattern.
Here's an idea (maybe this is what you already had?) assuming the pattern you're interested in counting has multiple characters and variable length:
capture the substring which matches the pattern{3,} subpattern
then match the captured substring globally against pattern (note the absence of the quantifier), and force a list context on =~ to get the number of matches.
Here's a sample code to illustrate this (where $patt is the subpattern you're interested in counting)
my $str = "some catbratmatrattatblat thing";
my $patt = qr/b?.at/;
if ($str =~ /some ((?:$patt){3,}) thing/) {
my $count = () = $1 =~ /$patt/g;
print $count;
...
}
Another (admittedly somewhat trivial) example with 2 subpatterns
my $str = "some catbratmatrattatblat thing 11,33,446,70900,";
my $patt1 = qr/b?.at/;
my $patt2 = qr/\d+,/;
if ($str =~ /some ((?:$patt1){3,}) thing ((?:$patt2){2,})/) {
my ($substr1, $substr2) = ($1, $2);
my $count1 = () = $substr1 =~ /$patt1/g;
my $count2 = () = $substr2 =~ /$patt2/g;
say "count1: " . $count1;
say "count2: " . $count2;
}
Limitation(s) of this approach:
Fails miserably with lookarounds. See amon's example.
If you have a pattern of type /AB{n,}/ where A and B are complex patterns, we can split the regex into multiple pieces:
my $string = "ABABBBB";
my $n = 3;
my $count = 0;
TRY:
while ($string =~ /A/gc) {
my $pos = pos $string; # remember position for manual backtracking
$count++ while $string =~ /\GB/g;
if ($count < $n) {
$count = 0;
pos($string) = $pos; # restore previous position
} else {
last TRY;
}
}
say $count;
Output: 4
However, embedding code into the regex to do the counting may be more desirable, as it is more general:
my $string = "ABABBBB";
my $count;
$string =~ /A(?{ $count = 0 })(?:B(?{ $count++ })){3,}/ and say $count;
Output: 4.
The downside is that this code won't run on older perls. (Code was tested on v14 & v16).
Edit: The first solution will fail if the B pattern backtracks, e.g. $B = qr/BB?/. That pattern should match the ABABBBB string three times, but the strategy will only let it match two times. The solution using embedded code allows proper backtracking.

Perl pattern matching with pattern arithmetic

I want to understand how can I do arithmetic on matched sub-patterns in perl regex.
This is just a sample code and I want to understand how can I use \1 (already matched sub-pattern. in this case - 7) to match pattern+1 (8)
my $y = 77668;
if($y =~ /(\d)\1(\d)\2\1+1/) #How to increment a previously
#matched sub-pattern and form a pattern?
{
print $y;
}
EDIT
From the answers, I see that pattern arithmetic is not possible.
This is what I want to achieve.
I want to form a regex which will match this pattern:
N-3N-2N-1NNN+1N+2N+3 (N = 3,4,5,6
Its possible via regex code blocks:
my $y = 77668;
if($y =~ /(\d)\1(\d)\2(??{$1+1})/ ) {
print $y;
}
In this snippet (??{ CODE }) returns another regex that must match, so this regex looks like "8" ($1+1). As a result, whole regex will match only if 5th digit is greather and 1st by 1. But drawback with 1st digit is 9, this code block will return "10", so possible its wrong behavior, but you said nothing about what must be done in this case.
Now about N-3N-2N-1NNN+1N+2N+3 question, you can match it with this regex:
my $n = 5;
if( $y =~ /(??{ ($n-3).($n-2).($n-1).$n.($n+1).($n+2).($n+3) })/ ){
Or more "scalable" way:
my $n = 5;
if( $y =~ /(??{ $s=''; $s .= $n+$_ foreach(-3..3); $s; })/ ){
Again, what we must do if $n == 2 ?? $n-3 will be -1. Its not a simply digit cus it have sign, so you should think about this cases.
One another way. Match what we have and then check it.
if( $y =~ /(\d)(\d)(\d)(\d)(\d)(\d)(\d)/ ) {
if( $1 == ($4-3) && $2 == ($4-2) && $3 == ($4-1) && $6 == ($4+1) && $7 == ($4+2) && $7 == ($4+3) ) {
#...
Seems this method litle bit clumsy, but its obivious to everyone (i hope).
Also, you can optimize your regex since 7 ascending digits streak is not so frequent combination, plus get some lulz from co-workers xD:
sub check_number {
my $i;
for($i=1; $i<length($^N); $i++) {
last if substr($^N, $i, 1)<=substr($^N, $i-1, 1);
}
return $i<length($^N) ? "(*FAIL)" : "(*ACCEPT)";
}
if( $y =~ /[0123][1234][2345][3456][4567][5678][6789](??{ check_number() })/ ) {
Or... Maybe most human-friendly method:
if( $y =~ /0123456|1234567|2345678|3456789/ ) {
Seems last variant is bingo xD Its good example about not searching regex when things are so simple)
Of course this is possible. We are talking about Perl regexes after all. But it will be rather ugly:
say "55336"=~m{(\d)\1(\d)\2(\d)(?(?{$1+1==$3})|(*F))}?"match":"fail";
or pretty-printed:
say "55336" =~ m{ (\d)\1 (\d)\2 (\d)
(? (?{$1+1==$3}) # true-branch: nothing
|(*FAIL)
)
}x
? "match" : "fail";
What does this do? We collect the digits in ordinary captures. At the end, we use an if-else pattern:
(? (CONDITION) TRUE | FALSE )
We can embed code into a regex with (?{ code }). The return value of this code can be used as a condition. The (*FAIL) (short: (*F)) verb causes the match to fail. Use (*PRUNE) if you only want a branch, not the whole pattern to fail.
Embedded code is also great for debugging. However, older perls cannot use regexes inside this regex code :-(
So we can match lots of stuff and test it for validity inside the pattern itself. However, it might be a better idea to do that outside of the pattern like:
"string" =~ /regex/ and (conditions)
Now to your main pattern N-3N-2N-1NNN+1N+2N+3 (I hope I parsed it correctly):
my $super_regex = qr{
# N -3 N-2 N-1 N N N+1 N+2 N+3
(\d)-3\1-2\1-1\1\1(\d)(\d)(\d)
(?(?{$1==$2-1 and $1==$3-2 and $1==$4-3})|(*F))
}x;
say "4-34-24-144567" =~ $super_regex ? "match" : "fail";
Or did you mean
my $super_regex = qr{
#N-3 N-2 N-1 N N N+1 N+2 N+3
(\d)(\d)(\d) (\d)\4 (\d)(\d)(\d)
(? (?{$1==$4-3 and $2==$4-2 and $3==$4-1 and
$5==$4+1 and $6==$4+2 and $7==$4+3})|(*F))
}x;
say "123445678" =~ $super_regex ? "match" : "fail";
The scary thing is that these even works (with perl 5.12).
We could also generate parts of the pattern at match-time with the (??{ code }) construct — the return value of this code is used as a pattern:
my $super_regex = qr{(\d)(??{$1+1})(??{$1+2})}x;
say "234"=~$super_regex ? "match":"fail"
et cetera. However, I think readability suffers more this way.
If you need more than nine captures, you can use named captures with the
(?<named>pattern) ... \k<named>
constructs. The contents are also available in the %+ hash, see perlvar for that.
To dive further into the secrets of Perl regexes, I recommend reading perlre a few times.

Regular expressions to match protected separated values

I'd like to have a regular expression to match a separated values with some protected values that can contain the separator character.
For instance:
"A,B,{C,D,E},F"
would give:
"A"
"B"
"{C,D,E}"
"F"
Please note the protected values can be nested, as follows:
"A,B,{C,D,{E,F}},G"
would give:
"A"
"B"
"{C,D,{E,F}}"
"G"
I already coded that feature with a character iteration as follow:
sub Parse
{
my #item;
my $curly;
my $string;
foreach(split //)
{
$_ eq "{" and ++$curly;
$_ eq "}" and --$curly;
if(!$curly && /[,:]/)
{
push #item, $string;
undef $string;
next;
}
$string .= $_;
}
push #item, $string;
return #item;
}
But it would definitively be so much nicer with a regexp.
A regex that supports nesting would look as follows:
my #items;
push #items, $1 while
/
(?: ^ | \G , )
(
(?: [^,{}]+
| (
\{
(?: [^{}]
| (?2)
)*
\}
)
| # Empty
)
)
/xg;
$ perl -E'$_ = shift; ... say for #items;' 'A,B,{C,D,{E,F}},G'
A
B
{C,D,{E,F}}
G
Assumes valid input since it can't extract and validate at the same time. (Well, not without making things really messy.)
Improved from nhahtdh's answer.
$_ = "A,B,{C,D,E},F";
while ( m/(\{.*?\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "[$&]\n";
}
Improved it again. Please look at this one!
$_ = "A,B,{C,D,{E,F}},G";
while ( m/(\{.*\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "$&\n";
}
It will get:
A
B
{C,D,{E,F}}
G
$a = "A,B,{C,D,E},F";
while ($a =~ s/(\{[\{\}\w,]+\}|\w)//) {
push (#res, $1);
}
print "\#res: #res\n"
Result:
#res: A B {C,D,E} F
Explanation : we try to match either the protected block \{[\{\}\w,]+\} or just a single character \w successively in a loop, deleting it from the original string if there is a match. Every time there is a match, we store it (meaning the $1) in the array, et voilà!
Here is a regex in bash:
chronos#localhost / $ echo "A,B,{C,D,E},F" | grep -oE "(\{[^\}]*\}|[A-Z])"
A
B
{C,D,E}
F
Try this regex. Use the regex to match and extract the token.
/(\{.*?\}|(?<=,|^).*?(?=,|$))/
I have not tested this code in Perl.
There is an assumption about on how the regex engine works here (I assume that it will try to match the first part \{.*?\} before the second part). I also assume that there are no nested curly bracket, and badly paired curly brackets.
$s = "A,B,{C,D,E},F";
#t = split /,(?=.*{)|,(?!.*})/, $s;

Break from regex loop in Perl

In Perl regex, how can I break from /ge loop..?
Let's say the code is:
s/\G(foo)(bar)(;|$)/{ break if $3 ne ';'; print "$1\n"; '' }/ge;
...break here doesn't work, but it should illustrate what I mean.
Generally, I would write this as a while statement:
while( s/(foo)(bar)/$1/ ) {
# my code to determine if I should stop
if(something) {
last;
}
}
The caveat with this method is that your search/replace will start at the beginning each time, which may matter depending on your regex.
If you really wanted to do it in the regex, you could write a function that returns an unmodified string if you reached your end point, such as a count in this case:
my $count=0;
sub myfunc {
my ($string, $a, $b) = #_;
$count++;
if($count > 3) {
return $string;
}
return $a;
}
$mystring = "foobar foobar, foobar + foobar and foobar";
$mystring =~ s/((foo)(bar))/myfunc($1,$2,$3)/ge;
# result: $mystring => "foo foo, foo + foobar and foobar"
If I knew your specific case, I could probably provide a more helpful example.
You can use some experimental features to emulate a break statement, the Perl documentation for some of these features warn that they may change in future versions of Perl.
my $str = "abcdef";
my $stop = 0;
$str =~ s/(?(?{ $stop })(?!))(.)/ $stop = 1 if $1 ge "c"; "X" /ge;
print "$str\n";
This will print XXXdef.
A piece wise explanation:
(?(condition)yes-pattern) if the pattern in in condition matches then match yes-pattern, otherwise don't match anything.
(?{ code }) execute code, inside a conditional if the code is true execute the yes-pattern
(?!) will always fail to match, it's meaning is something like "Don't match nothing" and since 'nothing' can be matched at any point in a string it will fail.
So when $stop is true the pattern can never match, and when $stop is false it matches.