string capture after pattern match

string capture after pattern match - regex

my $s = '>P1;MOREWORDS';
if ($s =~ m/^>.{2};.*/) {
print "jjjjj\n";
my $or = $s =~ /^>.{2};(.*)/;
}
When I try to print $or, I get 1, instead of of MOREWORDS
I am trying to capture using (.), but failing to do so.
It correctly prints jjjjjj after the match

Match returns a boolean in scalar context. Force list context to make it return the captured strings:
my ($or) = $s =~ /^>.{2};(.*)/;

Related

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ // I want to match the char, searching from the end, $doesn't work
print "success";
}
}
Thank you for your assistance.

There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).

The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list

I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = #_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$ARGV[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output

To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift #_;
my $char = quotemeta reverse shift #_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Group of Regular expression matches, return nth match and assign to array

$content = "<p><strong>haha</p> .. .. .. <p><strong> hihi </p>";
my #eachTopic = ($content =~ /(<p><strong>)(.*?)(<\/p>)/g);
#I only want to capture the $2 and add it to array
print $_."\n" foreach(#eachTopic);
The resulting array should have 2 values $eachTopic[0] = "haha" and $eachTopic[1] = "hihi".
I know i could do this using for loop and using a if statement, but just wondering, is there anything in regular expression that i can do something like
#arr = ($content =~ /(x)(.*?)(x)/g$2/)

Instead of trying to do this with a match group, use zero-width assertions to match what comes before and after the part you want to match. That way you're matching only that part, and a global match will return an array of all occurences.
my #eachTopic = ($content =~ /(?<=<p><strong>).*?(?=<\/p>)/g);

Have you tried split?
like so:
my #topics = map{ $_ =~ s/<\/p>$//; $_ }split( /<p><strong>/, $txt );
This of course fails if your string contains something other than <p><strong>...</p> at some point...

How to get the expression of matching capture in Perl

In Perl, how can I get the expression of a capture that has matched in a regex?
$s = 'aaazzz';
$s =~ s/(a+)|(b+)|(c+)/.../;
$s =~ s/(?<one>a+)|(?<two>b+)|(?<three>c+)/.../;
I mean the expression (e.g. a+), not the string aaa.
I need the expression of both numbered and named captures.

I'd do something like:
use strict;
use warnings;
my #regexes = (
qr/(a+)/,
qr/(b+)/,
qr/(c+)/,
);
my $string = 'aaazzz';
foreach my $re(#regexes) {
if ($string =~ $re) {
print "Used regex is $re\n";
}
}
Output:
Used regex is (?^:(a+))

You could assemble your regex from components and then test which groups matched. For demo purposes, I have only used match and not match and the replace operator 's', but same principle applies.
$s = 'aaazzz';
$part1 = '(a+)';
if ( $s =~ /$part1|(b+)|(c+)/ ) {
if ($1) {
print("$part1 matched\n");
}
else {
print("$part1 did not match\n");
}
}

How to refer to matched part in regex

I am using the following code to search for a substring and print it out with a few characters before and after it. Somehow Perl takes issue with me using $1 and complains about
Use of uninitialized value $1 in concatenation (.) or string.
I cannot figure out why...can you?
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}

You have to capture $word into regex group $1 by using parentheses,
while ($lines =~ m/\b($word)\b/g)

When you use $1, you are asking the code to use the first captured group from the regex and since your regex doesn't have any, well, that variable won't exist.
You can either refer to the whole match with $& or you add a capture group to your regex and keep using $1.
i.e. Either:
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$&";
print substr ($lines, max(pos($lines)-length($&)-$context, 0), length($&)+$context); # check: am I possibly violating any boundaries here
}
Or
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/(\b$word\b)/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}
Note: It doesn't matter whether you use (\b$word\b) or (\b$word)\b or \b($word\b) or \b($word)\b here because \b is a 'string' of 0 length.

When you want to address a matched part in regex, put it in parenthes. Than you'll be able to address this mathced part via $1 variable (for first pair of parenthes), $2 (for the second pair) and so on.

The values $1, $2 and so on hold the strings found by capture groups. When a match is performed all of these variables are set to undef. The code in the question does not have any capture groups and hence $1 is never given a value, it is undefined.
Running the code below shows the effect. Initially $1, $2 and $3 are not defined. The first match sets $1 and $2 but not $3. The second match sets only $1 but not that $2 is cleared to be undefined. The third match has no capture groups and all three are undefined.
use strict;
use warnings;
sub show
{
printf "\$1: %s\n", (defined $1 ? $1 : "-undef-");
printf "\$2: %s\n", (defined $2 ? $2 : "-undef-");
printf "\$3: %s\n", (defined $3 ? $3 : "-undef-");
print "\n";
}
my $text = "abcdefghij";
show();
$text =~ m/ab(cd)ef(gh)ij/; # First match
show();
$text =~ m/ab(cd)efghij/; # Second match
show();
$text =~ m/abcdefghij/; # Third match
show();

$1 will have no value unless you are actually capturing something.
You can adjust your boundary collection method to using lookahead and lookbehinds.
use strict;
use warnings;
my $lines = "this is just a test to find something out";
my $word = "test";
my $extra = 10;
while ($lines =~ m/(?:(?<=(.{$extra}))|(.{0,$extra}))\b(\Q$word\E)\b(?=(.{0,$extra}))/gs ) {
my $pre = $1 // $2;
my $word = $3;
my $post = $4;
print "'...$pre<$word>$post...'\n";
}
Outputs:
'...is just a <test> to find s...'

In regular expression matching of Perl, is it possible to know number of matches in a{n,}?

What I mean is:
For example, a{3,} will match 'a' at least three times greedly. It may find five times, 10 times, etc. I need this number. I need this number for the rest of the code.
I can do the rest less efficiently without knowing it, but I thought maybe Perl has some built-in variable to give this number or is there some trick to get it?

Just capture it and use length.
if (/(a{3,})/) {
print length($1), "\n";
}

Use #LAST_MATCH_END and #LAST_MATCH_START
my $str = 'jlkjmkaaaaaamlmk';
$str =~ /a{3,}/;
say $+[0]-$-[0];
Output:
6
NB: This will work only with a one-character pattern.

Here's an idea (maybe this is what you already had?) assuming the pattern you're interested in counting has multiple characters and variable length:
capture the substring which matches the pattern{3,} subpattern
then match the captured substring globally against pattern (note the absence of the quantifier), and force a list context on =~ to get the number of matches.
Here's a sample code to illustrate this (where $patt is the subpattern you're interested in counting)
my $str = "some catbratmatrattatblat thing";
my $patt = qr/b?.at/;
if ($str =~ /some ((?:$patt){3,}) thing/) {
my $count = () = $1 =~ /$patt/g;
print $count;
...
}
Another (admittedly somewhat trivial) example with 2 subpatterns
my $str = "some catbratmatrattatblat thing 11,33,446,70900,";
my $patt1 = qr/b?.at/;
my $patt2 = qr/\d+,/;
if ($str =~ /some ((?:$patt1){3,}) thing ((?:$patt2){2,})/) {
my ($substr1, $substr2) = ($1, $2);
my $count1 = () = $substr1 =~ /$patt1/g;
my $count2 = () = $substr2 =~ /$patt2/g;
say "count1: " . $count1;
say "count2: " . $count2;
}
Limitation(s) of this approach:
Fails miserably with lookarounds. See amon's example.

If you have a pattern of type /AB{n,}/ where A and B are complex patterns, we can split the regex into multiple pieces:
my $string = "ABABBBB";
my $n = 3;
my $count = 0;
TRY:
while ($string =~ /A/gc) {
my $pos = pos $string; # remember position for manual backtracking
$count++ while $string =~ /\GB/g;
if ($count < $n) {
$count = 0;
pos($string) = $pos; # restore previous position
} else {
last TRY;
}
}
say $count;
Output: 4
However, embedding code into the regex to do the counting may be more desirable, as it is more general:
my $string = "ABABBBB";
my $count;
$string =~ /A(?{ $count = 0 })(?:B(?{ $count++ })){3,}/ and say $count;
Output: 4.
The downside is that this code won't run on older perls. (Code was tested on v14 & v16).
Edit: The first solution will fail if the B pattern backtracks, e.g. $B = qr/BB?/. That pattern should match the ABABBBB string three times, but the strategy will only let it match two times. The solution using embedded code allows proper backtracking.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

string capture after pattern match - regex

my $s = '>P1;MOREWORDS'; if ($s =~ m/^>.{2};./) { print "jjjjj\n"; my $or = $s =~ /^>.{2};(.)/; } When I try to print $or, I get 1, instead of of MOREWORDS I am trying to capture using (.), but failing to do so. It correctly prints jjjjjj after the match

Match returns a boolean in scalar context. Force list context to make it return the captured strings: my ($or) = $s =~ /^>.{2};(.*)/;

Related

Matching a variable in a string in Perl from the end

Group of Regular expression matches, return nth match and assign to array

How to get the expression of matching capture in Perl

How to refer to matched part in regex

In regular expression matching of Perl, is it possible to know number of matches in a{n,}?

Categories

Resources

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

string capture after pattern match - regex

my $s = '>P1;MOREWORDS'; if ($s =~ m/^>.{2};.*/) { print "jjjjj\n"; my $or = $s =~ /^>.{2};(.*)/; } When I try to print $or, I get 1, instead of of MOREWORDS I am trying to capture using (.), but failing to do so. It correctly prints jjjjjj after the match

Match returns a boolean in scalar context. Force list context to make it return the captured strings: my ($or) = $s =~ /^>.{2};(.*)/;

Related

Matching a variable in a string in Perl from the end

Group of Regular expression matches, return nth match and assign to array

How to get the expression of matching capture in Perl

How to refer to matched part in regex

In regular expression matching of Perl, is it possible to know number of matches in a{n,}?

Categories

Resources

my $s = '>P1;MOREWORDS'; if ($s =~ m/^>.{2};./) { print "jjjjj\n"; my $or = $s =~ /^>.{2};(.)/; } When I try to print $or, I get 1, instead of of MOREWORDS I am trying to capture using (.), but failing to do so. It correctly prints jjjjjj after the match