Capturing Group with regex - regex

I am using the following code:
my $str = 123455;
if ($str =~ m/([a-z]+)|(\d+)/) {
    print "$1\n";
}
I know that it will not print the result, because the number ends up in $2 rather than $1. But I want to get the result with the same code, changing only the regular expression.
Is it possible to do it?
Note:
Please do not suggest the following:
my $str = 123455;
if ($str =~ m/(?:[a-z]+)|(\d+)/) {
    print "$1\n";
}

You can use the branch reset construct (?| ... ), which restarts capture group numbering in each alternative:
use 5.010; # regex feature available since perl 5.10
my $str = 123455;
if ($str =~ m/(?| ([a-z]+)|(\d+) )/x) {
print "$1\n";
}
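As a minimal sketch of what the branch reset does, the snippet below wraps the same pattern in a helper and feeds it one numeric and one alphabetic input. The helper name first_capture and the sample inputs are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) needs perl 5.10 or later

# Hypothetical helper: returns whatever ended up in $1, or undef on no match.
sub first_capture {
    my ($str) = @_;
    # Inside (?| ... ) every alternative starts numbering its groups at 1.
    return $str =~ m/(?| ([a-z]+) | (\d+) )/x ? $1 : undef;
}

say first_capture('123455');   # the digits branch fills $1
say first_capture('abc');      # the letters branch fills $1 as well
```

Whichever alternative matches, the capture is read from $1, so the surrounding code does not need to know which branch succeeded.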

([a-z]+|\d+)
Try this. Replace by $1. See demo.
http://regex101.com/r/sZ2wJ5/1
Add anchors if you want to match only letters or numbers at a time.
^([a-z]+|\d+)$
or
((?:[a-z]+)|(?:\d+))

You could use print "$&\n".
$& contains the entire matched string (in other words, whatever would otherwise be in $1 or $2).
See http://perldoc.perl.org/perlre.html for more details ;-)

What do you mean you don't want to change your group structuring? You want your capture to go to group 1, but what you have won't ever put a number in group 1. You have to change your group structuring.
If you still want to be able to find a numeric in group 2, you can create subgroups: groups are numbered by the position of their opening parenthesis. Try
([a-z]+|(\d+))
if that's what you want.
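A short sketch of how the numbering works out with the nested pattern; the helper name groups and the sample inputs are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($1, $2) for the nested pattern, or an empty list.
sub groups {
    my ($str) = @_;
    # Group 1 opens first (outer), group 2 opens second (inner \d+).
    return $str =~ m/([a-z]+|(\d+))/ ? ($1, $2) : ();
}

for my $str ('123455', 'abc') {
    my ($outer, $inner) = groups($str);
    $inner = '(unset)' unless defined $inner;
    print "input=$str  group1=$outer  group2=$inner\n";
}
```

Group 1 is set for either kind of input, while group 2 is set only when the digits alternative matched.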

Related

Matching string between first and last parentheses

I need to pick up all texts between the first and last parentheses but am having a hard time with regex.
This is what I have so far; I'm stuck and don't know how to proceed further.
/(\w+)\((.*?)\)\s/g
But it stops at the first ")" that it sees.
Sample:
(me)
(mine)
((me) and (you))
Desired output is
me
mine
(me) and (you)
Your code is almost correct; it would work if you dropped the ? from the regex, since the ? makes the quantifier non-greedy. For example (I have also removed the \w+ prefix and the trailing \s, which the sample lines do not have):
/\((.*)\)/
Since you want to capture all the text inside the outermost parentheses, you shouldn't use a non-greedy quantifier. You can use this regex, which combines lookarounds with the greedy .* to capture everything between the first ( and the last ).
(?<=\().*(?=\))
EDIT: An alternative solution
The same data can be extracted with the following regex, which avoids lookahead and lookbehind. Some regex flavors do not support them, so it may be useful in those situations.
^\((.*)\)$
Here ^\( matches the opening parenthesis, (.*) greedily consumes the rest of the line into the first capture group, and backtracking then makes it stop at the last ) before the end of the line.
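To see the difference between the greedy and non-greedy quantifiers on the sample lines from the question, here is a small sketch. The helper name between_outer_parens is made up, and the patterns are unanchored variants of the ones discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: text between the first "(" and the last ")".
sub between_outer_parens {
    my ($line) = @_;
    # Greedy .* runs to the end of the line, then backtracks to the last ")".
    return $line =~ /\((.*)\)/ ? $1 : undef;
}

for my $line ('(me)', '(mine)', '((me) and (you))') {
    my ($lazy) = $line =~ /\((.*?)\)/;   # non-greedy: stops at the first ")"
    my $greedy = between_outer_parens($line);
    print "lazy: $lazy\tgreedy: $greedy\n";
}
```

On the first two lines both versions agree; only the nested line shows the non-greedy version cutting off at the first closing parenthesis.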
Here's a non-regex solution. Since you want the absolute first and last instances of fixed substrings, index and rindex find the right positions that you can feed to substr:
#!/usr/bin/perl
use v5.10;
while( <DATA> ) {
chomp;
my $start = 1 + index $_, '(';
my $end = rindex $_, ')';
my $s = substr $_, $start, ($end - $start);
say "Read: $_";
say "Extracted: $s";
}
__END__
(me)
(mine)
((me) and (you))
A non-regex way with chop() and reverse()
$string='((me) and (you))';
chop($string);
$string = reverse($string);
chop($string);
$string = reverse($string);
print $string;
Output:
(me) and (you)
DEMO: http://tpcg.io/MhaLed

perl script not extracting only path (regex)

I have this regex expression
($oldpath = $_) =~ m/^\/(.+\/)*/;
This is the input:
/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/110-lafayette_railroad.mp3
But the output is:
/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/110-lafayette_railroad.mp3
When it should be:
/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/
Thanks in advance. :)
What do you mean by "output"? $1 contains
cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/
which is almost what you wanted (it just misses the leading /).
You assigned $_ to $oldpath, then matched it against a regex. A match doesn't change either $_ or $oldpath.
The canonical way is
my ($match) = m/^\/(.+\/)*/;
or rather (to prevent the leaning toothpick syndrome)
my ($match) = m{^/(.+/)*};
i.e. running the match in list context returns the captured groups, and the first one is assigned to $match.
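A minimal sketch of the list-context match applied to the path from the question. It assumes the leading slash is wanted in the result, so the pattern differs slightly from the one above: the whole directory part sits inside one group.

```perl
use strict;
use warnings;

my $file = '/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/110-lafayette_railroad.mp3';

# Assumption: keep everything up to and including the last "/".
# Greedy .* backtracks to the last slash, which leaves the file name out.
my ($dir) = $file =~ m{^(.*/)};

# A name without any slash produces an empty list, so $dir would be undef.
print defined $dir ? "$dir\n" : "no directory part\n";
```

Because the match returns an empty list on failure, checking defined $dir is enough to tell a failed match from a successful one.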

Enclose the last 4 characters of a PHP string in brackets

I have some strings like below
my-name-is-2547
this-is-stack-2012
hllo-how-2011
Now I want the above strings to be changed to something like the ones below using regex.
my-name-is-(2547)
this-is-stack-(2012)
hllo-how-(2011)
I don't want to use substr or other, only regex replace.
$pattern = '/(\d+)$/';
$replacement = '($1)';
echo preg_replace($pattern, $replacement, $string);
If you are sure that the numbers appear only at the end:
regular expression:
(\d+)
using one capturing group. Replace with: ($1).
The output will be:
my-name-is-(2547)
this-is-stack-(2012)
hllo-how-(2011)

How to have a variable as regex in Perl

I think this question is repeated, but searching wasn't helpful for me.
my $pattern = "javascript:window.open\('([^']+)'\);";
$mech->content =~ m/($pattern)/;
print $1;
I want to have an external $pattern in the regular expression. How can I do this? The current one returns:
Use of uninitialized value $1 in print at main.pm line 20.
$1 was empty, so the match did not succeed. I'll make up a constant string in my example of which I know that it will match the pattern.
Declare your regular expression with qr, not as a simple string. Also, you're capturing twice, once in $pattern for the open call's parentheses, once in the m operator for the whole thing, therefore you get two results. Instead of $1, $2 etc. I prefer to assign the results to an array.
my $pattern = qr"javascript:window.open\('([^']+)'\);";
my $content = "javascript:window.open('something');";
my @results = $content =~ m/($pattern)/;
# the expression returns the list
# (
# q{javascript:window.open('something');},
# 'something'
# )
When I compile that string into a regex, like so:
my $pattern = "javascript:window.open\('([^']+)'\);";
my $regex = qr/$pattern/;
I get just what I expect, the following regex:
(?-xism:javascript:window.open('([^']+)');)
Notice that it is looking for a capture group, not a literal open paren, at the end of 'open'. And in that capture group, the first thing it expects is a single quote. So it will match
javascript:window.open'fum';
but not
javascript:window.open('fum');
One thing you have to learn is that in a Perl double-quoted string, "\(" is the same thing as "(": you're just telling Perl that you want a literal '(' in the string. For the escapes to survive into the regex, you need to double them.
my $pattern = "javascript:window.open\\('([^']+)'\\);";
my $regex = qr/$pattern/;
Actually preserves the literal ( and yields:
(?-xism:javascript:window.open\('([^']+)'\);)
Which is what I think you want.
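The effect of the string layer can be checked directly. This sketch prints the two string versions and then tries all three patterns against a made-up input; the variable names and the sample text are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "javascript:window.open('fum');";

my $single = "javascript:window.open\('([^']+)'\);";      # "\(" collapses to "("
my $double = "javascript:window.open\\('([^']+)'\\);";    # "\\(" keeps "\("
my $regex  = qr/javascript:window\.open\('([^']+)'\);/;   # no string layer at all

print "single: $single\n";
print "double: $double\n";

my @captured;
for my $pat ($single, $double, $regex) {
    my ($arg) = $text =~ $pat;       # list context: first capture or undef
    push @captured, $arg;
    print defined $arg ? "matched, captured $arg\n" : "no match\n";
}
```

Only the single-backslash string fails to match, because its parentheses reach the regex engine as grouping syntax rather than as literal characters.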
As for your question, you should always test the results of a match before using it.
if ( $mech->content =~ m/($pattern)/ ) {
print $1;
}
makes much more sense. If you want to print something regardless, you are already accepting that there may be no value, i.e. that nothing matched. In that case it's best to supply an alternative:
$mech->content =~ m/($pattern)/;
print $1 || 'UNDEF!';
However, I prefer to grab my captures in the same statement, like so:
my ( $open_arg ) = $mech->content =~ m/($pattern)/;
print $open_arg || 'UNDEF!';
The parens around $open_arg puts the match into a "list context" and returns the captures in a list. Here I'm only expecting one value, so that's all I'm providing for.
Finally, one of the root causes of your problems is that you do not need to specify your expression in a string in order for your regex to be "portable". You can get perl to pre-compile your expression. That way, you only care what instructions the characters are to a regex and not whether or not you'll save your escapes until it is compiled into an expression.
A compiled regex will interpolate itself into other regexes properly. Thus, you get a portable expression that interpolates just as well as a string--and specifically correctly handles instructions that could be lost in a string.
my $pattern = qr/javascript:window.open\('([^']+)'\);/;
Is all that you need. Then you can use it, just as you did. Although, putting parens around the whole thing, would return the whole matched expression (and not just what's between the quotes).
You do not need the parentheses in the match pattern. With them, the whole match is returned as $1. My guess is that the pattern is not matching at all, but I am only guessing.
$mech->content =~ m/$pattern/;
or
$mech->content =~ m/(?:$pattern)/;
These are the clustering, non-capturing parentheses.
The way you are doing it is correct.
The solutions have already been given. I'd like to point out that the window.open call might have multiple parameters, enclosed in double quotes and separated by commas, like:
javascript:window.open("http://www.javascript-coder.com","mywindow","status=1,toolbar=1");
There might be spaces between the function name and the parentheses, so I'd use a slightly different regex for that:
my $pattern = qr{
javascript:window.open\s*
\(
([^)]+)
\)
}x;
print $1 if $text =~ /$pattern/;
Now you have all parameters in $1 and can process them afterwards with split /,/, $stuff and so on.
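A sketch of that follow-up processing, using the example call from above. It assumes every argument is double-quoted and contains no escaped quotes; note that a plain split on commas also cuts the third argument in two.

```perl
use strict;
use warnings;

my $text = 'javascript:window.open("http://www.javascript-coder.com","mywindow","status=1,toolbar=1");';

my $pattern = qr{ javascript:window\.open \s* \( ([^)]+) \) }x;

my (@parts, @args);
if ($text =~ $pattern) {
    my $params = $1;   # copy before running further matches

    # Naive split: the comma inside "status=1,toolbar=1" is split as well.
    @parts = split /,/, $params;
    print scalar(@parts), " pieces from split\n";

    # Assumption: arguments are double-quoted with no escaped quotes inside.
    @args = $params =~ /"([^"]*)"/g;
    print "$_\n" for @args;
}
```

Matching the quoted strings with /g keeps each argument whole, which the plain split cannot do for arguments that contain commas.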
It reports an uninitialized value because $1 is undefined, which means nothing matched your pattern. Note also that wrapping a second set of parentheses around the pattern creates a nested group: on a successful match $1 holds the whole match and the inner capture moves to $2.

How can I capture multiple matches from the same Perl regex?

I'm trying to parse a single string and get multiple chunks of data out from the same string with the same regex conditions. I'm parsing a single HTML doc that is static (For an undisclosed reason, I can't use an HTML parser to do the job.) I have an expression that looks like:
$string =~ /\<img\ssrc\="(.*)"/;
and I want to get the value of $1. However, in the one string, there are many img tags like this, so I need something like an array returned (@1?). Is this possible?
As in Jim's answer, use the /g modifier (in list context or in a loop).
But beware of greediness: you don't want the .* to match more than necessary (and don't escape < or =, they are not special).
while($string =~ /<img\s+src="(.*?)"/g ) {
...
}
@list = ($string =~ m/\<img\ssrc\="(.*)"/g);
The g modifier matches all occurrences in the string. List context returns all of the matches. See the m// operator in perlop.
You just need the global modifier /g at the end of the match. Then loop through until there are no matches remaining:
my @matches;
while ($string =~ /\<img\ssrc\="(.*)"/g) {
    push(@matches, $1);
}
Use the /g modifier and list context on the left, as in
@result = $string =~ /\<img\ssrc\="(.*)"/g;
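The greediness caveat matters most when several tags share one line. The sketch below runs the greedy and non-greedy variants in list context over a made-up string; the sample markup is an assumption, not the asker's document.

```perl
use strict;
use warnings;

# Made-up sample markup with two img tags on the same line.
my $string = '<img src="a.png" alt="A"><p>text</p><img src="b.png">';

my @greedy = $string =~ /<img\s+src="(.*)"/g;    # .* runs on to the last quote
my @lazy   = $string =~ /<img\s+src="(.*?)"/g;   # stops at the quote closing src

print scalar(@greedy), " greedy match(es): @greedy\n";
print scalar(@lazy),   " non-greedy match(es): @lazy\n";
```

With the greedy version the first match swallows the second tag, so the list holds a single oversized capture; the non-greedy version returns one entry per tag.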